UNIVERSITY OF MAIDUGURI
Maiduguri, Nigeria
CENTRE FOR DISTANCE LEARNING
MANAGEMENT SCIENCES

ECON 103: INTRODUCTION TO STATISTICS II (3 units)

Published 2006 ©

All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without prior permission in writing from the University of Maiduguri.

This text forms part of the learning package for the academic programme of the Centre for Distance Learning, University of Maiduguri. Further enquiries should be directed to:

Coordinator
Centre for Distance Learning
University of Maiduguri
P. M. B. 1069
Maiduguri, Nigeria.

This text is being published by the authority of the Senate, University of Maiduguri, Maiduguri, Nigeria.

ISBN: 978-8133-50-9

PREFACE

This study unit has been prepared for learners so that they can do most of the study on their own. The structure of the study unit is different from that of a conventional textbook. The course writers have made efforts to make the study material rich enough, but learners need to do some extra reading to further enrich the knowledge required. Learners are expected to make the best use of library facilities and, where feasible, the Internet. References are provided to guide the selection of reading materials.

The University expresses its profound gratitude to our course writers and editors for making this possible. Their efforts will no doubt help in improving access to University education.

Professor J. D. Amin
Vice-Chancellor

HOW TO STUDY THE UNIT

You are welcome to this study unit. The unit is arranged to simplify your study. Each topic of the unit has an introduction, objectives, in-text, a summary and a self-assessment exercise. The study unit should take 6-8 hours to complete.
Tutors will be available at designated contact centres for tutorials. The centre expects you to plan your work well. Should you wish to read further, you can supplement your study with the references and suggested readings listed in the study unit.

PRACTICE EXERCISES/TESTS

1. Self-Assessment Exercises (SAEs)
These are provided at the end of each topic. They help you to assess whether or not you have actually studied and understood the topic. Solutions to the exercises are provided at the end of the study unit for you to assess yourself.

2. Tutor-Marked Assignment (TMA)
This is provided at the end of the study unit. It consists of examination-type questions for you to answer and send to the centre. You are expected to work on your own in responding to the assignments. The TMA forms part of your continuous assessment (C.A.) score, and will be marked and returned to you. In addition, you will write an end-of-semester examination, the score of which will be added to your TMA score.

Finally, the centre wishes you success as you go through the different units of your study.

INTRODUCTION TO THE COURSE

In this study unit we shall cover four topics. Each topic discusses a particular aspect of the course. However, it should be noted that the topics are related to one another. The topics covered in this study unit are:

1. Statistical Notation;
2. Measures of Central Tendency;
3. Measures of Dispersion; and
4. Measures of Skewness and Kurtosis.

TABLE OF CONTENTS

PREFACE
HOW TO STUDY THE UNIT
INTRODUCTION TO STUDY UNIT 2
TOPIC
1. STATISTICAL NOTATIONS
2. MEASURES OF CENTRAL TENDENCY
3. MEASURES OF DISPERSION
4. MEASURES OF SKEWNESS AND KURTOSIS
SOLUTIONS TO EXERCISES

TOPIC 1: TABLE OF CONTENTS

1.0 TOPIC 1: STATISTICAL NOTATIONS
1.1 INTRODUCTION
1.2 OBJECTIVES
1.3 IN-TEXT
1.3.1 SUBSCRIPT OR INDEX
1.3.2 SYMBOLS
1.4 SUMMARY
1.5 SELF-ASSESSMENT EXERCISE (SAE)
1.6 REFERENCES
1.7 SUGGESTED READINGS

1.0 TOPIC 1: STATISTICAL NOTATIONS

1.1 INTRODUCTION

Subscripts and symbols are frequently used in the field of statistics. These notations help to condense a large amount of information about a series of data into a reduced form.

1.2 OBJECTIVES

At the end of this topic you should be able to:
i. Provide a reduced form of a series of data.
ii. Assign a definite value to the series by summing up or multiplying the scores contained in it.
iii. Calculate, compute and evaluate using symbols/notations.

1.3 IN-TEXT

1.3.1 SUBSCRIPT OR INDEX

We use a subscript or index to distinguish between the scores of a series of data and to isolate some from others according to our interest. Let $X_i$ (read "X subscript i") denote any of the k values $x_1, x_2, \dots, x_k$ assumed by X, the variable of interest. In the expression $X_i$, i stands for any of the numbers 1, 2, 3, ..., k and is called the subscript or index.

E.g.: In the series $X_1, X_2, X_3, X_4, X_5$, i = 1, 2, 3, 4, 5:
at $X_1$, i = 1
at $X_2$, i = 2
at $X_3$, i = 3
at $X_4$, i = 4
at $X_5$, i = 5

1.3.2 SYMBOLS

Symbols are generally used to quantify statistical information. Here, after the summation or multiplication of the scores, a definite value is assigned to the series under consideration. Let $\sum$ (sigma) be interpreted as the summation symbol of all scores or observations in the series.
This is generally written as follows:

$$\sum_{i=1}^{k} X_i = X_1 + X_2 + X_3 + \dots + X_k$$

E.g.: For our series with 5 observations we can apply the formula and rewrite the information as:

$$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5$$

Given this, the following also hold:

$$\sum_{i=1}^{k} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \dots + X_k Y_k$$

$$\sum_{i=1}^{k} a x_i = a x_1 + a x_2 + \dots + a x_k = a(x_1 + x_2 + \dots + x_k) = a \sum_{i=1}^{k} x_i \quad \text{(a is a constant number)}$$

$$\sum_{i=1}^{k} \sum_{j=1}^{k} x_i x_j = \left( \sum_{i=1}^{k} x_i \right)^2 = \sum_{i=1}^{k} x_i^2 + 2 \sum\sum_{i<j} x_i x_j, \quad \text{where } x_i \text{ and } x_j \text{ run over the same series}$$

$$\sum_{i=1}^{k} (a x_i + b y_i - c z_i) = a \sum_{i=1}^{k} x_i + b \sum_{i=1}^{k} y_i - c \sum_{i=1}^{k} z_i, \quad \text{if a, b and c are all constants}$$

$$\sum_{i=1}^{k} x_i \cdot x_i = \sum_{i=1}^{k} x_i^2 \neq \left( \sum_{i=1}^{k} x_i \right)^2$$

The symbol $\prod$ (pi) is used in the same fashion to denote the product of all scores or observations in the series. This is written as follows:

$$\prod_{i=1}^{k} x_i = x_1 \cdot x_2 \cdot x_3 \cdots x_k$$

E.g.: Considering our former example with 5 observations, we have:

$$\prod_{i=1}^{5} x_i = x_1 \cdot x_2 \cdot x_3 \cdot x_4 \cdot x_5$$

1.4 SUMMARY

The topic has distinguished between subscripts and symbols and has combined the two in order to quantify statistical information.

1.5 SELF-ASSESSMENT EXERCISE (SAE)

1. Given the following series of observations, where $x_i$ = 2, 3, 1, 5, 4, 2, compute:
a. $\sum_{i=1}^{6} x_i$ and $\sum_{i=1}^{6} x_i^2$
b. $\sum_{i=3}^{6} x_i$
c. $\prod_{i=2}^{5} x_i$
d. $\prod_{i=3}^{5} x_i$

2. Consider the following values of $x_i$, where $x_i = x_j$:

i      1    2    3    4
x_i    3   -7    9   -1

a. Compute $\left( \sum_{i=1}^{k} x_i \right)^2$
b. Compute $\left( \prod_{i=1}^{k} x_i \right)^2$
c. Calculate $\sum_{i=1}^{k} \sum_{j=1}^{k} (-1)\, x_i x_j$
d. Evaluate $\sum_{i=1}^{k} 6 x_i$

1.6 REFERENCES

Walpole, R. E. (1982), Introduction to Statistics, 3rd Edition, New York: Macmillan Publishing Co., chap. 1.

1.7 SUGGESTED READINGS

Spiegel, M. R. and Stephens, L. J. (1999), Schaum's Outline of Theory and Problems of Statistics, New York, London: McGraw Hill, chap. 3.

TOPIC 2: TABLE OF CONTENTS

2.0 TOPIC 2: MEASURES OF CENTRAL TENDENCY
2.1 INTRODUCTION
2.2 OBJECTIVES
2.3 IN-TEXT
2.3.1 MEAN
2.3.2 MODE
2.3.3 MEDIAN
2.3.4 GEOMETRIC AND HARMONIC MEAN
2.4 SUMMARY
2.5 SELF-ASSESSMENT EXERCISE (SAE)
2.6 REFERENCES
2.7 SUGGESTED READINGS

2.0 TOPIC 2: MEASURES OF CENTRAL TENDENCY

2.1 INTRODUCTION

Measures of central tendency are also known as measures of location. These measures describe how centrally placed, and therefore how representative, a particular value is among the scores of a series of data. The mean, mode and median are the most common measures of central tendency.

2.2 OBJECTIVES

At the end of the topic you should be able to:
i. Define the mean, mode and median.
ii. Compute them for both discrete and continuous variables of a series.
iii. Use graphs to determine the mode and median of a distribution.

2.3 IN-TEXT

2.3.1 THE MEAN OR ARITHMETIC MEAN

The mean is the average value of the scores or observations occurring in a set of data. It is calculated by summing up all the values assigned to the scores and dividing by the total number of observations in the series. We distinguish two cases: the simple arithmetic mean and the weighted arithmetic mean.
The simple arithmetic mean of a series of data with k observations $x_1, x_2, x_3, \dots, x_k$ is given as:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_k}{N} = \frac{\sum_{i=1}^{k} x_i}{N} \ \text{(reduced form)} = \frac{1}{N} \sum_{i=1}^{k} x_i$$

For the sake of convenience and neatness, we may omit the lower and upper limits of the summation sign and write:

$$\bar{x} = \frac{1}{N} \sum x_i$$

where
$x_i$: observation, with i = 1, 2, ..., k,
N: total number of observations in the series.

In the case of the weighted (classified or grouped) arithmetic mean, every $x_i$ is associated with a corresponding frequency $f_i$, such that $x_1, x_2, x_3, \dots, x_k$ have frequencies $f_1, f_2, f_3, \dots, f_k$. The weighted arithmetic mean is thus given as:

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_k x_k}{f_1 + f_2 + f_3 + \dots + f_k} = \frac{\sum f_i x_i}{\sum f_i}$$

NUMERICAL APPLICATION:

Given the series of data 3, 6, 9, 8, 4 of a discrete variable x, compute the simple arithmetic mean.

$$\bar{x} = \frac{1}{N} \sum x_i = \frac{3 + 6 + 9 + 8 + 4}{5} = \frac{30}{5} = 6$$

Assume now that the observations are associated with the frequencies 1, 3, 2, 5 and 7 respectively. The arithmetic mean then becomes:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1(3) + 3(6) + 2(9) + 5(8) + 7(4)}{1 + 3 + 2 + 5 + 7} = \frac{3 + 18 + 18 + 40 + 28}{18} = \frac{107}{18} = 5.94$$

PROPERTIES OF THE ARITHMETIC MEAN:

1. The algebraic sum of the deviations of a set of numbers about the mean is zero:

$$\sum (x - \bar{x}) = 0$$

Proof: since $\sum \bar{x} = \bar{x} + \bar{x} + \dots + \bar{x}$ (n times) $= n\bar{x}$ and $\bar{x} = \frac{\sum x}{n}$,

$$\sum (x - \bar{x}) = \sum x - \sum \bar{x} = \sum x - n\bar{x} = \sum x - n \frac{\sum x}{n} = \sum x - \sum x = 0$$

2. If two sets of data of sizes $N_1$ and $N_2$ have means $\bar{x}_1$ and $\bar{x}_2$ respectively, then the mean of the two sets combined is the weighted arithmetic mean of the two means, in the same way as $\frac{f_1 m_1 + f_2 m_2}{f_1 + f_2}$:

$$\bar{x} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2}$$

3. If there are N observations $x_1, x_2, \dots, x_N$ with mean $\bar{x}$, and $y_i = k x_i$ (i = 1, 2, ..., N; k = constant), then the arithmetic mean of the $y_i$ is:

$$\bar{y} = k\bar{x}$$

4. If the mean of $x_1, x_2, \dots, x_N$ is $\bar{x}$ and a constant c is added to every value, so that the observations become $x_1 + c, x_2 + c, \dots, x_N + c$, then the arithmetic mean of the new set is $\bar{x} + c$ (where c is a negative or positive number).

5. If A is an assumed mean (guessed mean) and $d_i = x_i - A$ is the deviation of $x_i$ from A, then the mean can be calculated as follows:

$$\bar{x} = A + \frac{\sum d_i}{N} \quad \text{(simple arithmetic mean)}$$

$$\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i} \quad \text{(weighted arithmetic mean)}$$

In short, $\bar{x} = A + \bar{d}$, where $\bar{d} = \frac{\sum d_i}{N}$.

2.3.2 THE MODE

The mode of a set of n observations is the number that occurs most frequently in that set. In other words, it is the number that has the highest frequency of occurrence in the set. For a discrete variable, the mode is determined by simply observing the frequency of occurrence of the scores.

E.g.: In the series 3, 2, 2, 2, 4, the mode is Mo = 2 (2 occurs more often than 3 and 4).

For a continuous variable or distribution, the mode is located within the modal class (the class with the highest frequency). There are two methods for determining the mode: the mathematical method and the graphical method.

2.3.2.1 THE MATHEMATICAL METHOD

With this method, we use the following formula:

$$Mo = L_o + \frac{f_1}{f_1 + f_2} \times Z$$

where
$L_o$: lower class boundary (true limit, x - 0.5) of the modal class,
$f_1$: difference between the frequency of the modal class and the frequency of the class immediately preceding (before) the modal class,
$f_2$: difference between the frequency of the modal class and the frequency of the class immediately succeeding (after) the modal class,
Z: size of the modal class (upper true limit minus lower true limit).
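The weighted arithmetic mean of 2.3.1 and the mode formula of 2.3.2.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `weighted_mean` and `grouped_mode` are hypothetical, and the modal class used in the second call (boundaries 8.5 to 12.5, frequency 50, neighbouring frequencies 15 and 15) is an assumed example.

```python
def weighted_mean(values, freqs):
    """Weighted arithmetic mean: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def grouped_mode(lower_boundary, f_modal, f_before, f_after, class_size):
    """Mode of grouped data: Mo = Lo + f1 / (f1 + f2) * Z."""
    f1 = f_modal - f_before  # modal frequency minus the frequency before it
    f2 = f_modal - f_after   # modal frequency minus the frequency after it
    return lower_boundary + f1 / (f1 + f2) * class_size


# Numerical application from the text: values 3, 6, 9, 8, 4 with
# frequencies 1, 3, 2, 5, 7 give 107 / 18.
mean_example = weighted_mean([3, 6, 9, 8, 4], [1, 3, 2, 5, 7])

# Assumed modal class 8.5-12.5 (size 4) with frequency 50,
# preceded and followed by classes of frequency 15.
mode_example = grouped_mode(8.5, 50, 15, 15, 4)

print(round(mean_example, 2), mode_example)
```

Keeping the class boundary, frequencies and class size as separate arguments mirrors the symbols Lo, f1, f2 and Z in the formula, so each step can be checked by hand.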
2.3.2.2 THE GRAPHICAL METHOD

The mode of a distribution can also be read from the histogram. Draw two guide lines across the modal class bar: one from its top-left corner to the top-left corner of the bar after it, the other from its top-right corner to the top-right corner of the bar before it. Then project a straight vertical line from their point of intersection onto the x-axis; the point reached is the mode. This is illustrated below:

[Figure: histogram of frequency against class boundaries $x_i$, with the mode Mo = $x_o$ read off the x-axis.]

2.3.3 THE MEDIAN

The median is the middle value of a set of data arranged in order of magnitude (ascending or descending order). In other words, it is the measure of location that divides an ordered set of data into two equal parts. In short, it corresponds to the $\left(\frac{N}{2}\right)$th observation of a series of data.

E.g.: Find the median of the following series of data: 1, 5, 2, 7, 3 (discrete variable).

Arranging the data in order of magnitude we obtain:
1, 2, 3, 5, 7 (ascending order)
7, 5, 3, 2, 1 (descending order)

In both cases 3 is the value that divides the series into two equal parts. Thus the median is Me = 3.

If another observation, say 9, is added to the series, the number of observations becomes even, the median lies within the interval 3-5 (or 5-3), and the median is the average of these two numbers. Thus:

$$Me = \frac{3 + 5}{2} = \frac{8}{2} = 4 \quad \text{or} \quad Me = \frac{5 + 3}{2} = \frac{8}{2} = 4$$

For classified data, we can use either the mathematical approach or the graphical approach to find the median.
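The procedure for a discrete series (sort the data, then take the middle value, or the average of the two middle values when the number of observations is even) can be sketched as follows. The function name `median_of_series` is hypothetical; the two calls reuse the example series from the text.

```python
def median_of_series(data):
    """Median of an ungrouped series: middle value of the ordered data."""
    ordered = sorted(data)          # ascending order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]         # odd n: the single middle value
    # even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of_series([1, 5, 2, 7, 3]))     # series from the text
print(median_of_series([1, 5, 2, 7, 3, 9]))  # same series with 9 added
```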
2.3.3.1 THE MATHEMATICAL APPROACH

This approach requires the application of the following formula:

$$Me = L_e + \left( \frac{\frac{N}{2} - F}{f_e} \right) \times Z$$

where
$L_e$: lower class boundary of the median class (the class containing the $\left(\frac{N}{2}\right)$th observation of the series),
N: total number of observations in the series,
$\frac{N}{2}$: position of the middle value,
F: cumulative frequency of the class preceding (before) the median class,
$f_e$: frequency of the median class (relative or absolute, but not cumulative),
Z: size of the median class.

2.3.3.2 THE GRAPHICAL APPROACH

With this approach, the median can be read from the cumulative frequency (ogive) curve. To do this, we locate the $\left(\frac{N}{2}\right)$th value on the cumulative frequency axis (y-axis), draw a straight horizontal line from that point to the ogive curve and, from the point of intersection, project a straight vertical line onto the x-axis to read the median of the distribution. This is illustrated below:

[Figure: ogive (cumulative frequency curve) against $x_i$, with the median Me read off the x-axis.]

2.3.3.3 QUARTILES, DECILES AND PERCENTILES

These are extensions of the concept of the median. To compute them we use the formula for computing the median, adjusted for the number of equal parts into which the distribution is to be divided. Quartiles, deciles and percentiles divide a set of data into four, ten and one hundred equal parts respectively.

Quartiles are 3 in number: $Q_1$, $Q_2$ and $Q_3$.
Deciles are 9 in number: $D_1, D_2, D_3, \dots, D_9$.
Percentiles are 99 in number: $P_1, P_2, P_3, \dots, P_{99}$.
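Because quartiles, deciles and percentiles reuse the median formula with N/2 replaced by rN/4, rN/10 or rN/100, one function can cover all of them. The sketch below is a minimal illustration under stated assumptions: the name `grouped_quantile` and its parameters are hypothetical, classes are described by their true boundaries, and the frequency table used is an assumed example.

```python
def grouped_quantile(boundaries, freqs, r, parts):
    """r-th of `parts` dividing points of grouped data.

    boundaries: true class boundaries, one more than the number of classes
    freqs:      class frequencies
    parts:      2 for the median, 4 quartiles, 10 deciles, 100 percentiles
    Implements  L + (r*N/parts - F) / f * Z.
    """
    if not 0 < r < parts:
        raise ValueError("r must satisfy 0 < r < parts")
    n = sum(freqs)
    target = r * n / parts              # position r*N/parts in the series
    cumulative = 0                      # F: cumulative frequency so far
    for i, f in enumerate(freqs):
        if cumulative + f >= target:    # this class contains the position
            lower = boundaries[i]
            size = boundaries[i + 1] - boundaries[i]
            return lower + (target - cumulative) / f * size
        cumulative += f
    raise ValueError("frequencies must sum to a positive total")


# Assumed example table: classes 1-4, 5-8, 9-12, 13-16, 17-20.
bounds = [0.5, 4.5, 8.5, 12.5, 16.5, 20.5]
counts = [10, 15, 50, 15, 10]

median = grouped_quantile(bounds, counts, 1, 2)
q1 = grouped_quantile(bounds, counts, 1, 4)
q3 = grouped_quantile(bounds, counts, 3, 4)
print(median, q1, q3)
```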
The general formula for the computation of quartiles is given below:

$$Q_r = L_r + \left( \frac{\frac{rN}{4} - F}{f_r} \right) \times Z$$

where
r = 1, 2, 3: the quartile of interest,
$L_r$: lower class boundary of the class containing the $\left(\frac{rN}{4}\right)$th observation,
N: total number of observations,
F: cumulative frequency of the class preceding (before) the class containing the $\left(\frac{rN}{4}\right)$th observation,
$f_r$: frequency of the class containing the $\left(\frac{rN}{4}\right)$th observation,
$\frac{rN}{4}$: position of the value dividing the series (N divided into 4 parts, times r, the quartile of interest),
Z: size of the class containing the $\left(\frac{rN}{4}\right)$th observation.

The general formulae for the computation of deciles and percentiles follow the same procedure, with the appropriate adjustments:

$$\text{Deciles } (D_1, D_2, D_3, \dots, D_9): \quad D_r = L_r + \left( \frac{\frac{rN}{10} - F}{f_r} \right) \times Z$$

$$\text{Percentiles } (P_1, P_2, P_3, \dots, P_{99}): \quad P_r = L_r + \left( \frac{\frac{rN}{100} - F}{f_r} \right) \times Z$$

2.3.4 THE GEOMETRIC AND HARMONIC MEAN

The geometric and harmonic means are not frequently used in statistical analysis. However, we discuss them here for the purpose of completeness.

The geometric mean of N positive values $x_1, x_2, \dots, x_N$ of a set is the Nth root of the product of all the numbers of the set. It is mathematically defined as:

$$G = \sqrt[N]{x_1 \cdot x_2 \cdots x_N} \quad \text{(simple geometric mean)}$$

$$G = \sqrt[N]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}}, \ N = \sum f_i \quad \text{(weighted geometric mean)}$$

E.g.: Find the geometric mean of the numbers 3, 4, 5 of a series.

$$G = \sqrt[3]{3 \cdot 4 \cdot 5} = \sqrt[3]{60} \approx 3.91$$

(On a calculator: shift, 2nd F or inv, followed by the root key, applied to 60.)

Similarly, we can use the decimal logarithm (log base 10) to find the geometric mean of a series of observations. The formula is given as:

$$\log G = \frac{1}{N} \sum \log x_i \quad \text{(simple)}$$

$$\log G = \frac{1}{N} \sum f_i \log x_i \quad \text{(weighted)}$$

$$\log G = \frac{1}{N} \sum \log x_i = \frac{1}{N} (\log x_1 + \log x_2 + \log x_3 + \dots + \log x_N)$$

E.g.: Using our previous data 3, 4, 5, we get:

$$\log G = \tfrac{1}{3} (\log 3 + \log 4 + \log 5) = \tfrac{1}{3} (0.4771 + 0.6021 + 0.6990) = \tfrac{1}{3} (1.7782) = 0.5927$$

$$G = 10^{0.5927} \approx 3.91$$

(On a calculator: shift, 2nd F or inv, followed by the $10^x$ key, applied to 0.5927.)

The harmonic mean of N numbers $x_1, x_2, \dots, x_N$ of a series of data is the reciprocal of the arithmetic mean of their reciprocals. The harmonic mean is mathematically expressed as:

$$\frac{1}{H} = \frac{1}{N} \sum \frac{1}{x_i}$$

$$H = \frac{N}{\sum \frac{1}{x_i}} \quad \text{(simple)}$$

$$H = \frac{N}{\sum \frac{f_i}{x_i}}, \ N = \sum f_i \quad \text{(weighted)}$$

E.g.: Find the harmonic mean of the numbers 2, 7, 5, 3.

$$H = \frac{N}{\sum \frac{1}{x_i}} = \frac{4}{\frac{1}{2} + \frac{1}{7} + \frac{1}{5} + \frac{1}{3}} = \frac{4}{1.176} \approx 3.40$$

The relationship between the arithmetic mean, geometric mean and harmonic mean of a series of data:
If the observations $x_1, x_2, x_3, \dots, x_N$ of the series are different, then $H < G < \bar{X}$. However, if the observations are identical, then $H = G = \bar{X}$.

The empirical relationship between the mean, mode and median:
For unimodal (one mode) frequency curves that are moderately skewed (asymmetrical), the following relationship holds:

$$\bar{X} - Mo = 3(\bar{X} - Me)$$

2.4 SUMMARY

The topic has introduced the student to the meaning of measures of central tendency and to the various approaches for obtaining them. It has also defined the relationships that exist between the measures discussed.

2.5 SELF-ASSESSMENT EXERCISE

1. Define measures of central tendency.
2. Compute the mean, mode and median of the following numbers: 7, 9, 10, 5, 6, 15, 4, 7, 8, 7, 13, 3.
3. Find the geometric mean of the series of data in 2 above using the two approaches discussed in the main text.
4. Prove that the sum of the deviations about the mean of the series of data in 2 above is zero.
5. A hypothetical pattern of people's weekly expenses in a ward in Maiduguri is given below:

Daily Expenses in N000    No. of people
1 – 4                     10
5 – 8                     15
9 – 12                    50
13 – 16                   15
17 – 20                   10
Total                     100

Use the data to compute the mean, mode and median of the distribution. Graphically find the position of the mode and median of the distribution.

2.6 REFERENCES

Spiegel, M. R. and Stephens, L. J. (1999), Schaum's Outline of Theory and Problems of Statistics, New York, London: McGraw Hill, chap. 3.

2.7 SUGGESTED READINGS

Walpole, R. E. (1982), Introduction to Statistics, 3rd Edition, New York: Macmillan Publishing Co.

TOPIC 3: TABLE OF CONTENTS

3.0 TOPIC 3: MEASURES OF DISPERSION
3.1 INTRODUCTION
3.2 OBJECTIVES
3.3 IN-TEXT
3.3.1 VARIATION RATIO
3.3.2 RANGE
3.3.3 VARIANCE
3.3.4 STANDARD DEVIATION
3.4 SUMMARY
3.5 SELF-ASSESSMENT EXERCISE (SAE)
3.6 REFERENCES
3.7 SUGGESTED READINGS

3.0 TOPIC 3: MEASURES OF DISPERSION

3.1 INTRODUCTION

Measures of dispersion describe the extent of spread, scatter or dispersion among the scores in a distribution. They are generally used to indicate how typical a measure of central tendency is: the smaller the variability among the scores, the more typical it is. In a narrow distribution, for example, the measure of central tendency is close to all the scores of the distribution. We will restrict ourselves to the study of the variation ratio, range, variance, standard deviation and coefficient of variation.

3.2 OBJECTIVES

At the end of this topic, you should be able to:
i. Calculate and assign a definite value to the spread of a distribution.
ii. Identify the appropriate measure of variability for assessing the spread among the scores of a distribution when a particular measure of central tendency is considered.
iii. Compare the variability of distributions.
iv. Overcome the difficulties that may arise from the input data requirements of distributions (differences in variables or differences in arithmetic means).

3.3 IN-TEXT

3.3.1 VARIATION RATIO

The variation ratio measures the proportion or percentage of the subjects/scores of a distribution that lie outside (are not included in) the modal class. It is the appropriate measure of variability when the measure of central tendency under consideration is the mode. The variation ratio is given in percentage as:

$$VR = \left( 1 - \frac{f_o}{N} \right) \times 100\%$$

where
$f_o$: frequency of the modal class,
N: total number of observations in the distribution.

3.3.2 RANGE

The range of a distribution is the difference between the largest and the smallest score of the distribution. We use it as a measure of variability when the measure of central tendency under consideration is the median. The range is mathematically defined as:

$$R = L - \ell$$

where
L: largest upper true limit of the distribution,
$\ell$: smallest lower true limit of the distribution.

3.3.3 VARIANCE

The variance of a distribution is the average of the squared deviations of the scores about their mean. It is the most commonly used indicator of the degree of variability and the most dependable estimate of variability in the total population from which samples are generally drawn. We use the variance to measure the extent of spread in a distribution when the measure of central tendency under consideration is the arithmetic mean. We distinguish two types of variance: the population variance ($\sigma^2$) and the sample variance ($s^2$). NB: the variance is expressed in squared units of the variable, not in the original unit of measurement.

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \quad \text{(population variance)}$$

where
$\mu$: population mean,
N: total number of observations in the population/set (used as divisor, it gives a biased estimate when applied to a sample).

$$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N - 1} \quad \text{(sample variance)}$$

where
$\bar{x}$: sample mean,
N - 1: the total number of observations in the sample/sub-set less one (this divisor gives an unbiased estimate).

In the case of classified/grouped/categorized data, we express the variance as follows:

$$\sigma^2 = \frac{\sum f (x - \mu)^2}{N} \quad \text{(population variance)}$$

$$s^2 = \frac{\sum f (x - \bar{x})^2}{N - 1} = \frac{\sum f x^2 - \frac{(\sum f x)^2}{N}}{N - 1} \quad \text{(sample variance)}$$

3.3.4 THE STANDARD DEVIATION

The standard deviation of a distribution is the square root of the variance of the distribution. Similarly, we have the population standard deviation and the sample standard deviation. These are computed as follows:

$$\sigma = \sqrt{\sigma^2} \quad \text{(population standard deviation)}$$

$$s = \sqrt{s^2} \quad \text{(sample standard deviation)}$$

Unlike the variance, the standard deviation carries the unit of measurement of the variable under study.

3.3.4.1 THE USES AND INTERPRETATION OF THE STANDARD DEVIATION

1. The standard deviation is used to calculate many other statistics (coefficient of variation, moments, ...).
2. We also use the standard deviation to compare the variability of scores between two groups/distributions. For example:
   A with S.D = 3.0: the subjects are easy to reach;
   B with S.D = 15: the subjects are very difficult to reach because of the wide spread.

The value of the standard deviation of a distribution indicates the magnitude of the spread among the scores of the distribution. If the standard deviation is small, the scores are concentrated near the mean. If it is large, the scores are scattered widely about the mean of the distribution.

3.3.5 THE COEFFICIENT OF VARIATION

When a comparison between the variability of distributions using the variance and standard deviation is not satisfactory or conclusive, because the distributions differ in their variables or (for the same variable) in their arithmetic means, we use the coefficient of variation. The coefficient of variation is the ratio of the standard deviation to the arithmetic mean of a distribution. It is expressed in percentage as:

$$CV = \frac{S.D}{\bar{X}} \times 100$$

Other measures are defined as follows:

Inter-Quartile Range: $IQR = Q_3 - Q_1$

Semi-Inter-Quartile Range: the arithmetic mean of the deviations of the first and third quartiles around the median:

$$SIQR = \frac{(Me - Q_1) + (Q_3 - Me)}{2} = \frac{Q_3 - Q_1}{2}$$

Mean Absolute Deviation:

$$M.A.D = \frac{\sum |x - \bar{x}|}{N}$$

3.4 SUMMARY

The topic has defined measures of dispersion and exposed the student to different formulae for computing the various measures of spread discussed. In addition, the uses and interpretation of the standard deviation have been given.

3.5 SELF-ASSESSMENT EXERCISE (SAE)

1. What is a measure of dispersion?
2. Give mathematical formulae for the measures of spread known to you.
3. With the following information:

Age Group    No. of Students
16-18        5
19-21        10
22-24        13
25-27        35
28-30        7
Total        70

Compute:
i. Variation ratio
ii. Range
iii. Variance
iv. Standard deviation
v. Coefficient of variation

4. Compare the distribution with that of Exercise 5 of section 2.5 in Topic 2 and comment freely.

3.6 REFERENCES

Spiegel, M. R. and Stephens, L. J. (1999), Schaum's Outline of Theory and Problems of Statistics, New York, London: McGraw Hill, chap. 3.

3.7 SUGGESTED READINGS

Walpole, R. E. (1982), Introduction to Statistics, 3rd Edition, New York: Macmillan Publishing Co.
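As a supplement to Topic 3, the grouped-data measures of section 3.3 (variation ratio, variance, standard deviation and coefficient of variation) can be sketched in Python. This is an illustrative sketch only: the function name `grouped_dispersion`, the dictionary keys and the small frequency table are all hypothetical, and the coefficient of variation is computed here from the population standard deviation, which is an assumption.

```python
import math


def grouped_dispersion(midpoints, freqs):
    """Measures of spread for grouped data given class midpoints."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    # Sum of squared deviations about the mean, weighted by frequency.
    squared = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    population_variance = squared / n
    sample_variance = squared / (n - 1)
    population_sd = math.sqrt(population_variance)
    return {
        "mean": mean,
        "variation_ratio": (1 - max(freqs) / n) * 100,  # percent outside modal class
        "population_variance": population_variance,
        "sample_variance": sample_variance,
        "population_sd": population_sd,
        "cv": population_sd / mean * 100,               # assumption: population S.D
    }


# Hypothetical table: midpoints 2, 4, 6 with frequencies 1, 2, 1.
result = grouped_dispersion([2, 4, 6], [1, 2, 1])
print(result)
```

Computing the sum of squared deviations once and dividing it by N or N - 1 makes the difference between the population and sample variance explicit.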
TOPIC 4: TABLE OF CONTENTS

4.0 TOPIC 4: MEASURES OF SKEWNESS AND KURTOSIS
4.1 INTRODUCTION
4.2 OBJECTIVES
4.3 IN-TEXT
4.3.1 PEARSON COEFFICIENT
4.3.2 BOWLEY COEFFICIENT
4.3.3 MOMENTS
4.3.4 KURTOSIS
4.4 SUMMARY
4.5 SELF-ASSESSMENT EXERCISE (SAE)
4.6 REFERENCES
4.7 SUGGESTED READINGS

4.0 TOPIC 4: MEASURES OF SKEWNESS AND KURTOSIS

4.1 INTRODUCTION

In real-world situations, it is possible to come across two different series of data with exactly the same arithmetic mean and standard deviation. When this occurs, we should ascertain how the scores are distributed about the central measure, and how their curves depart from that of a normal or Gaussian distribution, before drawing any conclusion about the series. This requires the study of the measures of skewness and kurtosis.

4.2 OBJECTIVES

At the end of this topic, you should be able to:
i. Define skewness.
ii. Define kurtosis.
iii. Compute their measures.
iv. Conclude about the type of skewness and kurtosis assumed by a given distribution.

4.3 IN-TEXT

Skewness describes the degree of asymmetry of a distribution, as seen in its frequency polygon. The Bowley coefficient of 4.3.2 always lies between -1 and 1, while the second Pearson coefficient of 4.3.1 lies between -3 and 3.

4.3.1 PEARSON COEFFICIENT (FIRST AND SECOND)

The Pearson coefficient is used to measure skewness when the emphasis is placed on the mean, mode, median and S.D of the distribution. The coefficient is given as $V_1$ or $V_2$:

$$V_1 = \frac{\bar{x} - Mo}{S.D}; \qquad V_2 = \frac{3(\bar{x} - Me)}{S.D}$$

4.3.2 BOWLEY COEFFICIENT

When the emphasis shifts to the median and quartiles, we apply the Bowley coefficient to ascertain the skewness of the distribution.
The Bowley coefficient is given as:

$$K = \frac{Q_3 + Q_1 - 2Me}{Q_3 - Q_1}$$

NB: Me = $Q_2$.

A frequency polygon can be symmetrical, positively skewed or negatively skewed. The skewness coefficient of a symmetrical distribution is always zero, and $\bar{x}$, Mo and Me all coincide. The skewness coefficient of a positively skewed (skewed to the right) distribution is always greater than zero; here, $\bar{x} > Me > Mo$. The skewness coefficient of a negatively skewed (skewed to the left) distribution is always less than zero; here, $\bar{x} < Me < Mo$.

[Figure 1: Negatively skewed distribution (skewness < 0); $\bar{x}$, Me and Mo marked in that order along the x-axis.]

[Figure 2: Symmetrical distribution (skewness = 0); $\bar{x}$, Me and Mo coincide.]

[Figure 3: Positively skewed distribution (skewness > 0); Mo, Me and $\bar{x}$ marked in that order along the x-axis.]

4.3.3 MOMENTS

Moments are the measures of skewness most widely accepted by mathematical statisticians. The general formula for moments is given as:

$$M_r = \frac{1}{n} \sum (x - \bar{x})^r$$

where r is the rank (order) of the moment. Given this general formula, it follows that:

$$M_1 = \frac{1}{n} \sum (x - \bar{x}) = 0 \quad \text{(first moment)}$$

$$M_2 = \frac{1}{n} \sum (x - \bar{x})^2 = \text{variance} \quad \text{(second moment)}$$

$$\alpha_3 = \frac{\sum (x - \bar{x})^3}{n s^3} \quad \text{(third moment, expressed as a coefficient of skewness)}$$

4.3.4 KURTOSIS

Kurtosis describes the amount of peakedness in a distribution. It explains how a particular distribution departs from the shape of the normal distribution curve. Kurtosis shows whether a distribution is very pointed with wide tails or humped with short tails. A distribution that is very pointed with wide tails is known as leptokurtic. A broad, humped distribution with short tails is referred to as platykurtic. A distribution that is neither leptokurtic nor platykurtic is known as mesokurtic. A mesokurtic distribution conforms to the shape of the normal distribution curve.

The amount of peakedness or kurtosis is measured by the fourth moment. The coefficient is given as:

$$\alpha_4 = \frac{\sum (x - \bar{x})^4}{n s^4}$$

The coefficient of kurtosis for mesokurtic distributions is always $\alpha_4 = 3$.
The coefficient of kurtosis for leptokurtic distributions is always greater than 3, while that for platykurtic distributions is always less than 3.

[Figure: three frequency curves labelled Platykurtic, Mesokurtic (normal) and Leptokurtic.]

4.4 SUMMARY

This topic has explained the degree of asymmetry and the amount of peakedness in distributions. It has also exposed the student to the mathematical formulae for measuring these statistics and displayed their respective graphical presentations.

4.5 SELF-ASSESSMENT EXERCISE (SAE)

1. Define skewness and kurtosis.
2. Based on employment records obtained from a ministry in your state over a period of time, you have been able to compile the following information:

43, 34, 45, 35, 55, 44, 45, 65, 48,
53, 70, 60, 45, 48, 44, 69, 73, 55, 51, 58, 66, 77, 40,
43, 35, 54, 77, 54, 77, 49, 56, 30, 48, 55, 67, 36, 47,
52, 53, 46, 74, 45, 64, 45, 71, 32, 44, 55, 43, 35, 60,
55, 66, 53, 44, 36, 48, 46, 76, 73, 34, 49, 47, 57, 50,
54, 65, 53, 46, 44, 56.

a. Construct a frequency distribution table of 6 classes.
b. Calculate the mean, mode and median of the distribution.
c. Construct a histogram and frequency polygon of the distribution.
d. Construct a cumulative frequency curve of the distribution.
e. Compute the Pearson coefficient and the fourth moment of the distribution.
f. Graphically show the degree of asymmetry and the amount of peakedness in the distribution.

4.6 REFERENCES

Spiegel, M. R. and Stephens, L. J. (1999), Schaum's Outline of Theory and Problems of Statistics, New York, London: McGraw Hill, chap. 5.

4.7 SUGGESTED READINGS

Karmel, P. H. and Polasek, M. (1970), Applied Statistics for Economics, 3rd edition, Great Britain: Pitman.
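As a supplement to Topic 4, the moment formulas of 4.3.3 and 4.3.4 can be sketched for an ungrouped series. The function names `central_moment` and `skewness_and_kurtosis` are hypothetical, and the sketch assumes that s is the standard deviation computed with divisor n (so that s squared equals the second moment); with the N - 1 divisor the coefficients would differ slightly.

```python
def central_moment(data, r):
    """r-th moment about the mean: (1/n) * sum((x - mean) ** r)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n


def skewness_and_kurtosis(data):
    """Moment coefficients of skewness (alpha3) and kurtosis (alpha4)."""
    m2 = central_moment(data, 2)        # variance with divisor n (assumption)
    m3 = central_moment(data, 3)
    m4 = central_moment(data, 4)
    alpha3 = m3 / m2 ** 1.5             # third moment divided by s**3
    alpha4 = m4 / m2 ** 2               # fourth moment divided by s**4
    return alpha3, alpha4


# Hypothetical series: one symmetrical, one with a long right tail.
symmetric = skewness_and_kurtosis([1, 2, 3, 4, 5])
right_tailed = skewness_and_kurtosis([1, 1, 1, 2, 10])
print(symmetric, right_tailed)
```

The symmetrical series has zero skewness and a kurtosis coefficient below 3 (platykurtic), while the right-tailed series has a positive skewness coefficient.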
SOLUTIONS TO EXERCISES

TOPIC 1:

1. $x_i = 2, 3, 1, 5, 4, 2$

a. $\sum_{i=1}^{6} x_i = 2 + 3 + 1 + 5 + 4 + 2 = 17$

$\sum_{i=1}^{6} x_i^2 = (2)^2 + (3)^2 + (1)^2 + (5)^2 + (4)^2 + (2)^2 = 4 + 9 + 1 + 25 + 16 + 4 = 59$

b. $\sum_{i=3}^{6} x_i = 1 + 5 + 4 + 2 = 12$

c. $\prod_{i=2}^{6} x_i = 3 \times 1 \times 5 \times 4 \times 2 = 120$

TOPIC 2:

5. Computation of the mean, mode and median of the distribution.

| Daily expenses (₦000) | No. of people $f_i$ | Midpoint $x_i$ | Product $f_i x_i$ | True limits (L − 0.5 to L + 0.5) | Cumulative frequency F |
|---|---|---|---|---|---|
| 1–4 | 10 | 2.5 | 25 | 0.5–4.5 | 10 |
| 5–8 | 15 | 6.5 | 97.5 | 4.5–8.5 | 25 |
| 9–12 | 50 | 10.5 | 525 | 8.5–12.5 | 75 |
| 13–16 | 15 | 14.5 | 217.5 | 12.5–16.5 | 90 |
| 17–20 | 10 | 18.5 | 185 | 16.5–20.5 | 100 |
| TOTAL | 100 | - | 1050 | | |

The mean of the distribution is:

$$\bar{x} = \frac{1}{N}\sum f_i x_i = \frac{1050}{100} = 10.5$$

that is, 10.5 × ₦1000 = ₦10,500.

The mode of the distribution should be located within the interval 9–12, which is the modal class. The mode is therefore:

$$Mo = L_o + \frac{\Delta f_1}{\Delta f_1 + \Delta f_2} \times Z$$

where $L_o = 8.5$ is the lower true limit of the modal class, $\Delta f_1 = 50 - 15 = 35$ and $\Delta f_2 = 50 - 15 = 35$ are the differences between the modal frequency and the frequencies of the preceding and following classes, and $Z = 4$ is the class width.

$$Mo = 8.5 + \left(\frac{35}{35 + 35}\right)4 = 8.5 + \left(\frac{35}{70}\right)4 = 8.5 + \frac{4}{2} = 10.5$$

Mo = 10.5 × ₦1000 = ₦10,500

The median of the distribution should be located within the median class of 9–12, where the (N/2)th observation of the distribution falls. The median is:

$$Me = L_e + \frac{\frac{N}{2} - F}{f_e} \times Z$$

$$L_e = 8.5, \quad \frac{N}{2} = 50, \quad F = 25, \quad f_e = 50, \quad Z = 4$$

$$Me = 8.5 + \left(\frac{50 - 25}{50}\right)4 = 8.5 + \left(\frac{25}{50}\right)4 = 8.5 + \frac{4}{2} = 10.5$$

Me = 10.5 × ₦1000 = ₦10,500

Graphical location of the mode and median of the distribution.

To locate the mode with the help of a graph, we shall construct a histogram of the distribution, whereas for the median, we shall construct the cumulative frequency or ogive curve of the distribution.

[Figure: Histogram of the distribution, frequency against class boundary (₦000), with the boundaries 0.5, 4.5, 8.5 and 12.5 marked]

Location of the median: construction of the ogive curve.

[Figure: Ogive curve, cumulative frequency against upper true limit (4.5, 8.5, 12.5, 16.5)]

TOPIC 3:

1.
In order to compare this distribution with the distribution of Exercise 5, section 2.5 of Topic 2, we compute the standard deviation of the two distributions.

Standard deviation for Exercise 3 of Topic 3:

$$\sigma^2 = \frac{1}{N}\sum f (x - \bar{x})^2$$

$$N = 70, \quad \bar{x} = \frac{1697}{70} = 24.24$$

$$\sum f (x - \bar{x})^2 = 728.34, \quad \sigma^2 = \frac{728.34}{70} = 10.40$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{10.40} = 3.22$$

Standard deviation for Exercise 5 of Topic 2:

$$\sigma^2 = \frac{1}{N}\sum f (x - \bar{x})^2$$

$$N = 100, \quad \bar{x} = \frac{1050}{100} = 10.5$$

Using the class midpoints 2.5, 6.5, 10.5, 14.5 and 18.5 with frequencies 10, 15, 50, 15 and 10:

$$\sum f (x - \bar{x})^2 = 10(-8)^2 + 15(-4)^2 + 50(0)^2 + 15(4)^2 + 10(8)^2 = 1760, \quad \sigma^2 = \frac{1760}{100} = 17.60$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{17.60} = 4.20$$

The data are given in '000'. Thus, σ ≈ 4.195 × 1000 ≈ ₦4,195.

The dispersion of scores around the mean is smaller for the distribution in Exercise 3 of Topic 3. This means that it is a narrowly spread distribution: the subjects lie close to the mean and are easy to locate. In Exercise 5 of Topic 2, the distribution is more widely spread. The subjects lie farther from the mean, which means that they are more difficult to locate. In short, we may say that σ = 4.20 > σ = 3.22.

TOPIC 4:

2. e) Computation of the Pearson coefficient and the fourth moment of the distribution.

We shall consider both the first and second Pearson coefficients ($V_1$ and $V_2$).

$$V_1 = \frac{\bar{x} - Mo}{S.D} \quad \text{(Pearson first coefficient)}$$

$$\bar{x} = 52.20, \quad Mo = 49.86, \quad S.D = 12.15, \quad V_1 = 0.192$$

$$V_2 = \frac{3(\bar{x} - Me)}{S.D} \quad \text{(Pearson second coefficient)}$$

$$\bar{x} = 52.20, \quad Me = 51.18, \quad S.D = 12.15, \quad V_2 = 0.251$$

The Fourth Moment

$$\alpha_4 = \frac{\sum f (x - \bar{x})^4}{N S^4}$$

$$\sum f (x - \bar{x})^4 = 3354549.68, \quad N = 71, \quad S^4 = 21792.40, \quad \alpha_4 = 2.16$$

f) Illustration of the skewness and kurtosis of the distribution.

For the skewness, we focus attention on the position of the mean, mode and median of the distribution:

$$\bar{x} = 52.20, \quad Mo = 49.86, \quad Me = 51.18$$

Here Mo < Me < $\bar{x}$, so we have a positively skewed (skewed to the right) distribution.
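The coefficients in parts e) and f) can be cross-checked with a short Python sketch. It is an illustration under stated assumptions: the summary statistics are the ones quoted in the solution, while the function names are hypothetical and introduced only for this example. The results agree with the figures in the text up to rounding in the last digit.

```python
def pearson_first(mean, mode, sd):
    """Pearson first coefficient of skewness: (mean - mode) / S.D."""
    return (mean - mode) / sd


def pearson_second(mean, median, sd):
    """Pearson second coefficient of skewness: 3 * (mean - median) / S.D."""
    return 3 * (mean - median) / sd


def fourth_moment_coefficient(sum_f_dev4, n, sd):
    """Kurtosis coefficient: sum of f * (x - mean)**4 divided by N * S**4."""
    return sum_f_dev4 / (n * sd ** 4)


# Summary statistics quoted in the solution to Topic 4, Exercise 2.
mean, mode, median, sd = 52.20, 49.86, 51.18, 12.15

v1 = pearson_first(mean, mode, sd)
v2 = pearson_second(mean, median, sd)
a4 = fourth_moment_coefficient(3354549.68, 71, sd)

# Positive coefficients indicate a distribution skewed to the right.
print("V1 =", round(v1, 3), "| V2 =", round(v2, 3))
# The fourth-moment coefficient is compared with 3, the mesokurtic value.
print("fourth-moment coefficient =", round(a4, 2), "| below 3:", a4 < 3)
```

Both Pearson coefficients come out positive, in line with the ordering Mo < Me < mean found above.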
[Figure: frequency curve of the distribution, with Mo = 49.86, Me = 51.18 and $\bar{x}$ = 52.20 marked on the horizontal axis]

For the amount of peakedness, we concentrate on the value of the kurtosis coefficient $\alpha_4$. Since $\alpha_4 = 2.16 < 3$, we conclude that the distribution is platykurtic. This can be illustrated as follows:

[Figure: frequency curve of the distribution, frequency against the variable]

TUTOR-MARKED ASSIGNMENT

UNIT-TEST 1

1. The responses of the weight of rabbits to a nutrient produced by a hypothetical drug manufacturing company have been recorded over a given period of time as follows:

27, 26, 25, 23, 27, 28, 32, 31, 30, 29, 30, 30, 28,
43, 44, 45, 47, 33, 35, 37, 35, 36, 34, 34, 33, 37,
36, 36, 34, 33, 34, 37, 33, 42, 40, 41, 38, 39, 40,
42, 41, 40, 39, 38, 40, 42, 41, 39, 39, 41, 40, 41,
40, 39, 38.

a. Consider the data and construct a five-class frequency distribution.
b. Calculate the mean, mode and median of the distribution.
c. Locate the position of the mode and median of the distribution using separate graphs or techniques of data presentation.

2. The number of eggs produced in a poultry by ten layers in an hour is given below:

10, 11, 23, 9, 7, 10, 16, 20, 10, 14

a. What is the nature of the variable under consideration?
b. Compute the arithmetic mean for the first five layers.
c. Calculate the arithmetic mean of the series of data.
d. What is the mode?
e. Find the median of the series of data.
f. Calculate the standard deviation of the series and comment.

3. Explain briefly the following terms:

a. Measures of dispersion
b. Skewness and kurtosis
c. Quantiles
d. Measures of central tendency.