RS: 2 Descriptive Statistics (Interpretation)

Frequency

The number of times a particular quantity, characteristic or measurement occurs is called its frequency. In other words, frequency is the tally count for each characteristic or measurement. To draw up a frequency distribution table, we first identify the characteristic or measurement that repeats and then tabulate the number of times it occurs.

Example: A farmer has forty sheep. The maximum weight of a sheep is 50 kg. The weights (in kg) of the sheep, in arbitrary order, are:

33, 36, 34, 37, 35, 38, 40, 39, 42, 41, 44, 43, 46, 45, 47, 49, 50, 48, 48, 36, 39, 35, 33, 48, 44, 46, 37, 41, 50, 49, 35, 50, 47, 38, 46, 44, 42, 48, 39, 48.

Class (weights in kg)    Tally    Frequency
33 - 38
39 - 44
45 - 50

Range: The difference between the maximum and minimum values of the data is the range. From the above, the range is 50 - 33 = 17.

Size or Width of the Class: From the above, each class interval covers 6 weights (for example, 33, 34, 35, 36, 37 and 38 in the first class). This 6 is called the size or width of the class.

Upper Boundary and Lower Boundary: The highest and lowest values of a class are called its upper and lower boundaries. For example, 33 is the lower boundary and 38 is the upper boundary of the first class.

Class Marks: The average of the class boundaries is called the class mark. For example, from the above frequency table, the class mark of the first class is (33 + 38)/2 = 35.5.

MEASURES OF CENTRAL TENDENCY

Several values may be used to describe the central tendency of a sample or a population; which is most appropriate depends on the scale of measurement used and on the information we wish to convey.

1. MEAN

The mean is the sum of the values divided by n, the number of values in the data set. Let x1, x2, x3, ..., xn be a set of n values. Then Mean X = ∑x / n. Data may be ungrouped or grouped.

A. Ungrouped Data

I. Calculation of the mean of ungrouped data.

Example: Calculate the mean of the following data: 8, 12, 13, 25, 39, 15, 2.
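As a quick check of the definition Mean X = ∑x / n, the seven values just listed can be averaged with a few lines of Python. This is only a sketch; the function name `mean` is an illustrative choice, not something the notes prescribe.

```python
# Ungrouped mean: the sum of the values divided by how many there are.
data = [8, 12, 13, 25, 39, 15, 2]


def mean(values):
    """Return sum(values) / n for a non-empty sequence of numbers."""
    return sum(values) / len(values)


total = sum(data)      # the numerator, sum of x
result = mean(data)    # sum of x divided by n
print(total, len(data), round(result, 2))
```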
From this data, n = 7, so
Mean X = ∑x / n = (8 + 12 + 13 + 25 + 39 + 15 + 2) / 7 = 114 / 7 = 16.29

B. Grouped Data

II. Calculation of the mean of grouped data.

The formula for calculating the mean of grouped data is Mean X = ∑fx / ∑f, where f is the frequency and ∑f is the total frequency.

Example: Find the mean of the following data:

Daily expenditure in Nu (x)    No. of children (f)
2                              25
3                              15
4                              10
5                              5
6                              5

Solution

Daily expenditure in Nu (x)    No. of children (f)    fx
2                              25                     50
3                              15                     45
4                              10                     40
5                              5                      25
6                              5                      30
                               ∑f = 60                ∑fx = 190

Mean X = ∑fx / ∑f = 190 / 60 = 3.17

III. Calculation of the mean when class intervals are given.

Example:

Class      Frequency
0 - 10     3
10 - 20    5
20 - 30    4
30 - 40    2
40 - 50    4

Solution

Class      Class mark (x)    f          fx
0 - 10     5                 3          15
10 - 20    15                5          75
20 - 30    25                4          100
30 - 40    35                2          70
40 - 50    45                4          180
                             ∑f = 18    ∑fx = 440

Mean X = ∑fx / ∑f = 440 / 18 = 24.44

2. MEDIAN

The median is the central value of the observations when they are arranged in order. If N observed values are arranged in order, the median is the middle value.

A. Ungrouped Data (odd)

If N is odd, the value in the (N + 1)/2th position is the middle value and is the median, where N is the number of observations.

I. Find the median of the following numbers: 1, 1, 2, 3, 7, 8, 11, 12, 14, 19, 20

Here N = 11, so the median is in the (N + 1)/2 = 6th position. So the median is 8.

B. Ungrouped Data (even)

If N is even, the median is the mean of the values in the (N/2)th and ((N + 2)/2)th positions, where N is the number of observations.

II. Find the median of the following numbers: 4, 5, 5, 6, 7, 10, 15, 21, 22, 23, 24, 25

Solution: Here N = 12, so N/2 is the 6th position and (N + 2)/2 is the 7th position. The two numbers in the middle are therefore 10 and 15. The median is (10 + 15)/2 = 12.50

C. Grouped Data with frequency

III. To find the median when frequencies are given.

In order to calculate the median, a cumulative frequency column is made.

Example: Find the median of the frequency distribution:

Variable (x)     1    2    3    4    5    6
Frequency (f)    2    4    7    9    3    1

Solution

x    f    cf (cumulative frequency)
1    2    2
2    4    6
3    7    13
4    9    22
5    3    25
6    1    26

Total frequency N = 26.
Median = value of the (N + 1)/2th item, where N is the total frequency. Here (N + 1)/2 = 27/2 = 13.5. The first cumulative frequency that reaches 13.5 is 22, which belongs to x = 4, so the 13.5th item is taken as 4. Therefore, Median = 4.

D. Grouped Data with Class Interval

IV. Calculation of the median when class intervals are given.

This formula is used to calculate the median when class intervals (C.I.) are given:

Median = l + ((N/2 - C) / f) × i

where:
l is the lower boundary (limit) of the median class
N is the total number of items
C is the cumulative frequency of the class preceding the median class
f is the frequency of the median class
i is the size of the class

Example: Find the median of the following frequency distribution:

Class        10 - 15    15 - 20    20 - 25    25 - 30    30 - 35    35 - 40
Frequency    3          7          16         12         9          6

Solution

Class      Frequency (f)    Cumulative frequency (C)
10 - 15    3                3
15 - 20    7                10
20 - 25    16               26
25 - 30    12               38
30 - 35    9                47
35 - 40    6                53

N/2 = 53/2 = 26.5, which lies in the class 25 - 30.
l = 25; f = 12; C = 26; i = 5
Median = 25 + ((26.5 - 26) / 12) × 5 = 25.21

3. MODE

The mode is the value or size of the item in a distribution which occurs most often.

A. Ungrouped Data

Type I. To find the mode of a set of numbers.

Example: 2, 3, 5, 7, 8, 3, 5, 8, 5
Here, 5 occurs most often. Therefore, the mode is 5.

B. Grouped Data with frequency

Type II. To find the mode of the following distribution.

Example:

Size of item    2    3    4     5     6    7
Frequency       4    6    12    15    9    4

Solution: From the above data, size 5 has the maximum frequency (15). Hence, the mode is 5.

C. Grouped Data with Class Interval

Type III. To calculate the mode when the data are given in terms of class intervals.

The mode can be obtained by using this formula:

Mode = l + ((fm - f1) / ((fm - f1) + (fm - f2))) × i

where:
fm = frequency of the modal class
f1 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
i = size of the modal class
l = lower limit of the modal class
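The two class-interval formulas above, for the median and for the mode, can be sketched in Python as follows. The function names `grouped_median` and `grouped_mode` are hypothetical, and the sketch assumes contiguous classes of equal width given by their lower limits. The median call reuses the frequency table from the median example above.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = l + ((N/2 - C) / f) * i for equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cum = 0  # C: cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if cum + f >= half:  # first class whose cumulative frequency reaches N/2
            return l + (half - cum) / f * width
        cum += f


def grouped_mode(lower_limits, freqs, width):
    """Mode = l + (fm - f1) / ((fm - f1) + (fm - f2)) * i."""
    m = freqs.index(max(freqs))  # modal class: first class with the highest frequency
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0               # preceding class (0 if none)
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0  # succeeding class (0 if none)
    return lower_limits[m] + (fm - f1) / ((fm - f1) + (fm - f2)) * width


# Frequency table from the median example: classes 10-15, 15-20, ..., 35-40.
lowers = [10, 15, 20, 25, 30, 35]
freqs = [3, 7, 16, 12, 9, 6]
median = grouped_median(lowers, freqs, 5)
print(round(median, 2))  # 25.21, matching the worked solution
```

`grouped_mode` takes the same three arguments, so it can be applied to any table laid out this way. It is undefined when the modal class and both of its neighbours have equal frequencies, because the denominator is then zero.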
Example:

Class      No. of students
0 - 5      7
5 - 10     10
10 - 15    16
15 - 20    32
20 - 25    24
25 - 30    18
30 - 35    10
35 - 40    5

Solution

From the above data, the modal class is 15 - 20.
l = 15, i = 5, fm = 32, f1 = 16, f2 = 24
So, Mode = 15 + ((32 - 16) / ((32 - 16) + (32 - 24))) × 5 = 15 + (16/24) × 5 = 18.33

Exercise

Question: The following data show the weights of pigs in kg:
50, 45, 49, 54, 60, 62, 67, 63, 56, 50, 72, 59, 54, 65, 49, 48, 63, 70, 64, 67, 45, 69, 65, 71, 60, 57, 46, 53, 66, 71.
I. Choosing a suitable class interval, make a frequency table and a histogram of the above data.
II. Compute the mean, median and mode from the above data.

MEASURES OF DISPERSION

I. VARIANCE

Variance is the sum of the squared deviations (d) of the observations from the mean (x̄), divided by the degrees of freedom (the number of observations minus one). If N is the number of observations, then the degrees of freedom are N - 1. Variance is written σ² (sigma squared) for a population and s² (s squared) for a sample.

A. Ungrouped Data

If x is the variable and x1, x2, x3, ..., xn are its values, then the variance is:

s² = [(x1 - x̄)² + (x2 - x̄)² + (x3 - x̄)² + ... + (xn - x̄)²] / (n - 1)
s² = ∑(x - x̄)² / (n - 1)

where
∑(x - x̄)² = sum of squares
n - 1 = degrees of freedom
n = number of observations

CALCULATION OF s² AND SD WHEN INDIVIDUAL DATA ARE GIVEN

Example: Find s² for the following weights (kg) of seven sacks of rice: 4, 10, 25, 30, 38, 50, 60

Solution

Mean (x̄) = (4 + 10 + 25 + 30 + 38 + 50 + 60) / 7 = 217 / 7 = 31

Weight in kg (x)    x - x̄ = d    (x - x̄)² = d²
4                   -27          729
10                  -21          441
25                  -6           36
30                  -1           1
38                  +7           49
50                  +19          361
60                  +29          841
                                 ∑(x - x̄)² = 2458

s² = ∑(x - x̄)² / (n - 1)
s² = 2458 / 6 = 409.67 ≈ 410

II. STANDARD DEVIATION

The standard deviation (SD) is the square root of the variance, that is, the square root of the average squared deviation from the mean. The SD shows the scatter of the individual values about the arithmetic mean. If it is small, the individual values cluster around the mean and large departures occur only rarely. If it is large, the individual values are spread widely from the mean, i.e.
there is great variation in the data. SD is never negative; it takes values from zero to infinity.

SD = √[∑(x - x̄)² / (n - 1)]
SD = √(2458 / 6) = √409.67 = 20.24 ≈ 20

Exercise
a. The heights of girls in inches are as follows: 64, 66, 70, 74, 75, 76. Find the s² and SD of the girls' heights.

B. Grouped Data

If data are in a frequency distribution, we must account for each possible value of x and also for the number of times (the frequency) that the value occurs. We therefore set up an fx column to derive the mean. We can then calculate s² for such grouped data by computing each squared deviation, multiplying it by the frequency of that value of x, and adding the products. The formula used for s² is as follows:

s² = ∑[(x - x̄)² f] / ∑f

CALCULATION OF s² AND SD WHEN DATA WITH FREQUENCY ARE GIVEN

Example

x    f    fx    x̄      x - x̄    (x - x̄)²    (x - x̄)² f
9    2    18    7.5    1.5      2.25        2.25 × 2 = 4.50
8    3    24    7.5    0.5      0.25        0.25 × 3 = 0.75
7    3    21    7.5    -0.5     0.25        0.25 × 3 = 0.75
6    2    12    7.5    -1.5     2.25        2.25 × 2 = 4.50

∑f = 10; ∑fx = 75; ∑(x - x̄)² f = 10.50

x̄ = ∑fx / ∑f = 75 / 10 = 7.5
s² = ∑[(x - x̄)² f] / ∑f = 10.5 / 10 = 1.05
SD = √(∑[(x - x̄)² f] / ∑f) = √(10.5 / 10) = √1.05 = 1.02

Exercise
Calculate the variance (s²) and standard deviation (SD) from the grain yield (t/ha) data.

Grain yield (x)    5    6    7    9    10    12    13
Farms (f)          4    5    5    7    3     2     2

CALCULATION OF s² AND SD WHEN CLASS INTERVALS ARE GIVEN

s² = (i² / N) [∑fd² - (∑fd)² / N], where d = (x - x̄) / i, x is the class mark, i is the class size and N = ∑f.

Example

Class      Frequency
0 - 10     3
10 - 20    5
20 - 30    8
30 - 40    3
40 - 50    1

Solution

Class      CM (x)    f          fx           x - x̄    (x - x̄)²    f(x - x̄)        f(x - x̄)²
0 - 10     5         3          15           -17      289         -51             867
10 - 20    15        5          75           -7       49          -35             245
20 - 30    25        8          200          3        9           24              72
30 - 40    35        3          105          13       169         39              507
40 - 50    45        1          45           23       529         23              529
Total                ∑f = 20    ∑fx = 440                         ∑f(x - x̄) = 0   ∑f(x - x̄)² = 2220

Mean = ∑fx / ∑f = 440 / 20 = 22

Since d = (x - x̄) / i with i = 10, ∑fd = 0 / 10 = 0 and ∑fd² = 2220 / 10² = 22.20.

s² = (i² / N) [∑fd² - (∑fd)² / N]
s² = (10² / 20) [22.20 - (0)² / 20] = (100 / 20) × 22.20 = 111
SD = √((i² / N) [∑fd² - (∑fd)² / N])
SD = √111 = 10.54

Exercise
Compute s² and SD from the following data.

Class        7 - 9    9 - 11    11 - 13    13 - 15    15 - 17
Frequency    5        10        13         22         26

III.
COEFFICIENT OF VARIATION

The coefficient of variation is another measure of variability. It is a dimensionless index of variability. When we want to compare two distributions, their SDs may be in different units, e.g. when we compare a distribution of heights, expressed in cm, with a distribution of weights, expressed in kg. Since the units differ, we cannot compare these two distributions on the basis of their standard deviations. For comparative purposes, a relative measure of variation is therefore required. Such a relative measure is called the coefficient of variation (CV), which is obtained by expressing the SD as a percentage of the mean as follows:

CV = (SD / x̄) × 100

A. Ungrouped data
From the ungrouped data example, the mean (x̄) was 31 and SD ≈ 20. Therefore its CV = (20 / 31) × 100 = 64.5 %

B. Grouped data with frequency
From the earlier example, the mean (x̄) was 7.5 and SD was 1.02. Therefore CV = (SD / x̄) × 100 = (1.02 / 7.5) × 100 = 13.6 %

C. Grouped data with class interval
From the earlier class-interval example, the mean (x̄) was 22 and SD = √(2220 / 20) = √111 = 10.54. Therefore CV = (SD / x̄) × 100 = (10.54 / 22) × 100 = 47.9 %

Exercise
a. In an experiment, two characteristics, plant height and grain weight per plant, were measured. Their statistics, including the coefficients of variation, are as follows.

Statistics            Plant height (cm)    Grain wt/plant (g)
Mean                  90.00                20.78
Standard deviation    3.60                 7.7
CV (%)                4.00                 37.1

Which characteristic shows more variability in the above data?

IV. STANDARD ERROR (SE)

Standard error is different from standard deviation. Its purpose is to draw inferences about the population from samples. Specifically, we are interested in estimating the true mean of a population. The extent to which the sample mean varies from the population mean is measured by the standard deviation among sample means. This measure is called the standard error of the mean, or simply the standard error (SE).
It is obtained by using the following formula:

SE = SD / √n

where SD is the standard deviation of the population and n is the sample size.

From the earlier examples, we can compute SE as follows.

From ungrouped data
N = 7; SD ≈ 20
SE = 20 / √7 = 7.56

From grouped data with frequency
N = 10; SD = 1.02
SE = 1.02 / √10 = 0.32

From grouped data with class interval
N = 20; SD = √(2220 / 20) = √111 = 10.54
SE = 10.54 / √20 = 2.36
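The four measures of dispersion for the ungrouped rice-sack example (s², SD, CV and SE) can be reproduced with Python's standard `statistics` module. This is a minimal sketch: `statistics.variance` and `statistics.stdev` both use the n - 1 divisor, which matches the ungrouped formula in these notes but not the grouped formula that divides by ∑f. The variable names are illustrative.

```python
import math
import statistics

# Weights (kg) of the seven sacks of rice from the variance example.
weights = [4, 10, 25, 30, 38, 50, 60]

n = len(weights)
mean = statistics.mean(weights)
variance = statistics.variance(weights)  # sample variance s^2, divisor n - 1
sd = statistics.stdev(weights)           # square root of the sample variance
cv = sd / mean * 100                     # SD as a percentage of the mean
se = sd / math.sqrt(n)                   # standard error of the mean

print(round(variance, 2), round(sd, 2), round(cv, 1), round(se, 2))
```

The code keeps the SD unrounded (about 20.24), so the CV and SE it prints are slightly larger than values computed by hand with SD rounded to 20.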