Download practical manual on statistics - College of Agriculture, OUAT

Dr. A. K. Parida, Ph.D.
Associate Professor
Department of Agricultural Statistics
College of Agriculture (OUAT), Bhubaneswar-3

PREFACE

Statistics is of great importance for teaching, research and extension in agriculture and allied sciences. Knowledge of and expertise in the subject are immensely helpful to teachers, scientists, students and research scholars in their areas of study and application. We collect data from different sources, by different methods and for different purposes. As these data are random in nature, they are subjected to various manipulations in order to draw valid conclusions for efficient further use and correct decisions. No doubt, the voluminous data so generated can be handled with computers and software. However, the fundamental concepts, and knowledge of the procedures, principles and techniques of statistics, play a vital role in arriving at a valid and meaningful conclusion.

This practical manual has been conceived and prepared to acquaint students and teachers with the basic concepts of statistical principles and procedures of calculation, as per the syllabi of the 4th Dean's Committee of ICAR for undergraduate courses in agriculture and allied sciences.

The manuscript of this manual has been prepared from my long years of teaching experience and at the persuasion of students and teachers of the university. The contents have been taken from many textbooks, journals, manuals and the internet. I acknowledge the help of those sources. I welcome comments from the users of this manual on any addition, deletion or improvement for the future. I hope the practical manual will be very useful for students and research workers. I also thank the authorities for providing funds from the XIth ICAR development grant for printing the manual.

Date: March 25, 2009
Amulya Kumar Parida

CONTENTS

Practicals and Topics

I.
STATISTICAL METHODS
1.1 Construction of frequency table
1.2 Graphical representation of frequency distribution
1.3 Measures of central tendency or central value: Arithmetic Mean, Geometric Mean, Harmonic Mean, Median, Mode, Quartiles, Deciles and Percentiles
1.4 Measures of dispersion of a frequency distribution: Mean Deviation, Standard Deviation, Variance and Coefficient of Variation (C.V.)
1.4 Moments and measures of skewness and kurtosis
1.5 Testing of hypothesis or test of significance or decision rule
1.6 Standard normal deviate (SND) or Z tests or large sample tests: single mean and difference of two means
1.7 Small sample tests: test of two variances, test for single mean, two independent means and two dependent means
1.8 Chi-square test (χ²): goodness of fit and independence or association of attributes
1.9 Correlation and regression: Pearson's correlation coefficient and its test, Spearman's rank correlation coefficient; fitting of regression equations of two variables Y and X

II. DESIGN AND ANALYSIS OF EXPERIMENTS
2.1 Basic concepts on design of experiments; analysis of variance: one-way and two-way classification
2.2 Analysis of data in completely randomized design (CRD): unequal replications, equal replications
2.3 Analysis of data in randomised complete block design (RCBD)
2.4 Analysis of data in Latin square design (LSD)
2.5 Missing plot technique in design of experiments
2.6 Analysis of data in RCBD with one missing observation
2.7 Analysis of data in LSD with one missing observation

III.
SAMPLING TECHNIQUES
3.1 Principal steps in a sample survey
3.2 Simple random sampling (SRS): selection of sampling units from a population
3.3 Parameter estimation in SRS: SRSWOR, SRSWR
3.4 Stratified sampling
3.5 Systematic sampling

APPENDIX: STATISTICAL TABLES (t, F, χ², r, Z, random numbers)
Table-1(a): Critical values for t-distribution
Table-1(b): Critical values for t-distribution (one- and two-tailed)
Table-2: Critical values for F-distribution
Table-3: χ² (chi-squared) distribution: critical values of χ²
Table-4: Critical values of correlation coefficients (simple or partial)
Table-5: Percentage points of the normal distribution, Z
Table-6: Random numbers

UG Practical Manual on Statistics

PRACTICAL MANUAL ON STATISTICS

Two major practical aspects of scientific investigation are the collection of data and the interpretation of the collected data. The data may be generated through a sample survey of a naturally existing population or through a designed experiment on a hypothetical population. The collected data are condensed, and useful information is extracted from them through techniques of statistical inference. This manual deals with the statistical methods and techniques used for objectively tabulating data, computing step by step, and drawing valid inferences, in a form useful for undergraduate students.

General Objective: To impart knowledge to the students on the basic concepts and statistical techniques applied in agriculture and allied sciences.

Specific objectives: By the end of the practical exercises, the students will be able to:
1. Acquaint themselves with the practical applications of statistical techniques in agriculture.
2. Apply statistical techniques on their own and draw valid conclusions from them.

I. STATISTICAL METHODS

1.1.
Construction of frequency table

A frequency table summarizes a set of observations in tabular form so as to bring out the essential information contained in it. A tabular arrangement of data by classes together with the corresponding class frequencies is called a frequency distribution or frequency table. There are two types of frequency table:
i. Exclusive type
ii. Inclusive type

The frequency table of exclusive type (the lower limit is included in the class and the upper limit is excluded) is formed when the data are continuous, and it is called a continuous distribution. The frequency table of inclusive type (both the lower and upper limit values are included) is used when the data are discrete or discontinuous, and it is called a discontinuous or discrete distribution.

Department of Agricultural Statistics, OUAT Page-1

Procedure: The following steps are followed to construct a frequency table from a set of data.

Step-1. Determination of number of classes
Usually the number of classes should be between 5 and 15; otherwise the information contained in the data may be lost. Sturges' rule may be used to determine the number of classes, K:
$$K = 1 + 3.322 \log_{10} N$$
where N = number of observations.

Step-2. Determination of magnitude of class interval (CI)
From the given set of observations, locate the maximum (Max) and minimum (Min) values. Then Range = Max - Min, and the CI or class width (d) is
$$d = \frac{\text{Max} - \text{Min}}{K}$$
If d has a decimal value, take a nearby integral value as the class width, large enough for the K classes to cover the whole range.

Step-3. Choice of class limits or class boundaries
First, check whether the variable is continuous or discrete: measurements such as height, weight and volume are continuous variables, whereas counts such as number of trees or number of students are discrete variables.
Use the exclusive method of frequency distribution if the variable is continuous, and the inclusive method if the variable is discrete.

Step-4. Formation of classes
a. Exclusive method: Starting from the first class, the subsequent classes are formed by adding d to both the lower and upper limits; e.g. if the first class is L to L+d, then the second class is L+d to L+2d, and so on. Example: 10 to 15, 15 to 20, 20 to 25, etc.
b. Inclusive method: Starting from the first class, the subsequent classes are formed by adding (d+1) instead of d to both the lower and upper limits; e.g. if the first class is L to L+d, then the second class is [L+(d+1)] to [L+(2d+1)], and so on. Example: 10 to 15, 16 to 21, 22 to 27, etc.

Step-5. Determination of class frequency
The class frequency is the number of observations that fall in a class. The class frequencies are determined with the help of tally marks (|).

Step-6. Construction of frequency distribution table
The frequency table has the following headings:

Classes (1), Tally mark (2), Frequency (3)

The classes are formed starting from the minimum value of the set of observations, each class having the class width d. Tally marks are then made against each class, taking the observations one by one in the order in which they appear, from the first observation to the end of the data. When a fifth tally mark is required in a class, it is drawn as a slash (/) or an overhead bar (¯) across the group of four tally marks. Finally, the tally marks of each class are counted and entered as the frequency of the class in the last column.

Problem-1. Construct the frequency distribution table with the following 30 observations.
10 (Min), 15, 17, 20, 21, 16, 17, 18, 20, 31, 35 (Max), 13, 12, 15, 14, 12, 15, 17, 14, 13, 15, 14, 13, 14, 20, 19, 18, 28, 24, 25.

Solution:
(i) No. of classes: $K = 1 + 3.322 \log_{10} N$, where N = 30.
$$K = 1 + 3.322 \times \log_{10} 30 = 1 + 3.322 \times 1.4771 = 1 + 4.91 = 5.91 \approx 6$$
(ii) Class size:
$$d = \frac{\text{Max} - \text{Min}}{K} = \frac{35 - 10}{6} = \frac{25}{6} = 4.17 \approx 5$$

a. Exclusive method:

Table-1.
Construction of frequency distribution table with CI=5

| Class | Tally marks | Frequency |
|---|---|---|
| 10-15 | IIII/ IIII/ | 10 |
| 15-20 | IIII/ IIII/ I | 11 |
| 20-25 | IIII/ | 5 |
| 25-30 | II | 2 |
| 30-35 | I | 1 |
| 35-40 | I | 1 |
| Total | | 30 |

b. Inclusive method:

Table-2. Construction of frequency distribution table with CI=5

| Class | Tally marks | Frequency |
|---|---|---|
| 10-14 | IIII/ IIII/ | 10 |
| 15-19 | IIII/ IIII/ I | 11 |
| 20-24 | IIII/ | 5 |
| 25-29 | II | 2 |
| 30-34 | I | 1 |
| 35-39 | I | 1 |
| Total | | 30 |

1.2. Graphical representation of frequency distribution

Graphical representation of the observations gives a better and deeper understanding of their distribution. A frequency distribution can be represented in the form of a Histogram, Frequency polygon, Frequency curve and Ogive.

Procedure:
a. Histogram: A histogram is a set of vertical bars in a two-dimensional graph whose areas are proportional to the frequencies of the classes. It is drawn by taking the classes on the X-axis and drawing bars whose heights are the corresponding class frequencies on the Y-axis.
b. Frequency polygon: It is made by joining the mid-points of the tops of the bars of the histogram with straight lines.
c. Frequency curve: A frequency curve is a graphical representation of the frequencies corresponding to the variate values by a smooth free-hand curve. It is made when the CI of each class is small enough for a smooth curve to be drawn. It can be drawn by joining the points of the frequency polygon with a smooth free-hand curve.
d. Ogive: It is a graph of the variate values against the corresponding cumulative frequencies of a frequency distribution. Its shape is like an elongated "S". An ogive is prepared using the 'more than type' or the 'less than type' of cumulative frequencies, or both.

The above graphs are easily drawn from a frequency table of exclusive type. If a frequency table is of inclusive type, it is first converted to exclusive type and then the above graphs are drawn.
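As a cross-check of Problem-1, the construction steps of Section 1.1 can be sketched in Python. This is only an illustrative sketch and not part of the manual's procedure: the function names `sturges_classes`, `frequency_table` and `to_boundaries` are hypothetical, and the class width is rounded up to the next whole number, as in the worked solution. The sketch also converts the inclusive classes of Table-2 to class boundaries (half the gap between classes is subtracted from each lower limit and added to each upper limit), which is the conversion needed before a histogram or ogive is drawn, and forms the less-than and more-than running totals used for the ogive.

```python
import math

# The 30 observations of Problem-1.
data = [10, 15, 17, 20, 21, 16, 17, 18, 20, 31, 35, 13, 12, 15, 14,
        12, 15, 17, 14, 13, 15, 14, 13, 14, 20, 19, 18, 28, 24, 25]


def sturges_classes(n):
    """Number of classes by Sturges' rule, rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))


def frequency_table(values, width, start, n_classes):
    """Exclusive-type table: each class is [lower, upper)."""
    table = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        count = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, count))
    return table


def to_boundaries(inclusive_classes):
    """Turn inclusive limits (e.g. 10-14, 15-19) into class boundaries."""
    gap = inclusive_classes[1][0] - inclusive_classes[0][1]  # 15 - 14 = 1
    return [(lo - gap / 2, hi + gap / 2) for lo, hi in inclusive_classes]


K = sturges_classes(len(data))                 # number of classes
d = math.ceil((max(data) - min(data)) / K)     # class width, rounded up
table = frequency_table(data, d, min(data), K)
freqs = [count for _, _, count in table]

# Less-than and more-than cumulative frequencies (used for the ogive).
less_than = [sum(freqs[:i + 1]) for i in range(len(freqs))]
more_than = [sum(freqs[i:]) for i in range(len(freqs))]

# Inclusive classes of Table-2 and their class boundaries.
inclusive = [(lo, hi - 1) for lo, hi, _ in table]
boundaries = to_boundaries(inclusive)

for (lo, hi, f), lt, mt in zip(table, less_than, more_than):
    print(f"{lo}-{hi}\t{f}\t{lt}\t{mt}")
```

Under these assumptions the sketch gives K = 6, d = 5 and the class frequencies 10, 11, 5, 2, 1, 1 of Table-1; the first inclusive class 10-14 becomes the boundary class 9.5-14.5.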
Cumulative frequency is the running total of the class frequencies, accumulated downward through the classes of the frequency table (less than type) or upward (more than type).

Problem-2. Construct the Histogram, Frequency Polygon, Frequency curve and Ogive of the following frequency distribution of the length of 60 sorghum ear heads (cm).

| Class (length, cm) | No. of ear heads |
|---|---|
| 18-20 | 4 |
| 21-23 | 10 |
| 24-26 | 14 |
| 27-29 | 16 |
| 30-32 | 10 |
| 33-35 | 4 |
| 36-38 | 2 |

Solution: As the given frequency table is of inclusive type, classes of exclusive type are first formed to make the classes continuous, and then both types of cumulative frequency are computed.

Table-3. Cumulative frequency table

| Class | Exclusive class | Mid value | Frequency | Cumulative frequency (less than) | Cumulative frequency (greater than) |
|---|---|---|---|---|---|
| 18-20 | 17.5-20.5 | 19 | 4 | 4 | 60 |
| 21-23 | 20.5-23.5 | 22 | 10 | 14 | 56 |
| 24-26 | 23.5-26.5 | 25 | 14 | 28 | 46 |
| 27-29 | 26.5-29.5 | 28 | 16 | 44 | 32 |
| 30-32 | 29.5-32.5 | 31 | 10 | 54 | 16 |
| 33-35 | 32.5-35.5 | 34 | 4 | 58 | 6 |
| 36-38 | 35.5-38.5 | 37 | 2 | 60 | 2 |

Fig. 1. HISTOGRAM
Fig. 2. FREQUENCY POLYGON
Fig. 3. FREQUENCY CURVE
Fig. 4. OGIVE (1-less type, 2-more type)

Exercise: Construct a frequency distribution table, histogram, frequency polygon, frequency curve and ogive for the following data and interpret the results.
25, 32, 45, 8, 24, 42, 22, 12, 9, 15, 26, 35, 23, 41, 47, 18, 44, 37, 27, 46, 38, 24, 43, 46, 10, 21, 36, 45, 22, 18.

1.3. Measures of central tendency or central value

Central tendency is the property of a distribution by which a single central value is computed to represent all the other values. It is commonly measured by the Arithmetic Mean (or Mean), Geometric Mean, Harmonic Mean, Median and Mode.

Procedure:

Mean or Arithmetic Mean (A.M.)
The arithmetic mean is the sum of the observations divided by the total number of observations.

i.
For a series of data: If the series has n values of a variable X, i.e. $x_1, x_2, \ldots, x_n$, the Arithmetic Mean (A.M.) is given by
$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\text{Sum of values}}{\text{No. of values}} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

ii. For ungrouped frequency distribution: Suppose the values $x_1, x_2, \ldots, x_n$ occur with frequencies $f_1, f_2, \ldots, f_n$; then the A.M. is given by
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad N = \sum_{i=1}^{n} f_i$$

iii. For grouped frequency distribution: If the data are grouped into class intervals, the mid value of each class is taken as an approximation to the value of the variable representing that class. If $m_1, m_2, \ldots, m_n$ are the mid values of the n classes of the variable X and $f_1, f_2, \ldots, f_n$ are the corresponding frequencies, the Arithmetic Mean of X is
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i m_i}{\sum_{i=1}^{n} f_i}$$

a) Short-cut method (or change of origin): If $d_i = x_i - A$, where A = any arbitrary value (called the origin), then
$$\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$$

b) Step-deviation method (or change of origin and scale): If $u_i = \dfrac{x_i - A}{h}$, where A = any arbitrary value (called the origin) and h = magnitude of the class interval (or scale), then
$$\bar{X} = A + \frac{h}{N}\sum_{i=1}^{n} f_i u_i$$

Geometric Mean (G.M.)
The geometric mean is the n-th root of the product of all n values.

i. For a series of data: If the values of the variable are $x_1, x_2, \ldots, x_n$, then the geometric mean of x is
$$G = (x_1 \, x_2 \cdots x_n)^{1/n}$$
Alternatively,
$$\log_{10} G = \frac{1}{n}\sum_{i=1}^{n} \log_{10} x_i \quad \text{or} \quad G = \text{Antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log_{10} x_i\right)$$

ii. For ungrouped frequency distribution: If the values $x_1, x_2, \ldots, x_n$ occur with frequencies $f_1, f_2, \ldots, f_n$ respectively, then
$$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{\frac{1}{f_1 + f_2 + \cdots + f_n}} \quad \text{or} \quad G = \text{Antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log_{10} x_i\right)$$
where $N = f_1 + f_2 + \cdots + f_n$.

iii.
For grouped frequency distribution:
$$G = \left(m_1^{f_1} m_2^{f_2} \cdots m_n^{f_n}\right)^{\frac{1}{f_1 + f_2 + \cdots + f_n}} \quad \text{or} \quad G = \text{Antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log_{10} m_i\right)$$
where $N = f_1 + f_2 + \cdots + f_n$ and $m_1, m_2, \ldots, m_n$ are the mid-values of the classes.

Harmonic Mean (H.M.)
The harmonic mean is the reciprocal of the mean of the reciprocals of the observations.

i. For a series of data: If $x_1, x_2, \ldots, x_n$ are the values of a given variable, then the harmonic mean is
$$H.M. = \frac{1}{\frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right)} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

ii. For ungrouped frequency distribution: If $x_1, x_2, \ldots, x_n$ occur with frequencies $f_1, f_2, \ldots, f_n$ respectively, then
$$H.M. = \frac{\sum f_i}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_n}{x_n}} = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} (f_i / x_i)}$$

iii. For grouped frequency distribution:
$$H.M. = \frac{\sum f_i}{\sum (f_i / m_i)}$$
where $m_1, m_2, \ldots, m_n$ are the mid-values of the classes.

Problem-3. The frequency distribution of the weight (g) of 180 sorghum ear heads is given in the following table. Calculate the A.M., G.M. and H.M.

Table-4. Frequency distribution of sorghum ear heads

| Weight of ear head in g (X) | No. of ear heads (f) |
|---|---|
| 40-60 | 6 |
| 60-80 | 28 |
| 80-100 | 35 |
| 100-120 | 50 |
| 120-140 | 30 |
| 140-160 | 10 |
| 160-180 | 12 |
| 180-200 | 9 |
| Total | 180 |

Solution:

Table-5. Computation of mean (A.M.) by direct method, short-cut method and step-deviation method (A = 110, h = 20)

| Class (X) | Mid value ($m_i$) | $f_i$ | $f_i m_i$ | $d_i = m_i - A$ | $f_i d_i$ | $u_i = (m_i - A)/h$ | $f_i u_i$ |
|---|---|---|---|---|---|---|---|
| 40-60 | 50 | 6 | 300 | -60 | -360 | -3 | -18 |
| 60-80 | 70 | 28 | 1960 | -40 | -1120 | -2 | -56 |
| 80-100 | 90 | 35 | 3150 | -20 | -700 | -1 | -35 |
| 100-120 | 110 | 50 | 5500 | 0 | 0 | 0 | 0 |
| 120-140 | 130 | 30 | 3900 | 20 | 600 | 1 | 30 |
| 140-160 | 150 | 10 | 1500 | 40 | 400 | 2 | 20 |
| 160-180 | 170 | 12 | 2040 | 60 | 720 | 3 | 36 |
| 180-200 | 190 | 9 | 1710 | 80 | 720 | 4 | 36 |
| Total | | N = 180 | $\sum f_i m_i$ = 20060 | | $\sum f_i d_i$ = 260 | | $\sum f_i u_i$ = 13 |

The mean weight of ear head is given by:

i. Direct method:
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i m_i}{N} = \frac{20060}{180} = 111.44 \text{ g}$$

ii.
Short-cut method:
$$\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} = 110 + \frac{260}{180} = 110 + 1.44 = 111.44 \text{ g}$$

iii. Step-deviation method:
$$\bar{X} = A + \frac{h}{N}\sum_{i=1}^{n} f_i u_i = 110 + \frac{20}{180} \times 13 = 110 + 1.44 = 111.44 \text{ g}$$

Table-6. Computation of Geometric Mean (G.M.)

| Class (x) | Mid value $m_i$ | Frequency $f_i$ | $\log_{10} m_i$ | $f_i \times \log_{10} m_i$ |
|---|---|---|---|---|
| 40-60 | 50 | 6 | 1.69 | 10.14 |
| 60-80 | 70 | 28 | 1.84 | 51.52 |
| 80-100 | 90 | 35 | 1.95 | 68.25 |
| 100-120 | 110 | 50 | 2.04 | 102.00 |
| 120-140 | 130 | 30 | 2.11 | 63.30 |
| 140-160 | 150 | 10 | 2.17 | 21.70 |
| 160-180 | 170 | 12 | 2.23 | 26.76 |
| 180-200 | 190 | 9 | 2.27 | 20.43 |
| Total | | 180 | | 364.1 |

$$\log G = \frac{\sum_{i=1}^{n} f_i \log m_i}{\sum_{i=1}^{n} f_i} = \frac{364.1}{180} = 2.02; \qquad G = \text{Antilog}(2.02) = 104.71 \text{ g}$$

Table-7. Computation of Harmonic Mean (H.M.)

| Class (x) | Mid value $m_i$ | Frequency $f_i$ | $f_i / m_i$ |
|---|---|---|---|
| 40-60 | 50 | 6 | 0.12 |
| 60-80 | 70 | 28 | 0.4 |
| 80-100 | 90 | 35 | 0.38 |
| 100-120 | 110 | 50 | 0.45 |
| 120-140 | 130 | 30 | 0.23 |
| 140-160 | 150 | 10 | 0.06 |
| 160-180 | 170 | 12 | 0.07 |
| 180-200 | 190 | 9 | 0.04 |
| Total | | N = 180 | $\sum (f_i / m_i)$ = 1.75 |

$$\text{Harmonic mean (H.M.)} = \frac{\sum f_i}{\sum (f_i / m_i)} = \frac{180}{1.75} = 102.85 \text{ g}$$

Conclusion: From the above calculation, the Arithmetic Mean (A.M.), Geometric Mean (G.M.) and Harmonic Mean (H.M.) of the weight of sorghum ear heads are 111.44 g, 104.71 g and 102.85 g respectively, and the relation obtained is A.M. > G.M. > H.M.

Note: In general A.M. ≥ G.M. ≥ H.M. The G.M. and H.M. above rest on entries cut to two decimal places in Tables 6 and 7; carrying more decimals gives about 106.30 g and 101.19 g respectively, and the ordering is unchanged.

Median, Quartile, Decile and Percentiles

In a frequency distribution (arranged in increasing or decreasing order), the median is the value such that half of the observations lie above it and half below it. Similarly, Quartiles, Deciles and Percentiles are the values of the variate which divide the total frequency into 4, 10 and 100 equal parts respectively.

Procedure: Prepare a cumulative frequency table and then calculate i.N/4, i.N/10 or i.N/100 to find the i-th Quartile class, i-th Decile class or i-th Percentile class respectively. For Quartiles, i = 1, 2, 3; for Deciles, i = 1, 2, ..., 9; and for Percentiles, i = 1, 2, ..., 99.
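The procedure above can be sketched in Python as follows. This is a minimal sketch and not part of the original manual: the function name `grouped_quantile` is hypothetical, it uses the usual interpolation formula for grouped data, $L_0 + \frac{h}{f}\left(\frac{iN}{x} - c.f.\right)$, and it is applied to the marks distribution of Problem-4, with the open-ended classes "below 10" and "above 70" treated as 0-10 and 70-80 (an assumption made only so that every class has width 10).

```python
from itertools import accumulate

# Marks distribution of Problem-4; the open-ended end classes are
# ASSUMED to be 0-10 and 70-80 so that every class has width 10.
lower_limits = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [8, 12, 20, 32, 30, 28, 12, 4]
width = 10


def grouped_quantile(i, x, lower_limits, freqs, width):
    """i-th quantile when the data are cut into x equal parts
    (x = 4 for quartiles, 10 for deciles, 100 for percentiles)."""
    n_total = sum(freqs)
    target = i * n_total / x                  # i.N/x
    cum = list(accumulate(freqs))             # less-than cumulative frequencies
    # Quantile class: first class whose cumulative frequency reaches the
    # target (a tie gives the same value as moving to the next class).
    k = next(j for j, c in enumerate(cum) if c >= target)
    cf_before = cum[k - 1] if k > 0 else 0    # c.f. preceding that class
    return lower_limits[k] + width / freqs[k] * (target - cf_before)


median = grouped_quantile(2, 4, lower_limits, freqs, width)
q1 = grouped_quantile(1, 4, lower_limits, freqs, width)
d7 = grouped_quantile(7, 10, lower_limits, freqs, width)
p85 = grouped_quantile(85, 100, lower_limits, freqs, width)

print(round(median, 2), round(q1, 2), round(d7, 2), round(p85, 2))
# → 40.33 28.25 50.07 57.89
```

The same function serves all three kinds of quantile because only the divisor x changes; the quantile class is located from the less-than cumulative frequencies exactly as described in the procedure.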
Formula:
$$C_T = L_0 + \frac{h}{f_i}\left(i \cdot \frac{N}{x} - c.f.\right)$$
where $C_T$ is the required quartile, decile or percentile, and
$L_0$ = lower limit of the i-th Quartile class (for the i-th Quartile), of the i-th Decile class (for the i-th Decile), or of the i-th Percentile class (for the i-th Percentile)
h = width of the frequency distribution class
$f_i$ = frequency of the i-th Quartile, i-th Decile or i-th Percentile class
N = total frequency = $\sum f_i$
c.f. = less-than cumulative frequency preceding the i-th Quartile, i-th Decile or i-th Percentile class
x = 4, 10 or 100 for Quartiles, Deciles and Percentiles respectively.

How to find a quartile/decile/percentile class?
In a frequency table, to find the i-th Quartile class, i-th Decile class or i-th Percentile class, compute i.N/4, i.N/10 or i.N/100 respectively. Then locate the first class in the table whose cumulative frequency is more than this value. For Quartiles, i = 1, 2, 3; for Deciles, i = 1, 2, ..., 9; and for Percentiles, i = 1, 2, ..., 99.

Problem-4. Find the Median (2nd Quartile), lower Quartile (1st Quartile), 7th Decile and 85th Percentile of the frequency distribution given below:

| Marks in statistics | No. of students |
|---|---|
| below 10 | 8 |
| 10-20 | 12 |
| 20-30 | 20 |
| 30-40 | 32 |
| 40-50 | 30 |
| 50-60 | 28 |
| 60-70 | 12 |
| above 70 | 4 |

Solution:

Table-8. Computation of Median, Quartile, Decile and Percentile

| Marks in Statistics (X) | No. of students ($f_i$) | Less than cumulative frequency (c.f.) |
|---|---|---|
| <10 | 8 | 8 |
| 10-20 | 12 | 20 |
| 20-30 | 20 | 40 |
| 30-40 | 32 | 72 |
| 40-50 | 30 | 102 |
| 50-60 | 28 | 130 |
| 60-70 | 12 | 142 |
| >70 | 4 | 146 = N |

(i) Median = 2nd quartile, denoted by $Q_2$, i.e. i = 2.
So, for i = 2, $i.N/4 = \dfrac{2 \times 146}{4} = 73$.
Hence the Median class is 40-50, corresponding to c.f. = 102, which is > 73.
$$\text{Median} = L_0 + \frac{h}{f_i}\left(\frac{N}{2} - c.f.\right) = 40 + \frac{10}{30}(73 - 72) = 40 + 0.33 = 40.33$$

(ii) First Quartile = $Q_1$. Here, i = 1.
So, for i = 1, $i.N/4 = \dfrac{1 \times 146}{4} = 36.5$.
Hence the $Q_1$ class is 20-30, corresponding to c.f. = 40, which is > 36.5.
$$Q_1 = L_0 + \frac{h}{f_i}\left(\frac{N}{4} - c.f.\right) = 20 + \frac{10}{20}(36.5 - 20) = 20 + 8.25 = 28.25$$

(iii) Seventh Decile = $D_7$. Here, i = 7.
So, for i = 7, $i.N/10 = \dfrac{7 \times 146}{10} = 102.2$, and the 7th Decile class is 50-60.
$$D_7 = L_0 + \frac{h}{f_i}\left(\frac{7N}{10} - c.f.\right) = 50 + \frac{10}{28}(102.2 - 102.0) = 50 + 0.07 = 50.07$$

(iv) 85th Percentile = $P_{85}$. Here, i = 85.
So, for i = 85, $i.N/100 = \dfrac{85 \times 146}{100} = 124.1$, and the 85th Percentile class is 50-60.
$$P_{85} = 50 + \frac{10}{28}(124.1 - 102) = 50 + 7.89 = 57.89$$

Mode of a frequency distribution

The Mode is the value of the variate which occurs most frequently in the data set. In a frequency table, the modal class is the class which has the greatest frequency.

Procedure:
i. For a series or ungrouped data: The observation which has the highest frequency, i.e. the value which occurs the maximum number of times, is the mode.
ii. For grouped data:
$$\text{Mode } (M_O) = L_0 + \frac{f - f_p}{2f - f_p - f_s} \times h$$
where
$L_0$ = lower limit of the modal class
f = frequency of the modal class
$f_p$ = frequency of the class preceding the modal class
$f_s$ = frequency of the class succeeding the modal class
h = width of the frequency distribution class

Note: The class which has the highest frequency is the modal class.

Problem-5. Compute the modal value of the wages of workers in a farm from the following frequency distribution.

| Wages (Rs.) | No. of workers |
|---|---|
| 30-35 | 12 |
| 35-40 | 18 |
| 40-45 | 22 |
| 45-50 | 27 |
| 50-55 | 17 |
| 55-60 | 23 |
| 60-65 | 29 |
| 65-70 | 8 |

Solution: Modal class = class with the maximum frequency (= 29), i.e. 60-65.
$L_0$ = lower limit of the modal class = 60
f = frequency of the modal class = 29
$f_p$ = frequency of the class preceding the modal class = 23
$f_s$ = frequency of the class succeeding the modal class = 8
h = class size = 5
$$\text{Mode} = L_0 + \frac{f - f_p}{2f - f_p - f_s} \times h = 60 + \frac{29 - 23}{2 \times 29 - 23 - 8} \times 5 = 60 + \frac{6}{27} \times 5 = 60 + 1.11 = 61.11$$

1.4. Measures of dispersion of a frequency distribution

The literal meaning of dispersion is scatteredness.
We study dispersion to get an idea of the homogeneity or heterogeneity of the distribution, i.e. the scatter of the observations about a central value. There are several measures of dispersion, and each provides specific information about the scatter of the values in a distribution. The mean taken together with a measure of dispersion gives more information about the data than the mean alone. The measures of dispersion are Range, Quartile Deviation, Mean Deviation, Standard Deviation, Variance and Coefficient of Variation.

Mean deviation from a particular value 'A' (Mean or Median or Mode) of a frequency distribution

Procedure: Mean deviation is defined as the arithmetic mean of the absolute deviations of the variate values from a particular measure of location. The mean deviation may be taken about the Mean, about the Median or about the Mode. In a frequency distribution,
$$M.D. = \frac{1}{N}\sum_{i=1}^{n} f_i \, |x_i - A|$$
where $x_1, x_2, \ldots, x_n$ are the values or the mid-values of the classes, with frequencies $f_1, f_2, \ldots, f_n$,
N = total frequency = $\sum_{i=1}^{n} f_i$, and
A = either the Mean or the Median or the Mode.

Problem-6. Compute the Mean Deviation from the Mean from the following data.

| Wages (Rs.) | Number of labourers |
|---|---|
| 60-70 | 5 |
| 50-60 | 10 |
| 40-50 | 20 |
| 30-40 | 8 |
| 20-30 | 3 |

Solution:

Table-9. Computation of Mean Deviation from Mean

| Wages (Rs.) | Mid value ($x_i$) | Number of labourers ($f_i$) | $f_i x_i$ | $\lvert d \rvert = \lvert x - \text{mean} \rvert$ | $f \lvert d \rvert$ |
|---|---|---|---|---|---|
| 60-70 | 65 | 5 | 325 | 18.70 | 93.50 |
| 50-60 | 55 | 10 | 550 | 8.70 | 87.00 |
| 40-50 | 45 | 20 | 900 | 1.30 | 26.00 |
| 30-40 | 35 | 8 | 280 | 11.30 | 90.40 |
| 20-30 | 25 | 3 | 75 | 21.30 | 63.90 |
| Total | | 46 | 2130 | | 360.80 |

$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{2130}{46} = 46.30$$
$$\text{Mean Deviation from mean} = \frac{\sum f |d|}{\sum f} = \frac{360.80}{46} = 7.843$$

Standard Deviation, Variance and Coefficient of Variation (C.V.)

Procedure: The arithmetic mean of the squares of the deviations of the variate values from their arithmetic mean is defined as the Variance.
The positive square root of the Variance is called the Standard Deviation (S.D.). The Coefficient of Variation (C.V.) measures the magnitude of the variation in the observations relative to the magnitude of their arithmetic mean. It is defined as the ratio of the standard deviation to the arithmetic mean, expressed as a percentage.

There are two methods for calculating the Standard Deviation:
i) Direct method
ii) Short-cut method (by change of origin and scale)

i. Direct method:
Step 1: Calculate the mid value ($x_i$) for grouped data.
Step 2: Calculate $f_i x_i$ for each class and finally $\sum f_i x_i$.
Step 3: Calculate $x_i^2$ and $f_i x_i^2$ and finally $\sum f_i x_i^2$.
Step 4: Calculate the S.D. ($\sigma$) using the formula
$$\text{S.D.} = \sigma = +\sqrt{\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2}, \qquad N = \sum f_i$$
and the Variance
$$\sigma^2 = \frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2$$

ii. Short-cut method or step-deviation method:
Step 1: Calculate the mid value ($x_i$) for grouped data.
Step 2: Calculate the deviation value $d_i = \dfrac{x_i - A}{c}$, where A = any arbitrary value or the mean, and c = class size.
Step 3: Calculate $f_i d_i$ and $f_i d_i^2$ and finally $\sum f_i d_i$ and $\sum f_i d_i^2$.
Step 4: Calculate the S.D. using the formula
$$\text{S.D.} = \sigma = c \times \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$$
So,
$$\text{Variance} = \sigma^2 = c^2 \left[\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2\right]$$

$$\text{Coefficient of Variation, C.V.} = \frac{\sigma}{\bar{X}} \times 100 = \frac{\text{S.D.}}{\text{Mean}} \times 100$$

The standard deviation is an absolute measure of dispersion, whereas the C.V. is a relative measure of dispersion, expressed as a percentage, for comparing two or more data sets.

Problem-7. Compute the Standard Deviation, Variance and C.V. from the following data.

| Size of the holding (ha) | No. of farmers |
|---|---|
| 2.5-3.5 | 1000 |
| 3.5-4.5 | 2300 |
| 4.5-5.5 | 3600 |
| 5.5-6.5 | 2400 |
| 6.5-7.5 | 1700 |
| 7.5-8.5 | 3000 |
| 8.5-9.5 | 500 |

Solution:

Table-10. Calculation table for Standard Deviation
| Size of holding (ha) | Mid value ($x_i$) | $f_i$ | $f_i x_i$ | $f_i x_i^2$ | $d_i = x_i - A$ (A = 6) | $f_i d_i$ | $f_i d_i^2$ |
|---|---|---|---|---|---|---|---|
| 2.5-3.5 | 3 | 1,000 | 3,000 | 9,000 | -3 | -3,000 | 9,000 |
| 3.5-4.5 | 4 | 2,300 | 9,200 | 36,800 | -2 | -4,600 | 9,200 |
| 4.5-5.5 | 5 | 3,600 | 18,000 | 90,000 | -1 | -3,600 | 3,600 |
| 5.5-6.5 | 6 | 2,400 | 14,400 | 86,400 | 0 | 0 | 0 |
| 6.5-7.5 | 7 | 1,700 | 11,900 | 83,300 | 1 | 1,700 | 1,700 |
| 7.5-8.5 | 8 | 3,000 | 24,000 | 192,000 | 2 | 6,000 | 12,000 |
| 8.5-9.5 | 9 | 500 | 4,500 | 40,500 | 3 | 1,500 | 4,500 |
| Total | | 14,500 | 85,000 | 538,000 | | -2,000 | 40,000 |

a) Direct method:
$$\text{S.D.} = \sqrt{\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2} = \sqrt{\frac{538000}{14500} - \left(\frac{85000}{14500}\right)^2} = \sqrt{37.103 - 34.364} = \sqrt{2.739} = 1.655$$

b) Step-deviation method (c = 1):
i.
$$\text{S.D.} = c \times \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2} = 1 \times \sqrt{\frac{40000}{14500} - \left(\frac{-2000}{14500}\right)^2} = \sqrt{2.758 - 0.019} = \sqrt{2.739} = 1.655$$
ii. Variance = $(\text{S.D.})^2 = (1.655)^2 = 2.739$
iii. Coefficient of Variation, C.V. = $\dfrac{\text{S.D.}}{\text{Mean}} \times 100$
Here, Mean = $\dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{85000}{14500} = 5.862$
$$\text{C.V.} = \frac{\text{S.D.}}{\text{Mean}} \times 100 = \frac{1.655}{5.862} \times 100 = 28.23\%$$

Moments, skewness and kurtosis

First four moments about mean of a frequency distribution

Procedure: Generally there are two types of moments.

1) Moments about the mean ($\mu_r$):
$$\mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{\sum f_i}$$

2) Moments about an origin ($\mu'_r$):
$$\mu'_r = \frac{\sum f_i d_i^r}{\sum f_i}, \quad \text{where } d_i = x_i - A \text{ and A = any arbitrary value}$$
By the step-deviation method,
$$\mu'_r = h^r \, \frac{\sum f_i d_i^r}{\sum f_i}, \quad \text{where } d_i = \frac{x_i - A}{h}$$

The moments about the mean are:
$$\mu_1 = 0$$
$$\mu_2 = \mu'_2 - (\mu'_1)^2$$
$$\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3$$
$$\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4$$

Measure of Skewness and Kurtosis for a frequency distribution

Skewness is defined as lack of symmetry about the central value. Measures of skewness indicate the direction and extent of skewness (skewed to the left or to the right). There are two methods of finding the measure of skewness from a given frequency table.

First method: Karl Pearson coefficient of skewness
Step-1. Find the Mean, Mode and S.D.
Step-2.
Calculate the measure of skewness using the formula given by Karl Pearson:
$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$

Second method: for a wide class of frequency distributions
Step-1. Find the 2nd and 3rd moments about the mean.
Step-2. Calculate the measure of skewness,
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \gamma_1 = \sqrt{\beta_1}$$
where
$$\mu_2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}, \qquad \mu_3 = \frac{\sum f_i (x_i - \bar{x})^3}{\sum f_i}$$
If $\beta_1 = 0$ or $\gamma_1 = 0$, the distribution is symmetrical; otherwise it is skewed to the left or to the right according as the sign of $\mu_3$ is negative or positive.

Kurtosis is a measure of the peakedness or flatness of the curve of a distribution. Kurtosis is of three types: Platykurtic, Leptokurtic and Mesokurtic. Kurtosis can be computed by the following steps.
Step-1. Find the 2nd and 4th moments about the mean of the distribution.
Step-2. Calculate the kurtosis as
$$\beta_2 = \frac{\mu_4}{\mu_2^2} \quad \text{or} \quad \gamma_2 = \beta_2 - 3$$
where $\mu_4$ = 4th central moment (about the mean) and $\mu_2$ = 2nd central moment (about the mean).

If $\beta_2 = 3$ or $\gamma_2 = 0$, the distribution is normal, i.e. mesokurtic.
If $\beta_2 > 3$ or $\gamma_2 > 0$, the distribution is more peaked, i.e. leptokurtic.
If $\beta_2 < 3$ or $\gamma_2 < 0$, the distribution is more flattened, i.e. platykurtic.

Problem-8. Calculate the four moments about the mean and find the measures of Skewness and Kurtosis from the following table.

| Class interval | Frequency |
|---|---|
| 10-20 | 3 |
| 20-30 | 7 |
| 30-40 | 4 |
| 40-50 | 14 |
| 50-60 | 8 |
| 60-70 | 6 |
| 70-80 | 3 |

Solution:

Table-11.
Calculation of moments (A = 45, h = 10)

| Class interval | Frequency ($f_i$) | Mid value ($x_i$) | $d_i = (x_i - A)/h$ | $f_i d_i$ | $f_i d_i^2$ | $f_i d_i^3$ | $f_i d_i^4$ |
|---|---|---|---|---|---|---|---|
| 10-20 | 3 | 15 | -3 | -9 | 27 | -81 | 243 |
| 20-30 | 7 | 25 | -2 | -14 | 28 | -56 | 112 |
| 30-40 | 4 | 35 | -1 | -4 | 4 | -4 | 4 |
| 40-50 | 14 | 45 | 0 | 0 | 0 | 0 | 0 |
| 50-60 | 8 | 55 | 1 | 8 | 8 | 8 | 8 |
| 60-70 | 6 | 65 | 2 | 12 | 24 | 48 | 96 |
| 70-80 | 3 | 75 | 3 | 9 | 27 | 81 | 243 |
| Total | 45 | | | 2 | 118 | -4 | 706 |

From the table,
$$\mu'_1 = h \, \frac{\sum f_i d_i}{\sum f_i} = 10 \times \frac{2}{45} = 0.44$$
$$\mu'_2 = h^2 \, \frac{\sum f_i d_i^2}{\sum f_i} = 100 \times \frac{118}{45} = 262.22$$
$$\mu'_3 = h^3 \, \frac{\sum f_i d_i^3}{\sum f_i} = 1000 \times \frac{-4}{45} = -88.89$$
$$\mu'_4 = h^4 \, \frac{\sum f_i d_i^4}{\sum f_i} = 10000 \times \frac{706}{45} = 156888.89$$

The moments about the mean are then
$$\mu_2 = \mu'_2 - (\mu'_1)^2 = 262.22 - (0.44)^2 \approx 262.02$$
$$\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3 = -88.89 - 3(262.22)(0.44) + 2(0.44)^3 = -88.89 - 346.13 + 0.17 = -434.85$$
$$\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4 = 156888.89 - 4(-88.89)(0.44) + 6(262.22)(0.44)^2 - 3(0.44)^4$$
$$= 156888.89 + 156.45 + 304.59 - 0.11 = 157349.82$$

So,
$$\text{Skewness: } \beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-434.85)^2}{(262.02)^3} = \frac{189094.52}{17988846.95} = 0.0105$$
$$\text{Kurtosis: } \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{157349.82}{(262.02)^2} = \frac{157349.82}{68654.48} = 2.29$$

By the moment method, the skewness and kurtosis coefficients of the given distribution are $\beta_1 = 0.0105$ (so that $\sqrt{\beta_1} \approx 0.10$) and $\beta_2 = 2.29$ respectively. As $\beta_1 \neq 0$ and the sign of $\mu_3$ is negative, the distribution is not symmetrical; it is slightly skewed to the left. As $\beta_2 = 2.29$ is less than 3, the distribution is also not normal; it is less peaked, i.e. platykurtic.

Exercise: The following are the heights of 405 soybean plants collected from a particular plot.

| Plant height (cm) | No. of plants ($f_i$) |
|---|---|
| 8-12 | 6 |
| 13-17 | 17 |
| 18-22 | 25 |
| 23-27 | 86 |
| 28-32 | 125 |
| 33-37 | 77 |
| 38-42 | 55 |
| 43-47 | 9 |
| 48-52 | 4 |
| 53-57 | 1 |

Compute:
i) A.M., G.M., H.M., Median, Mode
ii) Mean Deviation from mean, S.D., Variance, C.V.
iii) Coefficients of Skewness and Kurtosis
iv) Interpret the above results for soybean.

1.5.
Testing of Hypothesis or Test of Significance or decision rule

An estimate based on sample values does not, in general, equal the true value in the population because of the inherent variation in the population. Different samples drawn from the population will give different estimates of the true value. It therefore has to be verified whether the difference between the sample estimate and the population value is due to sampling fluctuation or is a real difference. If the difference is due to sampling fluctuation only, it can safely be said that the sample belongs to the population under question; if the difference is real, we have every reason to believe that the sample may not belong to the population under question.

Steps involved in test of hypothesis:
1) The null and alternative hypotheses are formulated.
2) The test statistic is constructed.
3) The level of significance is fixed.
4) The table (critical) values are found from the tables for the given level of significance.
5) The null hypothesis is rejected at the given level of significance if the value of the test statistic is greater than or equal to the critical value. Otherwise the null hypothesis is accepted.
6) In the case of rejection, the variation in the estimates is called "significant" variation. In the case of acceptance, the variation in the estimates is called "not significant".

1.6. Standard normal deviate (SND) or Z tests or Large Sample Tests

If the sample size n ≥ 30, it is considered a large sample, and if the sample size n < 30, it is considered a small sample; accordingly there are large sample and small sample tests.

SND Test or One Sample (Z-test) for single mean

Case-I: Population standard deviation (σ) is known

Assumptions:
1. Population is normally distributed
2. The sample is drawn at random

Conditions:
1. Population standard deviation σ is known
2. Size of the sample is large (n ≥ 30)

Procedure: Let $x_1, x_2, \ldots, x_n$ be a random sample of size n from a normal population with mean μ and variance $\sigma^2$. Let $\bar{x}$ be the mean of the sample of size n.
Null Hypothesis is H0: μ = μ0 (a specified value) and alternative is H1: μ ≠ μ0 (two-tail)
Under H0, the test statistic is

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$$

i.e. the above statistic follows the Normal Distribution with mean 0 and variance 1.
If Zcal ≤ Ztab at 5% level of significance, H0 is accepted and hence we conclude that there is no significant difference between the population mean and the one specified in H0 as μ0.

Problem-9. A sample of 900 leaves has a mean of 3.4 cm and S.D. of 2.61 cm. Is the sample drawn from a large population of mean 3.25 cm?

Solution: Here, Null Hypothesis is H0: μ = μ0 and alternative is H1: μ ≠ μ0 (two-tail)
Given $\bar{x}$ = 3.4, μ0 = 3.25, σ = 2.61 and n = 900
Putting the values in the formula, we get

$$Z = \frac{3.4 - 3.25}{2.61/\sqrt{900}} = \frac{0.15}{0.087} = 1.72$$

The tabulated value of Z at 5% is 1.96. So, Z calculated is less than tabulated. Hence, H0 is accepted, i.e. the sample is drawn from a large population of mean 3.25 cm.

Exercise: A herd of 1500 steer was fed a special high-protein grain for a month. A random sample of 29 was weighed and had gained an average of 6.7 kg. If the standard deviation of weight gain for the entire herd is 7.1 kg, test the hypothesis that the average weight gain per steer for the month was more than 5 kg.
(Hints: H0: μ = 5, H1: μ > 5, Zcal = 1.289)

Case-II: If σ is not known
Null hypothesis (H0): μ = μ0
Under H0, the test statistic is

$$Z = \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}} \sim N(0,1)$$

Where, $s = \sqrt{\dfrac{1}{n}\sum x^2 - \left(\dfrac{\sum x}{n}\right)^2}$ and the x's are the sample observations.
If Zcal ≤ Ztab at 5% level of significance, H0 is accepted and hence we conclude that there is no significant difference between the population mean and the one specified in H0; otherwise we do not accept H0.
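The single-mean Z statistic described above can be checked numerically. The following is a minimal Python sketch and not part of the prescribed hand calculation; the function name `z_test_single_mean` is a hypothetical helper, and only the Python standard library is assumed. It is applied to the figures of Problem-9 and of the steer exercise.

```python
import math
from statistics import NormalDist


def z_test_single_mean(xbar, mu0, sd, n):
    """Return (Z, two-tailed p-value) for H0: mu = mu0.

    sd is the population S.D. (Case-I) or, for a large sample,
    the sample S.D. used in its place (Case-II).
    """
    z = (xbar - mu0) / (sd / math.sqrt(n))
    p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tail


# Problem-9: 900 leaves, mean 3.4 cm, S.D. 2.61 cm, hypothesised mean 3.25 cm
z9, p9 = z_test_single_mean(3.4, 3.25, 2.61, 900)
print("Problem-9: Z =", round(z9, 2), "| significant at 5% (two-tail):", abs(z9) > 1.96)

# Steer exercise: n = 29, mean gain 6.7 kg, sigma 7.1 kg, H1: mu > 5 (one-tail)
z_ex, _ = z_test_single_mean(6.7, 5, 7.1, 29)
print("Exercise: Z =", round(z_ex, 3), "| significant at 5% (one-tail):", z_ex > 1.645)
```

The comparison values 1.96 and 1.645 are the usual two-tail and one-tail 5% critical values of Z; the p-value is returned only as an additional check on the same decision.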
The table below gives some critical values of Z:

| Level of significance | 10% | 5% | 1% |
|---|---|---|---|
| Critical value of Z, Two-tail | 1.645 | 1.96 | 2.58 |
| Critical value of Z, One-tail | 1.28 | 1.645 | 2.33 |

SND test for two sample means or Z-test of significance for difference of two means

Case-I: when σ is known

Procedure: Let $\bar{x}_1$ be the mean of a random sample of size $n_1$ from a population with mean $\mu_1$ and variance $\sigma_1^2$, and let $\bar{x}_2$ be the mean of a random sample of size $n_2$ from another population with mean $\mu_2$ and variance $\sigma_2^2$.
The hypothesis is, H0: μ1 = μ2 and H1: μ1 ≠ μ2 (two-tail), i.e. the null hypothesis states that the population means of the two samples are identical. Under the null hypothesis the test statistic becomes

$$Z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$$

i.e. the above statistic follows the Normal Distribution with mean 0 and variance 1.
If $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (say), i.e. both samples have the same standard deviation (or variance), then the test statistic becomes

$$Z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1)$$

If Zcal ≤ Ztab at 5% level of significance, H0 is accepted, otherwise rejected. If H0 is accepted, there is no significant difference between the population means of the two samples, i.e. the means are identical.

Problem-10. The average panicle length of 60 paddy plants in field No.1 is 18.5 cm and that of 70 paddy plants in field No.2 is 20.3 cm, with common S.D. of 1.15 cm. Test whether there is significant difference between the two paddy fields w.r.t. mean panicle length.

Solution: Hypothesis, H0: There is no significant difference between the means of two paddy fields w.r.t. panicle length, i.e.
μ1 = μ2
Under H0, the test statistic becomes

$$Z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1)$$

where, $\bar{x}_1$ = 18.5, $\bar{x}_2$ = 20.3, n1 = 60, n2 = 70, σ = 1.15
Substituting the given values in the formula, we get Z = 8.90

Conclusion: At 5% level of significance, 8.90 > 1.96 (table value) and hence H0 is rejected, which means there is significant difference between the two paddy populations in regard to mean panicle length.

Example: The amount of a certain trace element in blood is known to vary with a standard deviation of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors. Random samples of 75 male and 50 female donors yield concentration means of 28 and 33 ppm, respectively. Test whether the population means of concentrations of the element are the same for men and women, assuming unequal variances.
(Hints: H0: μ1 = μ2, H1: μ1 ≠ μ2, Zcal = -2.37)

Case-II: when S.D. of both populations not known
The above methods are followed only after estimating the S.D. of the two populations from the sample observations as:

$$S_1 = \sqrt{\frac{1}{n_1}\sum x_1^2 - \left(\frac{\sum x_1}{n_1}\right)^2}, \qquad S_2 = \sqrt{\frac{1}{n_2}\sum x_2^2 - \left(\frac{\sum x_2}{n_2}\right)^2}$$

Where $x_1$ and $x_2$ are the independent sample observations with sizes $n_1$ and $n_2$ from the two normal populations respectively. The pooled variance ($S^2$) or S.D. (S) is computed as: $S^2 =$

Problem-11. A breeder wants to investigate whether the number of filled grains per panicle is the same in a new variety of paddy ACM.5 and an old variety ADT.36. To verify this, a random sample of 50 plants of ACM.5 and 60 plants of ADT.36 were selected from the experimental fields. The following results were obtained:

| | For ACM.5 | For ADT.36 |
|---|---|---|
| Mean | 139.4 | 112.9 |
| S.D. | S1 = 26.864 | S2 = 20.1096 |
| Sample size | n1 = 50 | n2 = 60 |

Test whether the claim of the breeder is correct.
Solution: The hypothesis is, H0: μ1 = μ2 and H1: μ1 ≠ μ2 (two-tail)
Assuming that the two population variances are unequal and estimating them by the sample values, put the given values in the formula

$$Z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{|139.4 - 112.9|}{\sqrt{\dfrac{(26.864)^2}{50} + \dfrac{(20.1096)^2}{60}}} = \frac{26.5}{4.60} = 5.76$$

As the calculated value of Z > table value of Z at 5% level of significance (= 1.96), H0 is rejected. We conclude that the number of filled grains per panicle is significantly different in the two varieties ACM.5 and ADT.36.

1.7. Small Sample Tests

It is applicable when the sample size n < 30.

Test of hypothesis on equality of two variances (Snedecor's F-test or variance ratio test)

Let $x_1, x_2, \ldots, x_{n_1}$ be a sample of size $n_1$ drawn from a normal population with variance $\sigma_x^2$ and $y_1, y_2, \ldots, y_{n_2}$ be another sample of size $n_2$ drawn independently from a normal population with variance $\sigma_y^2$ for the same variable under study. Now we are interested to know whether the two samples are drawn from two different normal populations or they belong to the same normal population w.r.t. variance or scatter of the observations.

Procedure:
Step-1. The assumptions in the F-test:
i. Parent population must be normal.
ii. Samples are independent.
Step-2. Take the null hypothesis $H_0: \sigma_x^2 = \sigma_y^2$ against the alternate hypothesis $H_1: \sigma_x^2 \neq \sigma_y^2$
Step-3. Choose the level of significance i.e. 5% or 1%.
Step-4. Choose the location of critical region i.e. one tailed or two tailed test.
Step-5. Compute the observed value of F as:

$$F = \frac{S_x^2}{S_y^2} \text{ with } (n_1 - 1) \text{ and } (n_2 - 1) \text{ d.f., if } S_x^2 > S_y^2 \text{ (greater value is taken in the numerator)}$$

Where, $S_x^2 = \dfrac{\sum (x_i - \bar{x})^2}{n_1 - 1}$ and $S_y^2 = \dfrac{\sum (y_i - \bar{y})^2}{n_2 - 1}$
Step-6. Compare the observed value with the tabular value.
Step-7. If Fcal > Ftab, then the null hypothesis is rejected and the result is significant. If Fcal ≤ Ftab, then the null hypothesis is accepted and the result is not significant.

Problem-12.
Two independent samples on dry weight (g) of plants were observed from two populations as:
Sample-1 (x): 39, 41, 43, 41, 45, 39, 42, 44
Sample-2 (y): 40, 42, 40, 44, 39, 38, 40
Do the estimates of the population variances differ significantly?

Solution:
The Hypothesis is:
$H_0: \sigma_x^2 = \sigma_y^2$ (take the hypothesis that the populations have the same variances)
$H_1: \sigma_x^2 \neq \sigma_y^2$
Level of significance, α = 0.05
Test Statistic, $F = \dfrac{S_x^2}{S_y^2}$ where, $S_x^2 = \dfrac{\sum (x_i - \bar{x})^2}{n_1 - 1}$ and $S_y^2 = \dfrac{\sum (y_i - \bar{y})^2}{n_2 - 1}$

Table-12. Calculation of variances

| Obs. No. | x | y | $(x - \bar{x})$ | $(y - \bar{y})$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ |
|---|---|---|---|---|---|---|
| 1 | 39 | 40 | -2.75 | -0.42 | 7.5625 | 0.1764 |
| 2 | 41 | 42 | -0.75 | 1.58 | 0.5625 | 2.4964 |
| 3 | 43 | 40 | 1.25 | -0.42 | 1.5625 | 0.1764 |
| 4 | 41 | 44 | -0.75 | 3.58 | 0.5625 | 12.8164 |
| 5 | 45 | 39 | 3.25 | -1.42 | 10.5625 | 2.0164 |
| 6 | 39 | 38 | -2.75 | -2.42 | 7.5625 | 5.8564 |
| 7 | 42 | 40 | 0.25 | -0.42 | 0.0625 | 0.1764 |
| 8 | 44 | - | 2.25 | - | 5.0625 | - |
| Total | 334 | 283 | - | - | 33.5 | 23.7148 |

$$\bar{x} = \frac{\sum x_i}{n_1} = \frac{334}{8} = 41.75, \qquad \bar{y} = \frac{\sum y_i}{n_2} = \frac{283}{7} = 40.42$$

$$S_x^2 = \frac{\sum (x_i - \bar{x})^2}{n_1 - 1} = \frac{33.5}{7} = 4.7857, \qquad S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n_2 - 1} = \frac{23.7148}{6} = 3.9525$$

$$F = \frac{S_x^2}{S_y^2} = \frac{4.7857}{3.9525} = 1.21$$

As n1 = 8 and n2 = 7, for 7 and 6 degrees of freedom at α = 0.05 the critical value of F is 4.21. Since the calculated value of F = 1.21 is less than the critical value (= 4.21), H0 is accepted, i.e. the estimates of the population variances do not differ significantly. It is concluded that the variances of the two populations may be regarded as the same.

Test for single mean (Student's t-test)

This test is used to test whether the sample mean ($\bar{x}$) differs significantly from the hypothetical value of the population mean μ0.

Procedure:
Step-1. Let $x_1, x_2, \ldots, x_n$ be a random sample of size n drawn from a population with the following assumptions:
i. Parent population must be normal.
ii.
The sample is random.
iii. The population standard deviation is unknown.
iv. The sample size must be < 30.
Step-2. Take Null hypothesis $H_0: \mu = \mu_0$
Alternate hypothesis $H_1: \mu \neq \mu_0$
Step-3. Level of significance as 5% or 1%
Step-4. Choose the location of critical region i.e. one tailed or two tailed.
Step-5. Compute the sample statistic (observed) of Student's t-test.

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \text{ with } (n-1) \text{ degrees of freedom}$$

Where, $\bar{x}$ = Sample mean, $\mu_0$ = Specified population mean, s = Sample standard deviation, i.e.

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$

Step-6. Compare the sample statistic with the tabulated value.
Step-7. Decision Rule
i. If t(cal) > t(tab), then significant and the null hypothesis is rejected.
ii. If t(cal) ≤ t(tab), then not significant and the null hypothesis is accepted.

Problem-13. Ten animals are fed with an animal feed. The gains in wt. (kg) of the animals are given below. A negative value indicates loss in weight. Test whether there is significant gain in weight as a result of consumption of that particular animal feed.

| Animal No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Gain in Wt. (x) | 25 | 10 | 11 | 13 | 12 | 8 | 5 | 13 | 7 | -4 |

Solution:
Null hypothesis $H_0: \mu = 0$ (i.e. there is no gain in weight)
$H_1: \mu > 0$ (i.e. there is gain in weight)
This is a case of one tailed test.

Table-13. Calculation for t-Statistic

| Animal No. | Gain in wt. (x) | $(x - \bar{x})$ | $(x - \bar{x})^2$ |
|---|---|---|---|
| 1 | 25 | 15 | 225 |
| 2 | 10 | 0 | 0 |
| 3 | 11 | 1 | 1 |
| 4 | 13 | 3 | 9 |
| 5 | 12 | 2 | 4 |
| 6 | 8 | -2 | 4 |
| 7 | 5 | -5 | 25 |
| 8 | 13 | 3 | 9 |
| 9 | 7 | -3 | 9 |
| 10 | -4 | -14 | 196 |
| Total | $\sum x = 100$ | | $\sum (x - \bar{x})^2 = 482$ |

$$\text{Mean} = \bar{x} = \frac{\sum x}{n} = \frac{100}{10} = 10 \quad \text{and} \quad t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Where $\bar{x}$ = 10, μ0 = 0, n = 10 and

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{482}{9}} = 7.32$$

$$t = \frac{10 - 0}{7.32/\sqrt{10}} = 4.32$$

Since the calculated t-value of 4.32 is more than the table value of t = 1.833 at 5% level of significance for 9 d.f. for a one tail test, the null hypothesis is rejected and the alternate hypothesis is accepted. So, we can conclude that there is +ve gain in wt. due to consumption of the particular feed.
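The hand calculation of Problem-13 can be cross-checked with a short script. This is only an illustrative Python sketch using the standard library; the helper name `one_sample_t` is hypothetical, and the table value 1.833 is the one quoted in the solution above rather than something the code computes.

```python
import math


def one_sample_t(data, mu0):
    """Return (mean, s, t, d.f.) for Student's t-test of H0: mu = mu0."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
    s = math.sqrt(ss / (n - 1))               # sample S.D. with divisor n-1
    t = (mean - mu0) / (s / math.sqrt(n))
    return mean, s, t, n - 1


# Gain in weight (kg) of the ten animals in Problem-13
gains = [25, 10, 11, 13, 12, 8, 5, 13, 7, -4]
mean, s, t, df = one_sample_t(gains, 0)

T_TAB = 1.833  # one-tail 5% table value for 9 d.f., as quoted in the text
print("mean =", mean, "| s =", round(s, 2), "| t =", round(t, 2), "| d.f. =", df)
print("H0 rejected" if t > T_TAB else "H0 accepted")
```

Keeping the table value as a named constant makes it plain that the decision rule still depends on a looked-up critical value; the script only automates Steps 5 to 7.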
Exercise: A random sample of height (ft.) of 10 trees from a forest was observed. Test whether the mean height of trees of that forest is 100 ft. or not at 5% level. (Hints: Calculated t = -0.62)

Test for difference of two means for Independent samples (Fisher's t-test)

This test is used to test the difference between two population means on the basis of two independent sample means, or to test whether two samples have been drawn from the same population having the same mean.

Procedure:
Let $x_1, x_2, \ldots, x_{n_1}$ be a random sample of size $n_1$ drawn from a population with mean $\mu_x$ and $y_1, y_2, \ldots, y_{n_2}$ be another independent random sample drawn from a population with mean $\mu_y$, having the following assumptions.
i. Parent population must be normal.
ii. Samples are random and independent of each other.

Case-I: Population variance for both the samples same and unknown.
Step-1. Take Null hypothesis $H_0: \mu_x = \mu_y$
Step-2. Take Alternative hypothesis $H_1: \mu_x \neq \mu_y$
Step-3. Choose the level of significance either 5% or 1%.
Step-4. Choose the location of critical region i.e. one tailed test or two tailed test.
Step-5. Compute the sample t value (calculated) by the following formula of Fisher's t-test.

$$t = \frac{\bar{x} - \bar{y}}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \text{ with } (n_1 + n_2 - 2) \text{ d.f.}$$

Here, $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}}$ is the estimated standard deviation of the population
Where, $\bar{x}$ = Sample mean of 1st sample, $n_1$ = no. of observations of 1st sample
$\bar{y}$ = Sample mean of 2nd sample, $n_2$ = no. of observations of 2nd sample
Step-6. Compare the calculated value with the table value.
Step-7. If t(cal) > t(tab), then the null hypothesis is rejected and it is significant. If t(cal) ≤ t(tab), then the null hypothesis is accepted and it is not significant.

Problem-14. The interest is to study the effect of two treatments A & B on the yield of a crop, each of the treatments being repeated in 5 plots and the yield/plot noted below.
Yield (in kg/plot)

| Treatment-A (x) | 9 | 10 | 13 | 11 | 7 | $\bar{x} = 10$ |
|---|---|---|---|---|---|---|
| Treatment-B (y) | 15 | 10 | 14 | 15 | 11 | $\bar{y} = 13$ |

Test whether the mean yields obtained as a result of these two treatments differ significantly.

Solution:
Step-1. Null hypothesis, $H_0: \mu_A = \mu_B$ (i.e. no significant difference between two means)
Alternate Hypothesis, $H_1: \mu_A \neq \mu_B$ (i.e. two means differ significantly)
Step-2. This is a case of two-tailed test.
Step-3. The level of significance chosen is 5%.
Step-4.

Table-14. Calculation for Fisher's t Statistic

| Sl. No. | x | y | $(x - \bar{x})$ | $(y - \bar{y})$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ |
|---|---|---|---|---|---|---|
| 1 | 9 | 15 | -1 | 2 | 1 | 4 |
| 2 | 10 | 10 | 0 | -3 | 0 | 9 |
| 3 | 13 | 14 | 3 | 1 | 9 | 1 |
| 4 | 11 | 15 | 1 | 2 | 1 | 4 |
| 5 | 7 | 11 | -3 | -2 | 9 | 4 |
| Total | 50 | 65 | - | - | 20 | 22 |

So, $\bar{x} = \dfrac{50}{5} = 10$, $\bar{y} = \dfrac{65}{5} = 13$, $n_1 = n_2 = 5$ and

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}} = \sqrt{\frac{20 + 22}{8}} = \sqrt{5.25} = 2.29$$

Test Statistic,

$$t = \frac{\bar{x} - \bar{y}}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{10 - 13}{2.29\sqrt{\dfrac{1}{5} + \dfrac{1}{5}}} = \frac{-3}{2.29 \times 0.63} = \frac{-3}{1.45} = -2.07, \quad \text{so } |t| = 2.07$$

Step-5. The two tailed table value for t at 5% significance level with 8 d.f. is 2.306. So, calculated |t| is less than the table value and hence the null hypothesis is accepted. It is concluded that the two treatments do not produce any significant difference in the mean yield.

Exercise: To assess the effect of inoculation with mycorrhiza on the height growth of seedlings of a crop, 10 seedlings inoculated with mycorrhiza (Group-1) and another 10 seedlings without inoculation (Group-2) were collected from an experiment. The heights of seedlings obtained under the two groups were:

| Plot | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Group I | 23 | 17.4 | 17 | 20.5 | 22.7 | 24 | 22.5 | 22.7 | 19.4 | 18.8 |
| Group II | 8.5 | 9.6 | 7.7 | 10.1 | 9.7 | 13.2 | 10.3 | 9.1 | 10.5 | 7.4 |

Under the assumption of equality of variance of seedling height in the two groups, test the equality of means. (tcal = 11.75)

Exercise: Using the data of the example of F-test, test equality of 2 means.
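The pooled-variance t statistic for two independent samples can be verified with a few lines of code. The sketch below is an illustration only, written in Python with the standard library; `pooled_t` is a hypothetical helper name. It is applied to the yields of Problem-14 and to the mycorrhiza exercise, for which the manual quotes tcal = 11.75.

```python
import math


def pooled_t(x, y):
    """Return (t, s, d.f.) for Fisher's t-test assuming a common variance."""
    n1, n2 = len(x), len(y)
    mean_x, mean_y = sum(x) / n1, sum(y) / n2
    ss = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    df = n1 + n2 - 2
    s = math.sqrt(ss / df)                    # pooled estimate of the S.D.
    t = (mean_x - mean_y) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, s, df


# Problem-14: yield (kg/plot) under treatments A and B
t14, s14, df14 = pooled_t([9, 10, 13, 11, 7], [15, 10, 14, 15, 11])
print("Problem-14: t =", round(t14, 2), "| s =", round(s14, 2), "| d.f. =", df14)

# Exercise: seedling heights with and without mycorrhiza
group1 = [23, 17.4, 17, 20.5, 22.7, 24, 22.5, 22.7, 19.4, 18.8]
group2 = [8.5, 9.6, 7.7, 10.1, 9.7, 13.2, 10.3, 9.1, 10.5, 7.4]
t_ex, _, df_ex = pooled_t(group1, group2)
print("Exercise: t =", round(t_ex, 2), "| d.f. =", df_ex)
```

The sign of t simply reflects which sample is entered first; for a two-tailed test the absolute value is compared with the table value for the returned degrees of freedom.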
Test for difference of two dependent sample means (paired t-test)

Procedure:
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be n paired observations of a sample from a population with basic assumptions as follows:
i. Parent population must be normal.
ii. Samples are dependent and occur pair-wise.
Step-1. Take Null hypothesis: $H_0: \mu_x = \mu_y$ or $H_0: \mu_d = 0$ i.e. no difference
Step-2. Take Alternate hypothesis: $H_1: \mu_x \neq \mu_y$ or $H_1: \mu_d \neq 0$ (or $\mu_d > 0$ or $\mu_d < 0$)
Step-3. Choose the level of significance either 5% or 1%.
Step-4. Choose the location of critical region i.e. one tailed test or two tailed test.
Step-5. Compute the observed t statistic by the following formula of paired t-test:

$$t = \frac{\bar{d}}{s/\sqrt{n}} \text{ with } (n-1) \text{ d.f.}$$

Where, $d_i = x_i - y_i$, $\bar{d}$ = mean of the d variable, and $s = \sqrt{\dfrac{1}{n-1}\sum (d_i - \bar{d})^2}$
Step-6. Compare the observed value with the tabular value.
Step-7. If t-calculated > t-tabulated, then the null hypothesis is rejected and it is significant; otherwise the null hypothesis is accepted.

Problem-15. Memory capacity of 9 students was tested before and after training. Test at 5 per cent level of significance whether the training was effective from the following scores.

| Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Before (x) | 10 | 15 | 9 | 3 | 7 | 12 | 16 | 17 | 4 |
| After (y) | 12 | 17 | 8 | 5 | 6 | 11 | 18 | 20 | 3 |

Solution:
Here, marks obtained by the same batch of students in the two tests are available. Hence, the marks are expected to be correlated. So, the paired t-test will be appropriate. Taking the null hypothesis that the mean of the differences is zero, we can write $H_0: \mu_x = \mu_y$, which is equivalent to testing $H_0: \mu_d = 0$, against $H_1: \mu_x \neq \mu_y$
As we have matched pairs, we use the paired t-test, which is given by

$$t = \frac{\bar{d}}{S/\sqrt{n}} \text{ with } (n-1) \text{ d.f.}$$

Table-15.
Calculation for paired-t

| Student | Score (x) $x_i$ | Score (y) $y_i$ | Difference $d_i = (x_i - y_i)$ | $d_i^2$ |
|---|---|---|---|---|
| 1 | 10 | 12 | -2 | 4 |
| 2 | 15 | 17 | -2 | 4 |
| 3 | 9 | 8 | 1 | 1 |
| 4 | 3 | 5 | -2 | 4 |
| 5 | 7 | 6 | 1 | 1 |
| 6 | 12 | 11 | 1 | 1 |
| 7 | 16 | 18 | -2 | 4 |
| 8 | 17 | 20 | -3 | 9 |
| 9 | 4 | 3 | 1 | 1 |
| Total | - | - | -7 | 29 |

Here $\bar{d} = \dfrac{\sum d_i}{n} = \dfrac{-7}{9} = -0.778$

$$s = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n - 1}} = \sqrt{\frac{29 - 9(-0.778)^2}{9 - 1}} = \sqrt{2.944} = 1.716$$

$$|t| = \frac{|\bar{d}|}{S/\sqrt{n}} = \frac{0.778}{1.716/\sqrt{9}} = \frac{0.778}{0.572} = 1.36$$

Table value of t at 5% level for 8 d.f. is 2.306. The calculated value is less than the table value. Hence, it is not significant and the null hypothesis is accepted. Hence we can conclude that the training was not effective.

Exercise: Data pertaining to organic carbon (OC) content measured at two different layers of 10 soil pits in a natural forest were collected to study whether the OC content is the same or different:

Organic carbon (%)

| Soil pit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Layer 1 (x) | 1.59 | 1.39 | 1.64 | 1.17 | 1.27 | 1.58 | 1.64 | 1.53 | 1.21 | 1.48 |
| Layer 2 (y) | 1.21 | 0.92 | 1.31 | 1.52 | 1.62 | 0.91 | 1.23 | 1.21 | 1.58 | 1.18 |

Analyse the data and draw your conclusion. (Hints: $s_d^2$ = 0.1486, tcal = 1.485)

1.8. Chi-square test (χ²)

The Chi-square test of significance is for testing the agreement between observation and hypothesis (or expectation) where the data are purely qualitative or enumerative in character. Such enumerative data are characterized by the frequency of occurrence or non-occurrence of events or attributes or categories, expressed as counts or proportions or percentages. The expected frequency in each category should preferably be more than 5 and the total number of observations should be large, say, more than 50.

χ²-test for Goodness-of-fit

This involves testing of significance of difference between observed frequencies and the frequencies expected on some prior hypothesis or rule.
If $O_i$ is a set of observed frequencies and $E_i$ is the corresponding set of expected frequencies (i = 1, 2, …, n), Karl Pearson's Chi-square (χ²) is given by:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

Procedure:
Step-1. The following assumptions are made:
i. Sample observations should be independent.
ii. Constraint on cell frequencies should be linear i.e. $\sum O_i = \sum E_i$
iii. Total frequency should be reasonably large.
iv. No theoretical (expected) cell frequency should be less than 5.
Step-2. Take the null hypothesis, $H_0: O_i = E_i$
Alternative hypothesis $H_1: O_i \neq E_i$
Step-3. Choose the level of significance either α = 5% or 1%.
Step-4. Choose the location of critical region i.e. one tailed or two tailed
Step-5. Compute the Chi-square value as per the formula.
Step-6. Compare the observed value with the tabular value and take the decision as:
If χ²cal > χ²tab, then the null hypothesis is rejected and significant at α.
If χ²cal ≤ χ²tab, then the null hypothesis is accepted and non-significant at α.

Problem-16. In a cross between parents of the genetic constitution AAbb and aaBB, the phenotypes in the F2 sample are classified as follows.

| AB | Ab | aB | ab | Total |
|---|---|---|---|---|
| 87 | 29 | 32 | 12 | 160 |

They are expected to occur in a 9:3:3:1 ratio. Does the segregation agree with the theoretical ratio?

Solution:
H0: The segregation agrees with the theoretical ratio
H1: The segregation does not agree with the theoretical ratio.
Level of Significance α = 0.05
Test Statistic is $\chi^2 = \sum_{i=1}^{4} \dfrac{(O_i - E_i)^2}{E_i}$ with 3 d.f.
The expected frequencies are computed on the basis of the theoretical segregation ratio 9:3:3:1. The total is 9+3+3+1 = 16. We expect 9 out of 16 to belong to the AB group, that is, the probability of AB is $\dfrac{9}{16}$.
The expected frequency of AB is therefore $\dfrac{9}{16} \times 160 = 90$
The expected frequency of Ab is $\dfrac{3}{16} \times 160 = 30$
The expected frequency of aB is $\dfrac{3}{16} \times 160 = 30$
And the expected frequency of ab is $\dfrac{1}{16} \times 160 = 10$

Table-16.
Calculation for Chi-square value

| Phenotype | Observed frequency ($O_i$) | Expected frequency ($E_i$) | $(O_i - E_i)$ | $(O_i - E_i)^2$ | $\dfrac{(O_i - E_i)^2}{E_i}$ |
|---|---|---|---|---|---|
| AB | 87 | 90 | -3 | 9 | 0.100 |
| Ab | 29 | 30 | -1 | 1 | 0.033 |
| aB | 32 | 30 | 2 | 4 | 0.133 |
| ab | 12 | 10 | 2 | 4 | 0.400 |
| χ² value | | | | | 0.667 |

The calculated χ² value is 0.667, which is less than the critical value of χ² (7.815 with 3 d.f. at α = 0.05). Therefore, the calculated χ² value is not significant. Hence we accept the null hypothesis and conclude that the observed phenotypic ratio conforms to the theoretical segregation ratio of 9:3:3:1.

Exercise: Data were collected on the number of insect species from an undisturbed area of a Wildlife Sanctuary in different months to test whether there are any significant differences between the numbers of insect species found in different months. (Hints: we may state the null hypothesis as the diversity in terms of number of insect species is the same in all months and derive the expected frequencies in different months accordingly). Test the data. (Ans. χ² = 134.84)

| Month | Jan. | Feb. | Mar. | Apr. | May | Jun. | Jul. | Aug. | Sep. | Oct. | Nov. | Dec. | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $O_i$ | 67 | 115 | 118 | 72 | 67 | 77 | 75 | 63 | 42 | 24 | 32 | 52 | 804 |

χ²-test of independence or association of attributes

When individuals are classified simultaneously on the basis of two variables or attributes or categories, the resulting table of frequencies is called an (r x c) contingency table, i.e. r rows and c columns. The χ² test may be applied to a contingency table to find out if the variables are independent or associated.

Procedure:
The χ² value for this test may be obtained in two ways:
i. By estimating the values of $E_i$ (Expected frequency) from the values of $O_i$ (Observed frequency) and applying χ² as in goodness-of-fit.
ii.
For 2x2 contingency table

2 x 2 Contingency table

| Group | Category I | Category II | Total |
|---|---|---|---|
| 1 | a | b | a+b |
| 2 | c | d | c+d |
| Total | a+c | b+d | N = a+b+c+d |

The simple formula to calculate χ² is

$$\chi^2 = \frac{(ad - bc)^2 N}{(a+b)(c+d)(a+c)(b+d)} \text{ with 1 d.f.}$$

Where a, b, c, d are the observed cell frequencies.
If any of the expected cell frequencies is less than 5, then a slightly modified formula is necessary. The corrected formula for the 2x2 contingency table, called Yates' Correction for continuity, is:

$$\chi^2 = \frac{\left(|ad - bc| - \dfrac{N}{2}\right)^2 N}{(a+b)(c+d)(a+c)(b+d)}$$

Problem-17. In a survey of fertilizer practices in India, each of 323 cotton growing fields selected for survey was classified on the twin criteria of irrigation practice (irrigated or non-irrigated) and the practice of manuring (manured or un-manured), resulting in the following contingency table.

| | Irrigated | Non-Irrigated | Total |
|---|---|---|---|
| Manured | 75 (a) | 35 (b) | 110 |
| Un-manured | 115 (c) | 98 (d) | 213 |
| Total | 190 | 133 | 323 |

It is required to test whether the practice of irrigation and the practice of manuring are independent or related (associated).

Solution:
H0: the two factors irrigation and manuring are independent.
H1: the two factors irrigation and manuring are dependent or associated.

First Method: Goodness-of-fit
The expected frequencies of each cell are calculated as:
Cell (a) is $\dfrac{(a+b)(a+c)}{N} = \dfrac{110 \times 190}{323} = 64.7$
Cell (b) is $\dfrac{(a+b)(b+d)}{N} = \dfrac{110 \times 133}{323} = 45.29$
Cell (c) is $\dfrac{(c+d)(a+c)}{N} = \dfrac{213 \times 190}{323} = 125.29$
Cell (d) is $\dfrac{(c+d)(b+d)}{N} = \dfrac{213 \times 133}{323} = 87.7$

The χ² is calculated using the formula

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

which follows the χ² distribution with (2-1) x (2-1) = 1 d.f.

Table-17.
Calculation of chi-square value

| | Irrigated | Non-irrigated | Total |
|---|---|---|---|
| Manured | 75 ($O_1$), 64.7 ($E_1$) | 35 ($O_2$), 45.3 ($E_2$) | 110 |
| Un-manured | 115 ($O_3$), 125.3 ($E_3$) | 98 ($O_4$), 87.7 ($E_4$) | 213 |
| Total | 190 | 133 | 323 |

The χ² value computed for the above table is

$$\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = \frac{(75 - 64.7)^2}{64.7} + \frac{(35 - 45.3)^2}{45.3} + \frac{(115 - 125.3)^2}{125.3} + \frac{(98 - 87.7)^2}{87.7} = 6.04 \approx 6.0$$

Second Method: Independence of attributes

$$\chi^2 = \frac{(ad - bc)^2 N}{(a+b)(c+d)(a+c)(b+d)} = \frac{(75 \times 98 - 35 \times 115)^2 \times 323}{(75 + 35)(115 + 98)(75 + 115)(35 + 98)} = \frac{(3325)^2 \times 323}{592076100} = 6.03 \approx 6.0$$

The χ² value computed by the above two methods is about 6.0. Since each of the two factors, irrigation and manuring, has only two categories, the d.f. for the above contingency table is (2-1) x (2-1) = 1. The table value of χ² with 1 d.f. at 5% level of significance is 3.84. Here the calculated χ² value is higher than the table value, and so the null hypothesis of independence of the two factors irrigation and manuring is rejected and it is concluded that they are mutually related or associated.

Exercise: The following table shows the result of inoculation against cholera in a group of people. Examine the effect of inoculation in controlling susceptibility to cholera. (Hints: apply Yates' correction)

| | Not attacked | Attacked |
|---|---|---|
| Inoculated | 43 | 5 |
| Not-inoculated | 7 | 28 |

1.9. Correlation and regression

In many natural systems, changes in one attribute are accompanied by changes in another attribute and a definite relation exists between the two. In other words, there is a correlation between the two variables. For instance, several soil properties like nitrogen content, organic carbon content or pH are correlated and exhibit simultaneous variation. Strong correlation is found to occur between several morphometric features of a tree. In such instances, an investigator may be interested in measuring the strength of the relationship.
Having made a set of paired observations $(x_i, y_i)$; i = 1, ..., n, from n independent sampling units, a measure of the linear relationship between two variables can be obtained by a quantity called Pearson's product moment correlation coefficient or simply the correlation coefficient.

Correlation is the study of co-variation between two variables to understand how closely the variables are related. In correlation analysis, both variables are assumed to be continuous and normally distributed. For discovering and measuring the magnitude and direction of the relationship between two variables we use the statistical tool known as the correlation coefficient, and its range is -1 to +1. The + or - sign indicates the direction of the relationship and the value gives the magnitude or strength of the relationship between the two variables.

Regression is the functional relationship between two or more variables and thereby provides a mechanism for prediction or forecasting. When the relationship between two variables is a straight line, it is called simple linear regression.

Karl Pearson's correlation coefficient and its test of significance

Procedure:
Let $(X_i, Y_i)$; i = 1, 2, 3, ..., n, be observations from n independent sampling units on 2 quantitative variables.

a). Direct Method:
Step-1. Construct a table for finding X², Y² and XY values
Step-2. Calculate $\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$, $\sum Y^2$
Step-3. Calculate Karl Pearson's correlation coefficient by

$$r_{xy} = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2} \cdot \sqrt{n\sum Y^2 - (\sum Y)^2}}$$

b). Step deviation method (change of origin and scale):
Step-1. Calculate U & V
Where, $U = \dfrac{X - A}{h}$; $V = \dfrac{Y - B}{k}$
A, B are arbitrary values from X & Y and h, k are suitably chosen scales.
Step-2. Construct a table for finding U, V, UV, U², V²
Step-3. Calculate $\sum U$, $\sum V$, $\sum UV$, $\sum U^2$ & $\sum V^2$
Step-4.
Calculate correlation coefficient by

$$r_{uv} = \frac{n\sum UV - \sum U \sum V}{\sqrt{n\sum U^2 - (\sum U)^2} \cdot \sqrt{n\sum V^2 - (\sum V)^2}} \quad \text{OR} \quad r_{uv} = \frac{\sum UV - n\bar{U}\bar{V}}{\sqrt{\sum U^2 - n\bar{U}^2} \cdot \sqrt{\sum V^2 - n\bar{V}^2}}$$

Where, $\bar{U} = \sum U / n$, $\bar{V} = \sum V / n$
Both methods give the same value, i.e. $r_{xy} = r_{uv}$

Test of correlation coefficient:
Null hypothesis, H0: ρ = 0 and Alternative, H1: ρ ≠ 0
Here ρ is the correlation in the population and r is the estimate of ρ from the sample observations.
Level of Significance, α = 0.05
And Test statistic,

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \sim \text{Student's t distribution with } (n-2) \text{ d.f.}$$

The tcal is compared with ttab. If tcal ≤ ttab, then H0 is accepted, which means not significant, i.e. the two variables have no linear relationship (there may be some other, e.g. nonlinear, relationship); and if tcal > ttab, then H1 is accepted, which means significant, i.e. the two variables are linearly related with the magnitude and direction of r.

Problem-18. The following data give the heights of fathers and their sons in 10 families. Compute the correlation coefficient of heights, test its significance and give your conclusion.

| Height of father (cm) | 63 | 69 | 65 | 67 | 68 | 69 | 69 | 70 | 71 | 71 |
|---|---|---|---|---|---|---|---|---|---|---|
| Height of son (cm) | 65 | 63 | 63 | 65 | 67 | 67 | 68 | 71 | 61 | 69 |

Solution:

Table-17. Calculation of correlation coefficient (A = 68, B = 65)

| Ht. of father (X) | Ht. of son (Y) | X² | Y² | XY | U = X-A | V = Y-B | UV | U² | V² |
|---|---|---|---|---|---|---|---|---|---|
| 63 | 65 | 3969 | 4225 | 4095 | -5 | 0 | 0 | 25 | 0 |
| 69 | 63 | 4761 | 3969 | 4347 | 1 | -2 | -2 | 1 | 4 |
| 65 | 63 | 4225 | 3969 | 4095 | -3 | -2 | 6 | 9 | 4 |
| 67 | 65 | 4489 | 4225 | 4355 | -1 | 0 | 0 | 1 | 0 |
| 68 | 67 | 4624 | 4489 | 4556 | 0 | 2 | 0 | 0 | 4 |
| 69 | 67 | 4761 | 4489 | 4623 | 1 | 2 | 2 | 1 | 4 |
| 69 | 68 | 4761 | 4624 | 4692 | 1 | 3 | 3 | 1 | 9 |
| 70 | 71 | 4900 | 5041 | 4970 | 2 | 6 | 12 | 4 | 36 |
| 71 | 61 | 5041 | 3721 | 4331 | 3 | -4 | -12 | 9 | 16 |
| 71 | 69 | 5041 | 4761 | 4899 | 3 | 4 | 12 | 9 | 16 |
| Total = 682 | 659 | 46572 | 43513 | 44963 | 2 | 9 | 21 | 60 | 93 |

a). Direct Method:

$$r_{xy} = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2} \cdot \sqrt{n\sum Y^2 - (\sum Y)^2}}$$

and putting the values,

$$r_{xy} = \frac{10 \times 44963 - 682 \times 659}{\sqrt{10 \times 46572 - 465124} \cdot \sqrt{10 \times 43513 - 434281}} = \frac{192}{\sqrt{596} \cdot \sqrt{849}} = \frac{192}{711.34} = 0.27$$

b).
Step Deviation method: U V U   0 .2 ; V   0 .9 n n U  2 ruv     0 . 04 (V ) 2  0 . 81  UV  n U V U  nU 2  V 2  nV 2 21  10 ( 0 . 18 ) 60  10 ( 0 . 04 ). 93  10 ( 0 . 81 ) 2 19 . 2 19 . 2   0 . 27 ( 7 . 72 ).( 9 . 21 ) 71 . 102  The correlation coefficient between father and son in both methods is 0.27. Test of significance of r: Department of Agricultural Statistics, OUAT Page-40 UG Practical Manual on Statistics Putting the value of r in the formula, t= the t statistic, t= 0.27 10  2 1  (0.27) 2 r n2 1 r2 =0.79 The ttab=2.31 with 8 d.f. at 5% ls. So, tcal < ttab and H0 is accepted i.e. not significant. It is concluded that the height of father and their son is not linearly related or we will say that the height of father increase or decrease does not indicate the increase or decrease in height of son. Exercise: The data on pH and organic carbon content were measured from soil samples collected from 15 pits taken in natural forests as given: Soil Pit 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 pH(x) 5.7 6.1 5.2 5.7 5.6 5.1 5.8 5.5 5.4 5.9 5.3 5.4 5.1 5.1 5.2 Organic carbon(y) (%) 2.1 2.17 1.97 1.39 2.26 1.29 1.17 1.14 2.09 1.01 0.89 1.6 0.9 1.01 1.21 Compute a suitable statistic and test to study whether increase in ph of soil affects the organic carbon in that forest.(Hints:r=0.3541 and tcal=1.3652) Exercise: The following data contain 15 paired values of photosynthetic rate(Y) and light interception(X) observed on leaves of a particular tree species. The photosynthetic rate is dependent variable and the quantity of light is independent variable. Study the linear relationship between the two variables with test. 
Tree   1       2       3       4       5       6       7       8
X      0.7619  0.7684  0.7961  0.838   0.8381  0.8435  0.8599  0.9209
Y      7.58    9.46    10.76   11.51   11.68   12.68   12.76   13.73

Tree   9       10      11      12      13      14      15
X      0.9993  1.0041  1.0089  1.0137  1.0184  1.0232  1.028
Y      13.89   13.97   14.05   14.13   14.2    14.28   14.36

Spearman's Rank correlation coefficient

A rank correlation is any of several statistics that measure the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of the labels "first", "second", "third", etc. to different observations of a particular variable. It can be used with continuous and discrete variables, including ordinal variables. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them: the test of significance shows whether the measured relationship is small enough to be likely a coincidence.

Spearman's rank correlation coefficient, or Spearman's rho, denoted by the Greek letter ρ, is a measure of statistical dependence between two variables. It assesses how well the relationship between two variables can be described by a monotonic function and lies in the interval [-1, +1]. An increasing rank correlation coefficient implies increasing agreement between rankings. The coefficient value can be interpreted as:
i. 1 if the agreement between the two rankings is perfect; the two rankings are the same.
ii. 0 if the rankings are completely independent.
iii. −1 if the disagreement between the two rankings is perfect; one ranking is the reverse of the other.

For a sample of size n, the n raw scores or values $X_i, Y_i$ are converted to ranks $x_i, y_i$ and ρ is computed.
Identical values (rank ties or value duplicates) are assigned a rank equal to the average of the positions they occupy in the ordered list of values.

Procedure: For a sample of n paired observations, the Spearman rank correlation coefficient is

\[ r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} \]

and, when ties occur,

\[ r_s = 1 - \frac{6\left[\sum d_i^2 + \sum \frac{m(m^2-1)}{12}\right]}{n(n^2-1)} \]

Here, $d_i = x_i - y_i$ $(i = 1, 2, 3, \ldots, n)$, $x_i$ = rank of the 1st variable, $y_i$ = rank of the 2nd variable, and m = number of tied observations in a group of ties (one correction term is added for each such group).

The following steps are applicable for finding the rank correlation:
Step-1. Rank all observations.
  I. Ranking should be made from the highest to the lowest observation, in the same direction for both variables.
  II. If two or more observations are equal in magnitude, all of them must carry the same rank (the average of the ranks they would occupy).
Step-2. When a common rank is assigned to m observations of a variable, $\frac{m(m^2-1)}{12}$ is added to $\sum d_i^2$ in the numerator of the 2nd term of the formula for the correlation coefficient.
Step-3. The sum of the rank differences, $\sum d_i$, should be equal to zero, which is a check on the correctness of the calculation.

Problem-19. Find the rank correlation between the following data and determine the relationship between the preference share price and the debenture price.

Preference share price (x)   73.2  85.8  78.9  75.8  77.2  81.2  83.8
Debenture price (y)          97.8  99.2  98.8  98.3  98.3  96.7  97.1

Solution:
Table-18. Calculation of rank correlation coefficient

Preference share price (x)   Rank x (xi)   Debenture price (y)   Rank y (yi)   di = xi - yi   di²
73.2                         7             97.8                  5             2              4
85.8                         1             99.2                  1             0              0
78.9                         4             98.8                  2             2              4
75.8                         6             98.3                  3.5           2.5            6.25
77.2                         5             98.3                  3.5           1.5            2.25
81.2                         3             96.7                  7             -4             16
83.8                         2             97.1                  6             -4             16
Total                                                                          0              48.50

Here, y has one group of 2 identical values (m = 2) and n = 7. Therefore, the rank correlation is

\[ r_s = 1 - \frac{6\left[\sum d_i^2 + \frac{m(m^2-1)}{12}\right]}{n(n^2-1)} = 1 - \frac{6\left[48.5 + \frac{2(2^2-1)}{12}\right]}{7(7^2-1)} = 1 - \frac{6(48.5 + 0.5)}{336} = 0.125 \]

It is concluded that the two prices are only weakly related, i.e. when one price increases the other does not increase in the same way.

Exercise: In a survey, observations were taken on 10 persons for IQ (X) and number of hours spent watching TV per week (Y), as below. Compute the rank correlation and study whether a higher IQ is associated with the hours spent watching TV per week.

Person   IQ (X)   No. of hours spent in TV per week (Y)
1        106      7
2        86       0
3        100      27
4        101      50
5        99       28
6        103      29
7        97       20
8        113      12
9        112      6
10       110      17

(Hints: Ans. rs = −0.1757)

Fitting of regression equations of two variables Y and X

In regression analysis, both variables are assumed to be normally distributed; one of the variables represents the cause (independent or explanatory variable) and the other the effect (dependent or response variable). The relationship between two variables expressed as a function is known as regression. When only two variables are involved, the functional relationship is known as simple regression. If the relationship between the two variables is linear, it is known as simple linear regression. For simple linear regression, the two regression equations are given by:

\[ Y \text{ on } X: \quad Y - \bar{Y} = b_{yx}(X - \bar{X}) \]
\[ X \text{ on } Y: \quad X - \bar{X} = b_{xy}(Y - \bar{Y}) \]

Where, $b_{yx}$ = regression coefficient of Y on X, $b_{xy}$ = regression coefficient of X on Y, $\bar{Y} = \sum Y / n$, $\bar{X} = \sum X / n$, and n = no. of observations.

Procedure: Fitting of regression equations is carried out in two phases.
a). Calculation of regression coefficients ($b_{YX}$ and $b_{XY}$)
i). Direct method:
Step-1. Construct a table to find X², Y², XY.
Step-2. Compute $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$, $\sum XY$, $\bar{Y}$ and $\bar{X}$ from the table.
Step-3.
Calculate the regression coefficients by the formula:

\[ b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}, \qquad b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2} \]

ii). Step deviation method:
Step-1. Reduce the values of X and Y to U and V, where A and B are arbitrary values, h and k are suitable scales, and

\[ U = \frac{X - A}{h}, \qquad V = \frac{Y - B}{k} \]

Step-2. Construct the table to compute U², V², UV.
Step-3. Compute $\sum U$, $\sum V$, $\sum UV$, $\sum U^2$, $\sum V^2$ from the table.
Step-4. Compute the regression coefficients by the formula:

\[ \text{Regression coefficient of U on V}, \quad b_{UV} = \frac{n\sum UV - \sum U \sum V}{n\sum V^2 - (\sum V)^2} \]
\[ \text{Regression coefficient of V on U}, \quad b_{VU} = \frac{n\sum UV - \sum U \sum V}{n\sum U^2 - (\sum U)^2} \]

Where n = no. of pairs of observations.
Step-5. Compute $b_{XY} = \frac{h}{k} b_{UV}$ and $b_{YX} = \frac{k}{h} b_{VU}$.

b). Finding the regression equations
After estimating the values of $\bar{X}$, $\bar{Y}$, $b_{YX}$ and $b_{XY}$, the regression equations are obtained by putting these values in the following equations:

\[ Y - \bar{Y} = b_{yx}(X - \bar{X}) \quad \text{and} \quad X - \bar{X} = b_{xy}(Y - \bar{Y}) \]

Problem-20. The following data give the monthly income and expenditure on food of 10 families.

Income (x)        120  90  80  150  130  140  110  95  70  105
Expenditure (y)   40   36  40  45   40   44   45   38  50  35

Find the two linear regression equations and the correlation coefficient.

Solution:
Let $U = \frac{X - A}{h}$ and $V = \frac{Y - B}{k}$. Here, A = 110, h = 5; B = 40, k = 1.

Table-19. Calculation of sums and sums of squares

Income (X)     Expenditure (Y)   U=(X-A)/h   V=(Y-B)/k   UV     U²     V²
120            40                2           0           0      4      0
90             36                -4          -4          16     16     16
80             40                -6          0           0      36     0
150            45                8           5           40     64     25
130            40                4           0           0      16     0
140            44                6           4           24     36     16
110            45                0           5           0      0      25
95             38                -3          -2          6      9      4
70             50                -8          10          -80    64     100
105            35                -1          -5          5      1      25
Total = 1090   413               -2          13          11     246    211

Here n = 10, $\bar{X} = 1090/10 = 109$ and $\bar{Y} = 413/10 = 41.3$.

Regression coefficient of U on V:

\[ b_{UV} = \frac{n\sum UV - \sum U \sum V}{n\sum V^2 - (\sum V)^2} = \frac{10 \times 11 - (-2 \times 13)}{10 \times 211 - (13)^2} = \frac{110 + 26}{2110 - 169} = \frac{136}{1941} = 0.07 \]

Regression coefficient of V on U:

\[ b_{VU} = \frac{n\sum UV - \sum U \sum V}{n\sum U^2 - (\sum U)^2} = \frac{136}{10 \times 246 - (-2)^2} = \frac{136}{2456} = 0.055 \]

So,

\[ b_{xy} = \frac{h}{k} b_{UV} = \frac{5}{1}(0.07) = 0.35, \qquad b_{yx} = \frac{k}{h} b_{VU} = \frac{1}{5}(0.055) = 0.011 \]

The regression coefficients of y on x and x on y are 0.011 and 0.35 respectively.

Therefore, the two regression equations and the correlation coefficient are:
i. Y on X: Y - 41.3 = 0.011 (X - 109)
ii. X on Y: X - 109 = 0.35 (Y - 41.3)
iii. Correlation of X and Y = √(0.011 × 0.35) = 0.062, taken with the positive sign because both regression coefficients are positive.

Exercise: From the data on photosynthetic rate (Y) and light interception (X) given in the exercise on correlation, find the regression equation of Y on X and estimate Y when X = 0.95.

II. DESIGN AND ANALYSIS OF EXPERIMENTS

2.1. Basic concepts on design of experiments

Planning an experiment to obtain appropriate data with respect to any problem under investigation is known as 'design of experiment'. It is a complete sequence of steps taken well in time to ensure that appropriate data will be obtained in a way which permits an objective analysis of the data, leading to valid inferences with respect to the stated problems. "Design of experiment" comprises the process of planning the experiment, analysing the data or observations and interpreting the results. The technique for making inferences is known as the "analysis of variance". There are three basic principles of the design of experiments: (i) Replication, (ii) Randomization and (iii) Local control.
(i). Replication: The application of a treatment to more than one experimental unit under investigation is known as replication. Replication is necessary in order to get an estimate of the experimental error variation caused by uncontrolled factors. Replication also increases the precision of treatment comparisons: besides providing an estimate of error, it helps in reducing the error in the experiment.

(ii). Randomization: Assigning the treatments or factors to be tested to experimental units according to a definite law of probability is known as randomization. Under the principle of randomization, every experimental unit has the same chance of receiving any one of the treatments under study. For an objective comparison it is necessary that treatments are allotted randomly to the various experimental units. The statistical procedures employed in making inferences about treatments hold good only when the treatments are allotted randomly to the experimental units.

(iii). Local control: Though every experiment should provide an estimate of error variation, it is not desirable to have a large experimental error. The experimental error can be reduced by making use of the fact that adjacent areas in the field are relatively more homogeneous than those widely separated. The aim of local control is to reduce the error by suitably modifying the allocation of treatments to the experimental units on the basis of previous knowledge.

Analysis of variance (ANOVA)

Analysis of variance is basically a technique of partitioning the overall variation in the responses observed in an investigation into different assignable sources of variation, some of which are specifiable and others unknown. Further, it helps in testing whether the variation due to any particular component is significant as compared to the residual variation that can occur among the observational units.
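The partitioning idea described above can be illustrated with a minimal Python sketch. The response values and the function name `partition_variation` are hypothetical and chosen only for illustration; the sketch assumes a single classification factor (treatments) and simply shows that the total sum of squares splits into a between-treatment part and a residual (within-treatment) part.

```python
# Hypothetical responses for three treatments (illustrative values only,
# not taken from this manual).
groups = {
    "T1": [4.0, 5.0, 6.0],
    "T2": [7.0, 8.0, 9.0],
    "T3": [5.0, 6.0, 7.0],
}


def partition_variation(groups):
    """Split the total sum of squares into between- and within-treatment parts."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(values) / len(values)

    # Total variation: squared deviations of every observation from the grand mean.
    total_ss = sum((y - grand_mean) ** 2 for y in values)

    # Assignable variation: deviations of treatment means from the grand mean,
    # weighted by the number of observations in each treatment.
    between_ss = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )

    # Residual variation: deviations of observations from their own treatment mean.
    within_ss = sum(
        (y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys
    )
    return total_ss, between_ss, within_ss


total_ss, between_ss, within_ss = partition_variation(groups)
print(total_ss, between_ss, within_ss)
```

The two parts add up to the total, which is the identity that the ANOVA tables in the following sections rely on.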
Some important definitions for experimental designs

Treatment: In experimentation, the various objects of comparison are known as treatments. In practice, treatments may refer to a physical substance (fertilizers, varieties of crops, animal breeds, feeds, etc.) or to a procedure or condition (methods of cultivation, sowing, housing conditions, etc.) applied to the experimental units to obtain a response.

Experimental unit: The basic objects on which the experiment is done are known as experimental units.

Model: In statistics, a model is generally expressed in terms of symbols, usually as a set of equations consisting of factor and treatment effects together with a random effect.

Fixed effect model: A model in which the factors are fixed effects and the error effect is random is called a fixed effect model. A fixed effect model with two factors is written as:

\[ y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk}, \qquad e_{ijk} \text{ i.i.d.} \sim N(0, \sigma_e^2) \]

Random effect model: A model in which the factors are random effects and the error effect is random is called a random effect model.

Mixed effect model: A model in which some factors are fixed and some are random, with a random error effect, is called a mixed effect model.

Hypothesis: Any assumption or statement about a population characteristic is called a hypothesis. It may be parametric or nonparametric.

Null hypothesis: It is the hypothesis which is tested for possible rejection under the assumption that it is true.

Degrees of freedom: The degrees of freedom correspond to the number of independent deviations or contrasts that are available from the data. They are calculated by deducting the number of constants calculated from the data from the number of values available.

Level of significance: This is the probability of rejecting the null hypothesis when it is true, i.e. the probability (under H0) of the rejection region. It is generally denoted by the symbol α and is usually taken as 0.05 (or 5%) or 0.01 (or 1%).
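As an illustration of the fixed effect model with two factors defined above, the following Python sketch generates responses of the form μ + α_i + β_j + e_ijk. The function name `simulate_fixed_effects`, the effect values and the error standard deviation are all hypothetical assumptions used only to show how the fixed parts and the random error combine; they are not data from this manual.

```python
import random


def simulate_fixed_effects(mu, alpha, beta, sigma_e, reps, seed=0):
    """Generate y_ijk = mu + alpha_i + beta_j + e_ijk with e_ijk ~ N(0, sigma_e^2).

    alpha and beta are lists of fixed effects for the two factors;
    reps is the number of observations k in each (i, j) cell.
    """
    rng = random.Random(seed)  # seeded so that the sketch is repeatable
    data = {}
    for i, a in enumerate(alpha, start=1):
        for j, b in enumerate(beta, start=1):
            data[(i, j)] = [
                mu + a + b + rng.gauss(0.0, sigma_e) for _ in range(reps)
            ]
    return data


# Hypothetical general mean, factor effects and error standard deviation.
cells = simulate_fixed_effects(
    mu=10.0, alpha=[1.0, -1.0], beta=[0.5, 0.0, -0.5], sigma_e=1.0, reps=3
)
for cell, ys in sorted(cells.items()):
    print(cell, [round(y, 2) for y in ys])
```

Setting `sigma_e` to zero removes the random part, so every observation in a cell then equals μ + α_i + β_j exactly; this makes the role of the error term easy to see.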
Basic assumptions for analysis of variance:
(i) All the effects of the different sources of variation (e.g. treatment, environment, etc.) are additive.
(ii) Experimental errors are independent.
(iii) Experimental errors have a common variance.
(iv) Experimental errors are normally distributed (exactly or asymptotically), i.e. i.i.d. $\sim N(0, \sigma_e^2)$.

Analysis of variance of one-way classified data

Let there be n observations $y_{ij}$, grouped into t classes or treatments such that there are $n_i$ observations in the i-th group, i.e. $i = 1, 2, 3, \ldots, t$; $j = 1, 2, 3, \ldots, n_i$ and $n = \sum_i n_i$, where $y_{ij}$ is the response of the j-th unit receiving the i-th treatment.

Layout:

Treatments   1        2        ..   i        ..   t
             y11      y21      ..   yi1      ..   yt1
             y12      y22      ..   yi2      ..   yt2
             ..       ..            ..            ..
             y1j      y2j      ..   yij      ..   ytj
             ..       ..            ..            ..
             y1n1     y2n2     ..   yini     ..   ytnt
Total        T1       T2       ..   Ti       ..   Tt       Grand total = G
Mean         T1/n1    T2/n2    ..   Ti/ni    ..   Tt/nt    Grand mean = G/n

Model:

\[ y_{ij} = \mu + t_i + e_{ij} \]

where $\mu$ is a constant representing the general conditions to which all the observations are subjected, $t_i$ is the unknown effect of the i-th class to be estimated, and the $e_{ij}$ are independent random variables with zero mean and constant variance $\sigma_e^2$.

Hypothesis: Under certain additional assumptions, analysis of variance leads to testing the following hypotheses:

\[ H_0: t_1 = t_2 = \cdots = t_t \quad \text{and} \quad H_1: t_i \neq t_j \text{ for at least one pair } i \text{ and } j \]

Analysis:
Step-1. Compute the Correction Factor, $CF = G^2 / n$
Step-2. Compute the Total Sum of Squares, $TSS = \sum_{i,j} y_{ij}^2 - CF$
Step-3. Compute the Treatment Sum of Squares, $TrSS = \sum_i \left(T_i^2 / n_i\right) - CF$
Step-4. Compute the Error Sum of Squares, ESS = TSS - TrSS
Step-5. Prepare the ANOVA table

Sources of variation   d.f.   SS     MSS                  Fcal      F (tab)
Treatments             t-1    TrSS   TMS = TrSS/(t-1)     TMS/EMS
Error                  n-t    ESS    EMS = ESS/(n-t)
Total                  n-1    TSS

Step-6. Compare the F values as follows:
If Fcal ≤ Ftab at the α level, then H0 is accepted, i.e. all treatment effects are the same (not significant).
If Fcal > Ftab at the α level, then H1 is accepted, i.e. at least two treatment effects are different (significant).
Step-7. If the test in the ANOVA is not significant, all the treatments are equal in their effect and no further analysis is needed. But if the test is significant, at least two treatments differ in their effect, and the differences between treatment effects are then compared by the Critical Difference (CD) or Least Significant Difference (LSD) test.

CD Test:
i). Estimate the SE of the i-th treatment mean, $SE(m) = \sqrt{EMS / n_i}$
ii). Estimate the SE of the difference between the i-th and j-th treatment means,

\[ SE(d) = \sqrt{EMS\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} \]

If $n_i = n_j = r$, then $SE(d) = \sqrt{\frac{2 \times EMS}{r}}$
iii). Compute CD = SE(d) × t, where t = tabulated t with error d.f. at the α level.
iv). Compare the difference of any two treatment means (DTM) with the CD value to find the significant differences between treatments. If a DTM is less than or equal to the CD, the two treatments are not significantly different; otherwise they are significantly different. All treatment pairs are compared likewise.

Step-8. In order to judge the reliability of the experiment, the coefficient of variation (CV) is computed as:

\[ CV = \frac{\sqrt{EMS}}{\text{Overall mean}} \times 100 \]

If the CV is 20% or less, it is an indication of better precision of the experiment. When the CV is more than 20%, the experiment may be repeated with efforts made to reduce the experimental error.

Analysis of variance of two-way classified data

Two-way ANOVA is carried out when the data vary in two directions, i.e. according to two factors. Examples are treatment as the first factor and blocking as the second factor in agricultural experiments; feed and housing condition in poultry; learning process and education standard in social science; tree species and agro-climatic condition, etc.
Let $y_{ij}$ be the response due to the i-th treatment ($i = 1, 2, 3, \ldots, t$) in the j-th block ($j = 1, 2, 3, \ldots, r$) of a trial.

Layout: Let there be t treatments with r blocks or replications for studying the response of a characteristic, y.

Treatment   Replication                   Total   Mean
            r1     r2     ..    rr
t1          Y11    Y12    ..    Y1r       T1      T1/r
t2          Y21    Y22    ..    Y2r       T2      T2/r
..          ..     ..     ..    ..        ..      ..
tt          Yt1    Yt2    ..    Ytr       Tt      Tt/r
Total       R1     R2     ..    Rr        G       M = G/rt

Model: The model for two-way classified data with one observation per cell is

\[ y_{ij} = \mu + t_i + b_j + e_{ij} \]

Hypothesis: Under certain additional assumptions, analysis of variance leads to testing the following hypotheses:

\[ H_0: t_1 = t_2 = \cdots = t_t \quad \text{and} \quad H_1: t_i \neq t_j \text{ for at least one pair } i \text{ and } j \]

Analysis:
Step-1. Compute the Correction Factor, $CF = G^2 / rt$
Step-2. Compute the Total Sum of Squares, $TSS = \sum_{i,j} y_{ij}^2 - CF$
Step-3. Compute the Treatment Sum of Squares, $TrSS = \sum_i \left(T_i^2 / r\right) - CF$
Step-4. Compute the Replication Sum of Squares, $RSS = \sum_j \left(R_j^2 / t\right) - CF$
Step-5. Compute the Error Sum of Squares, ESS = TSS - TrSS - RSS
Step-6. Prepare the ANOVA table

Sources of variation   d.f.          SS     MS                           Fcal      F (tab)
Replication            r-1           RSS    RMS = RSS/(r-1)              RMS/EMS
Treatments             t-1           TrSS   TMS = TrSS/(t-1)             TMS/EMS
Error                  (r-1)(t-1)    ESS    EMS = ESS/[(r-1)(t-1)]
Total                  rt-1          TSS

Step-7. Compare the F values as follows:
If Fcal ≤ Ftab at the α level, then H0 is accepted, i.e. all treatment effects are the same (not significant).
If Fcal > Ftab at the α level, then H1 is accepted, i.e. at least two treatment effects are different (significant).
Step-8. If the test in the ANOVA is not significant, all the treatments are equal in their effect and no further analysis is needed. But if the test is significant, at least two treatments differ in their effect, and the differences between treatment effects are then compared by the Critical Difference (CD) or Least Significant Difference (LSD) test, as above.
Step-9. SE of a mean, $SE(m) = \sqrt{EMS / r}$, and SE of the difference of 2 means, $SE(d) = \sqrt{2\,EMS / r}$
Step-10. $CV = \frac{\sqrt{EMS}}{M} \times 100$

2.2.
Analysis of data in completely randomized design (CRD)

The simplest design, using only two essential principles of field experimentation, viz. replication and randomization, is the completely randomised design (CRD). It gives a one-way classification of data. In this design the whole of the experimental material is divided into a number of experimental units depending on the number of treatments and the number of replications for each treatment. The treatments are then allotted randomly to the units of the entire homogeneous material, and observations on the characteristics or variables of interest are recorded. This design is useful for laboratory or green house experiments where treatment is the only variable of interest for comparison.

Procedure: The analysis is the same as that of one-way classification, with the same model, assumptions, hypothesis and steps of calculation.

Model: $Y_{ij} = \mu + t_i + e_{ij}$
Where,
$Y_{ij}$ is the value of the variate in the j-th replicate of the i-th treatment ($i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, r_i$),
$\mu$ is the general mean effect,
$t_i$ is the effect due to the i-th treatment, and
$e_{ij}$ is the random error, which is i.i.d. $\sim N(0, \sigma_e^2)$.

Step-1. The observations recorded on a variable y can be arranged as follows:

Arrangement of observations of CRD

Treatment   Observations             Total   No. of repl.   Treatment mean
1           Y11   Y12   Y13   ...    T1      r1             T1/r1
2           Y21   Y22   Y23   ...    T2      r2             T2/r2
3           Y31   Y32   Y33   ...    T3      r3             T3/r3
...         ...                      ...     ...            ...
t           Yt1   Yt2   Yt3   ...    Tt      rt             Tt/rt
Total                                GT      n

Step-2. The hypothesis to be tested is

\[ H_0: t_1 = t_2 = \cdots = t_t \quad \text{and} \quad H_1: t_i \neq t_j \text{ for at least one pair } i \text{ and } j \]

Step-3. Analysis of data
i). Correction Factor, $C.F. = \frac{(GT)^2}{n}$
ii). Total Sum of Squares, $TSS = \sum_i \sum_j Y_{ij}^2 - C.F. = (Y_{11}^2 + Y_{12}^2 + \cdots + Y_{t r_t}^2) - C.F.$
iii). Treatment Sum of Squares, $TrSS = \left(\frac{T_1^2}{r_1} + \frac{T_2^2}{r_2} + \cdots + \frac{T_t^2}{r_t}\right) - C.F.$
iv). Error Sum of Squares, ESS = TSS - TrSS

Step-4. Preparation of the ANOVA table

Sources of variation   d.f.   SS     MSS                  Fcal      F (tab)
Treatments             t-1    TrSS   TMS = TrSS/(t-1)     TMS/EMS
Error                  n-t    ESS    EMS = ESS/(n-t)
Total                  n-1    TSS

Step-5. If the calculated value of F is greater than the table value $F_{\alpha;\, t-1,\, n-t}$, where α denotes the level of significance, the hypothesis H0 is rejected and it can be inferred that some or all of the treatment effects are significantly different.

Step-6. Calculation of standard errors and the CD value for pairwise comparison:
(a). Estimated SE of the i-th treatment mean, $SE(m) = \sqrt{EMS / r_i}$
(b). Estimated SE of the difference between the i-th and j-th treatment means,

\[ SE(d) = \sqrt{EMS\left(\frac{1}{r_i} + \frac{1}{r_j}\right)} \]

If $r_i = r_j = r$, then $SE(d) = \sqrt{\frac{2 \times EMS}{r}}$
(c). CD = SE(d) × t
(d). The treatment means are arranged according to their ranks in descending order. Using the CD value, the bar chart is completed to interpret the treatment comparisons.

CRD with unequal replications

Problem-21. A varietal trial on green gram was conducted in a green house under CRD with five varieties V1, V2, V3, V4, V5 replicated 3, 4, 5, 4 and 4 times, respectively. The data recorded on grain yield are presented below.

Grain yield of green gram (kg/pot)

Varieties   V1      V2      V3      V4      V5
            1.6     2.5     1.3     2.0     1.6
            1.2     2.2     0.9     1.5     1.0
            1.5     2.4     0.8     1.6     0.8
            -       1.9     1.1     1.4     0.9
            -       -       1.0     -       -
Total       4.3     9.0     5.1     6.5     4.3
Repl        3       4       5       4       4
Mean        1.43    2.25    1.02    1.62    1.08
Variance    0.043   0.070   0.037   0.069   0.129

Analyse the data and find the variety with the highest grain yield.

Solution:
Step-1. Null hypothesis H0: T1 = T2 = ... = T5, meaning all varieties give the same yield; H1: at least two varieties differ, meaning the varieties do not all give the same yield.
Step-2. Calculation
i). C.F. = (29.2)²/20 = 42.6320
ii). TSS = [(1.6)² + (1.2)² + ... + (0.9)²] - C.F. = 47.840 - 42.632 = 5.208
iii). SS due to treatments (varieties), TrSS or VSS

\[ VSS = \left[\frac{(4.3)^2}{3} + \frac{(9.0)^2}{4} + \frac{(5.1)^2}{5} + \frac{(6.5)^2}{4} + \frac{(4.3)^2}{4}\right] - C.F. = 46.8003 - 42.6320 = 4.1683 \]

iv). ESS = TSS - VSS = 5.2080 - 4.1683 = 1.0397

Step-3. Construction of the ANOVA table

Sources of variation   d.f.   SS       MSS      Fcal       F0.01
Variety                4      4.1683   1.0421   15.037**   4.893
Error                  15     1.0397   0.0693
Total                  19     5.2080
** Significant at 1% level

Step-4. Since the observed value of F is greater than the 1% tabulated F value, the null hypothesis is rejected. This indicates that some of the treatment pairs are different, so the CD test is required for pairwise comparison.

Step-5. Calculation of SE for V1 and V2

\[ SE(d) = \sqrt{EMS\left(\frac{1}{r_1} + \frac{1}{r_2}\right)} = \sqrt{0.0693\left(\frac{1}{3} + \frac{1}{4}\right)} = \sqrt{0.040423} = 0.2011 \]

The table value of t for α = 0.05 and 15 d.f. is 2.131.
Hence, CD = (2.131) × (0.2011) = 0.4285
Similarly, the CD values of the other pairs are:
V1 and V3 = 0.4096
V1 and V4; V1 and V5 = 0.4285
V2 and V3; V3 and V4; V3 and V5 = 0.3763
V2 and V4; V2 and V5; V4 and V5 = 0.3966

Comparison of the differences between the mean yields of the varieties with the corresponding CD values results in the following bar chart (varieties in descending order of mean yield):

V2   V4   V1   V5   V3

Conclusion: It is concluded that the variety V2 is the best variety, giving the highest grain yield, followed by V1 & V4 and then V3 & V5.

Exercise: The data are from a laboratory experiment in which observations were made on the mycelial growth of different Rhizoctonia solani isolates on PDA medium:

R. solani isolates   Mycelial growth
                     Repl. 1   Repl. 2   Repl. 3
RS-1                 29.0      28.0      29.0
RS-2                 33.5      31.5      29.0
RS-3                 26.5      30.0      ----
RS-4                 48.5      46.5      49.0
RS-5                 34.5      31.0      ----

Analyse the data and draw conclusions on the significant differences between the Rhizoctonia solani isolates.

CRD with equal replications

Problem-22. In order to find out the yielding abilities of five varieties of sesamum, an experiment was conducted in a poly house using a CRD with four plots per variety.
The observations are given in the table below.

Seed yield of sesamum (g/plot)

Varieties   1       2       3       4       5
            25      25      24      20      14
            21      28      24      17      15
            21      24      16      16      13
            18      25      21      19      11
Total       85      102     85      72      53
Mean        21.25   25.5    21.25   18.0    13.25

Analyse the data and draw conclusions on the varietal performance of the different sesamum varieties.

Solution:
Step-1. Null hypothesis H0: V1 = V2 = ... = V5; H1: at least 2 varieties are different.
Step-2. Calculation
(i). C.F. = (397)²/20 = 7880.45
(ii). TSS = [(25)² + (21)² + ... + (11)²] - C.F. = 8307 - 7880.45 = 426.55
(iii). Varieties SS, VSS = (1/4)(85² + 102² + ... + 53²) - C.F. = 8211.75 - 7880.45 = 331.30
(iv). ESS = 426.55 - 331.30 = 95.25

Step-3. Construction of the ANOVA table

Sources of variation   d.f.   SS       MSS      Fcal        Ftab
Varieties              4      331.30   82.825   13.043 **   4.893
Error                  15     95.25    6.350
Total                  19     426.55
** Significant at 1% level.

Step-4. Since the observed value of F is greater than the 1% table value, the null hypothesis is rejected. So, we proceed to the CD test.

\[ SE(d) = \sqrt{\frac{2 \times 6.350}{4}} = 1.7819 \]

The table value of t for α = 0.05 and 15 d.f. is 2.131.
Hence, CD = (2.131) × (1.7819) = 3.7972 = 3.80
The arrangement of treatments according to their ranks for the bar chart is:

V2   V1   V3   V4   V5

Conclusion: From the analysis, it is concluded that the variety V2 is the best.

Exercise: The data represent a set of observations on wood density obtained on a randomly collected set of 7 stems belonging to each of five cane species.

Stem   Species
       1      2      3      4      5
1      0.58   0.53   0.49   0.53   0.57
2      0.54   0.63   0.55   0.61   0.64
3      0.38   0.68   0.58   0.53   0.63
4      0.32   0.55   0.54   0.47   0.68
5      0.52   0.45   0.41   0.41   0.61
6      0.41   0.59   0.63   0.58   0.74
7      0.47   0.65   0.58   0.44   0.71

Analyse the data and draw conclusions on the differences between the cane species.

2.3.
Analysis of data in randomised complete block design (RCBD or RBD) with one observation per cell

In order to control variability in one direction in the experimental material, it is desirable to divide the experimental material into homogeneous groups of units called blocks, formed across the direction of variability. The treatments are randomly allocated within each of these blocks. This procedure gives an arrangement of 't' treatments in 'r' blocks such that each treatment occurs precisely once in each block.

Procedure: The analysis of a Randomised Complete Block Design is similar to the analysis of two-way classified data. For the analysis of this design we use the linear additive model

\[ Y_{ij} = \mu + t_i + r_j + e_{ij} \]

Where, $\mu$ = the overall mean; $t_i$ = the i-th treatment effect; $r_j$ = the j-th replication effect; and $e_{ij}$ = the error term, i.i.d. $\sim N(0, \sigma_e^2)$.

Step-1. The observations from an RBD can be arranged as follows:

Arrangement of data in RBD with t treatments and r replications

Treatment           Replication                          Total
                    1      2      3      ...    r
1                   Y11    Y12    Y13    ...    Y1r      T1
2                   Y21    Y22    Y23    ...    Y2r      T2
3                   Y31    Y32    Y33    ...    Y3r      T3
...                 ...    ...    ...    ...    ...      ...
t                   Yt1    Yt2    Yt3    ...    Ytr      Tt
Replication total   R1     R2     R3     ...    Rr       GT

Step-2. The data can be analysed as:
(i). C.F. = (GT)²/rt
(ii). Total SS, $TSS = \sum_{i,j} y_{ij}^2 - C.F.$
(iii). Replication SS, $RSS = \frac{1}{t}\sum_j R_j^2 - C.F.$
(iv). Treatment SS, $TrSS = \frac{1}{r}\sum_i T_i^2 - C.F.$
(v). Error SS, ESS = TSS - RSS - TrSS

Step-3. We are interested in testing the hypothesis H0: t1 = t2 = ... = tt against the alternative that at least 2 of the t's are not equal.

Step-4. ANOVA table

Sources of variation   d.f.          SS     MSS    Fcal      F (tab)
Replication            r-1           RSS    RMS    RMS/EMS
Treatment              t-1           TrSS   TMS    TMS/EMS
Error                  (r-1)(t-1)    ESS    EMS
Total                  rt-1          TSS

Step-5. If the F-test shows that there is no significant difference between replications, it indicates that RBD will not contribute to precision in detecting treatment differences.
In such situations the adoption of RBD in preference to CRD is not advantageous.

Step-6. If the F-test shows a significant difference between treatments, then we can use the CD for comparing pairs of treatments. The CD is given by:

\[ CD = t_\alpha \times SE(d), \qquad SE(d) = \sqrt{\frac{2\,EMS}{r}} \]

Where, $t_\alpha$ = table value of t for the α (0.01 or 0.05) level of significance and error degrees of freedom. Based on the CD value the bar chart can be drawn and conclusions can be written.

Problem-23. The plan and yield (kg/plot) of six paddy strains (A, B, C, D, E, F) in an RBD experiment with four replications are shown below.

Block-I   Block-II   Block-III   Block-IV
A (12)    B (4)      B (7)       F (8)
E (14)    C (6)      C (9)       A (18)
C (11)    E (11)     D (9)       C (10)
D (7)     A (16)     E (15)      B (6)
B (5)     D (8)      F (12)      D (8)
F (10)    F (9)      A (14)      E (12)

(Figures in parentheses are yield observations)

Analyse the data and draw conclusions on the yield performance of the paddy strains.

Solution:
Step-1. Null hypothesis H0: TA = TB = ... = TF (all the varieties have the same mean yield); H1: at least 2 strains are different.
Step-2. The data can be arranged in the following two-way classification.

Paddy yield (in kg/plot)

Treatment    Replication or Blocks        Treatment   Mean
             I      II     III    IV      Total
A            12     16     14     18      60          15
B            5      4      7      6       22          5.5
C            11     6      9      10      36          9
D            7      8      9      8       32          8
E            14     11     15     12      52          13
F            10     9      12     8       39          9.8
Rep. Total   59     54     66     62      GT = 241

Step-3. Calculation; here, N = r × t = 4 × 6 = 24
(i). Correction factor, CF = (GT)²/N = (241)²/24 = 2420.04
(ii). Total SS, TSS = (12² + ... + 8²) - CF = 2717 - 2420.04 = 296.96
(iii). Replication or Block SS, RSS = (59² + 54² + 66² + 62²)/6 - CF = 2432.83 - 2420.04 = 12.79
(iv). Variety SS, VSS = (60² + ... + 39²)/4 - CF = 2657.25 - 2420.04 = 237.21
(v). Error SS, ESS = TSS - RSS - VSS = 296.96 - 12.79 - 237.21 = 46.96

Step-4. Construction of the ANOVA table

Sources of variation   d.f.          SS       MSS     Fcal      Ftab 5%   Ftab 1%
Block                  (r-1) = 3     12.79    4.26    1.36 ns
Variety                (t-1) = 5     237.21   47.44   15.15**   2.90      4.56
Error                  15            46.96    3.13
Total                  (rt-1) = 23   296.96
ns: Not significant; ** Significant at 1% level

Step-5. Since the calculated F value for variety is greater than the table F value for 5 and 15 d.f. at the 1% level, the conclusion is that the varieties differ significantly at the 1% level, i.e. the varietal differences are highly significant.

Step-6. Critical difference,

\[ CD = t_{0.05} \text{ (15 d.f.)} \times \sqrt{\frac{2 \times EMS}{r}} = 2.131 \times \sqrt{\frac{2 \times 3.13}{4}} = 2.67 \]

Step-7. The arrangement of treatments according to the ranks of their means for the bar chart is as follows:

Varieties:   A   E   F   C   D   B

Conclusion: The bar chart shows that varieties A and E are superior to B and to C, D and F, while C, D and F are at par with respect to yield performance among these 6 paddy strains.

Exercise: In a field experiment laid out under RCBD, data were recorded on seven provenances of Gmelina arborea for the girth at breast-height (gbh) attained by the trees 6 years after planting.

gbh (cm) of trees in plots 6 years after planting

Treatment (Provenance)   Replication
                         I       II      III
1                        30.85   38.01   35.10
2                        30.24   28.43   35.93
3                        30.94   31.64   34.95
4                        29.89   29.12   36.75
5                        21.52   24.07   20.76
6                        25.38   32.14   32.19
7                        22.89   19.66   26.92

Analyse the data and draw conclusions on treatment differences.

2.4. Analysis of data in Latin square design (LSD)

This design controls heterogeneity in two directions in the experimental material. In this design two restrictions are imposed by forming blocks in two perpendicular directions, row-wise and column-wise. Treatments are allotted in such a way that every treatment occurs once and only once in each row and each column. Thus, a Latin square of 't' treatments is an arrangement of t × t or t² cells such that every row and every column contains every treatment precisely once. By this arrangement the error variation can be reduced considerably further.
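The defining property of a Latin square stated above (every treatment exactly once in each row and each column) can be checked mechanically. The following Python sketch is only an illustration: the function names `is_latin_square` and `cyclic_latin_square` are hypothetical, and the cyclic construction is one simple, assumed way of writing down a valid square, not a randomization procedure prescribed by this manual. In an actual experiment the rows, columns and treatment labels of such a square would still need to be randomized.

```python
def is_latin_square(plan):
    """Return True if every treatment occurs exactly once in each row and column."""
    s = len(plan)
    # Every row must have exactly s cells.
    if s == 0 or any(len(row) != s for row in plan):
        return False
    symbols = set(plan[0])
    # The first row must already contain s distinct treatments.
    if len(symbols) != s:
        return False
    rows_ok = all(set(row) == symbols for row in plan)
    cols_ok = all({plan[i][j] for i in range(s)} == symbols for j in range(s))
    return rows_ok and cols_ok


def cyclic_latin_square(treatments):
    """Build an s x s square by shifting the treatment list one place per row."""
    s = len(treatments)
    return [[treatments[(i + j) % s] for j in range(s)] for i in range(s)]


square = cyclic_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
print(is_latin_square(square))
```

The same check can be applied to any field plan before the analysis, to confirm that no treatment has been repeated or omitted in a row or column.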
Procedure: For analysis of these designs we use the linear additive model

$$y_{ijk} = \mu + r_i + c_j + t_k + e_{ijk}$$

where $y_{ijk}$ is the observation on the kth treatment in the ith row and jth column (i = 1, 2, …, s; j = 1, 2, …, s; k = 1, 2, …, s), $\mu$ is the general mean effect, $r_i$ is the effect due to the ith row, $c_j$ is the effect due to the jth column, $t_k$ is the effect due to the kth treatment and $e_{ijk}$ is the random error component, which is assumed to be independently and identically normally distributed with mean zero and a constant variance $\sigma_e^2$.

Analysis: Let there be s treatments arranged in s rows and s columns; then compute,

(i). $R_i$ = Total of ith row = $\sum_j y_{ijk}$
(ii). $C_j$ = Total of jth column = $\sum_i y_{ijk}$
(iii). $T_k$ = Total of kth treatment in the design
(iv). C.F. = $\frac{(GT)^2}{s^2}$, where GT is the Grand Total
(v). TSS (Total Sum of Squares) = $\sum_i \sum_j y_{ijk}^2 - C.F.$
(vi). RSS (Row Sum of Squares) = $\frac{\sum_i R_i^2}{s} - C.F.$
(vii). CSS (Column Sum of Squares) = $\frac{\sum_j C_j^2}{s} - C.F.$
(viii). TrSS (Treatment Sum of Squares) = $\frac{\sum_k T_k^2}{s} - C.F.$
(ix). ESS (Error Sum of Squares) = TSS – RSS – CSS – TrSS
(x). Hypothesis $H_0: t_1 = t_2 = \dots = t_s$ against $H_1$: the $t_k$'s are not all equal
(xi). ANOVA Table

Sources      d.f.          SS      MSS                          Fcal
Row          (s-1)         RSS     Sr² = RSS/(s-1)
Column       (s-1)         CSS     Sc² = CSS/(s-1)
Treatment    (s-1)         TrSS    St² = TrSS/(s-1)             St²/Se²
Error        (s-1)(s-2)    ESS     Se² = ESS/[(s-1)(s-2)]
Total        (s²-1)        TSS

If the calculated value of F for treatment is greater than the table value of F for (s-1) and (s-1)(s-2) d.f., the hypothesis H0 is rejected. We can infer that the treatment effects are significantly different. To detect the difference, the CD test is performed. The estimated SE of the difference between the ith and jth treatment is

$$SE(d) = \sqrt{\frac{2S_e^2}{s}}$$

The critical difference (CD) can be calculated as CD = SE(d) × t, where the degrees of freedom for t are those for error. The treatment means are computed as $T_k/s$ (k = 1, 2, …, s).
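The computations (i) to (ix) and the treatment F ratio of the ANOVA table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lsd_anova`, its dictionary keys and the 3 × 3 data are hypothetical and chosen for demonstration.

```python
def lsd_anova(layout, yields):
    """Sums of squares for an s x s Latin square design.

    layout[i][j] is the treatment label in row i, column j;
    yields[i][j] is the observation recorded on that plot.
    """
    s = len(yields)
    obs = [y for row in yields for y in row]
    gt = sum(obs)                          # grand total
    cf = gt ** 2 / s ** 2                  # correction factor
    tss = sum(y * y for y in obs) - cf     # total SS

    row_totals = [sum(row) for row in yields]
    col_totals = [sum(yields[i][j] for i in range(s)) for j in range(s)]
    trt_totals = {}
    for i in range(s):
        for j in range(s):
            label = layout[i][j]
            trt_totals[label] = trt_totals.get(label, 0) + yields[i][j]

    rss = sum(r * r for r in row_totals) / s - cf
    css = sum(c * c for c in col_totals) / s - cf
    trss = sum(t * t for t in trt_totals.values()) / s - cf
    ess = tss - rss - css - trss

    ems = ess / ((s - 1) * (s - 2))        # error mean square
    f_treatment = (trss / (s - 1)) / ems
    means = {k: v / s for k, v in trt_totals.items()}
    return {"TSS": tss, "RSS": rss, "CSS": css, "TrSS": trss,
            "ESS": ess, "EMS": ems, "F": f_treatment,
            "treatment_means": means}


# Hypothetical 3 x 3 square, for illustration only
layout = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
yields = [[10, 12, 9], [11, 8, 13], [7, 14, 10]]
for name, value in lsd_anova(layout, yields).items():
    print(name, value)
```

The sketch needs s of at least 3, because the error degrees of freedom (s-1)(s-2) vanish for a 2 × 2 square.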
The treatment means can be compared with the help of the CD value. Any two treatment means are said to differ significantly if their difference is larger than the CD value.

Problem-24. An experiment was carried out on Sorghum with 5 varieties (A, B, C, D & E) in a (5 × 5) LSD. The plan and grain yield (kg/plot) are given below:

                 Columns
Rows             I        II       III      IV       V        Row total
I                B (6)    A (11)   E (8)    D (6)    C (5)    36
II               A (9)    D (9)    C (4)    E (14)   B (10)   46
III              C (3)    B (8)    D (7)    A (12)   E (8)    38
IV               E (10)   C (5)    A (10)   B (7)    D (10)   42
V                D (8)    E (15)   B (9)    C (3)    A (18)   53
Column total     36       48       38       42       51       215

(Figures in parentheses are yield observations of the respective treatments)

Perform the ANOVA and compare the variety mean yields.

Solution:

Step-1. Hypothesis: $H_0: \mu_A = \mu_B = \mu_C = \mu_D = \mu_E$; $H_1$: the variety means are not all equal.

Step-2. Yield (kg/plot) of varieties and their totals

        A     B     C     D     E
        11    6     5     6     8
        9     10    4     9     14
        12    8     3     7     8
        10    7     5     10    10
        18    9     3     8     15
Tk      60    40    20    40    55

Variety totals are: A = 60; B = 40; C = 20; D = 40; E = 55.

Step-3. Calculation

(i). Grand total, GT = 215; total no. of observations = N = 25
(ii). No. of varieties, s = 5
(iii). Correction factor, C.F. = $\frac{(GT)^2}{N} = \frac{(215)^2}{25} = 1849$
(iv). Total Sum of Squares = TSS = $\sum_i \sum_j y_{ijk}^2 - C.F. = (6^2 + 11^2 + \dots + 18^2) - 1849 = 2163 - 1849 = 314$
(v). Row Sum of Squares = RSS = $\frac{\sum_i R_i^2}{s} - C.F. = \frac{36^2 + 46^2 + 38^2 + 42^2 + 53^2}{5} - 1849 = 1885.8 - 1849 = 36.8$
(vi). Column Sum of Squares = CSS = $\frac{\sum_j C_j^2}{s} - C.F. = \frac{36^2 + 48^2 + 38^2 + 42^2 + 51^2}{5} - 1849 = 1881.8 - 1849 = 32.8$
(vii). Variety Sum of Squares = VrSS = $\frac{\sum_k T_k^2}{s} - C.F. = \frac{60^2 + 40^2 + 20^2 + 40^2 + 55^2}{5} - 1849 = 2045 - 1849 = 196$
(viii). Error SS = ESS = TSS – RSS – CSS – VrSS = 314 – 36.8 – 32.8 – 196 = 48.4

Step-4.
Construction of ANOVA Table

Source of                                                  Ftab
variation    df    SS       MSS     Fcal                   5%      1%
Rows         4     36.8     9.2     (9.2/4.03) = 2.28 ns
Columns      4     32.8     8.2     (8.2/4.03) = 2.03 ns
Variety      4     196.0    49.0    (49/4.03) = 12.15 **   3.26    5.41
Error        12    48.4     4.03
Total        24    314

Step-5. Comparing the F ratios for Rows, Columns and Varieties with the table value of F (for 4 and 12 d.f.), it is found that only the differences in varietal means are highly significant.

Step-6. CD at 5% = SE(d) × $t_{0.05}$ for 12 d.f.

$$CD = \sqrt{\frac{2 \times 4.03}{5}} \times 2.18 = 1.27 \times 2.18 = 2.77$$

The variety means are arranged according to their ranks and the bar chart is prepared by comparing the differences with the CD value. The variety means and the bar coding are:

Variety:   A     E     B     D     C
Mean:      12    11    8     8     4
Groups:    (A E)       (B D)       (C)

Conclusion: The analysis reveals that varietal differences are present. Varieties A & E are at par and varieties B & D are also at par, but C differs significantly from all the others and gives the lowest yield. Varieties A & E are the best varieties for yield performance.

Exercise: In a varietal trial on paddy to test the yielding ability of 5 varieties (A, B, C, D, E), an experiment was laid out in a 5 × 5 LSD. The results are given below.

Grain yield of paddy (kg/plot)

D 39.0    A 24.1    E 26.1    B 37.0    C 42.2
E 21.2    B 38.1    A 24.0    C 39.3    D 33.1
C 35.6    E 33.5    B 38.1    D 40.8    A 24.2
A 30.8    C 31.1    D 46.7    E 28.7    B 44.9
B 44.3    D 29.6    C 41.1    A 26.3    E 24.4

Analyse the data and draw conclusions on the yielding ability of the paddy varieties.

2.5. Missing plot technique in design of Experiments

Statistical concept: In agricultural field experiments, the experimenter often encounters the situation that the observations of a particular plot are lost or are so much affected by some extraneous causes that it would not be desirable to regard these observations as normal experimental observations.
Such data are generally analysed through the missing plot technique. Statistical analysis of designs where the observation on one or more plots is missing is somewhat complicated, because the initially symmetrical distribution of plots among the different treatments and among the different blocks is disturbed. The analysis of such experiments, however, can be carried out by one of the following methods.

(a) Estimating the missing value(s) using the principle of least squares, i.e. minimizing the error sum of squares.
(b) Method of interaction
(c) Method of fitting constants, and
(d) Analysis of the data with missing observation by the technique of analysis of covariance.

In the following, we shall use the first method for the analysis of data with one missing observation.

2.6. Analysis of data in RCBD with one missing observation

Procedure: When any one observation of a character under study is missing, we first estimate the missing observation, substitute the estimated value in that place and proceed with the analysis. The method consists of selecting a value ‘x’ for the unknown missing value such that the error variance is kept at a minimum.

Consider a randomized block design with t treatments and r replications in which one observation is missing. Let x be the value of the missing observation; it is estimated as:

$$x = \frac{rB' + tT' - G'}{(r-1)(t-1)}$$

where,
B’ = total of available values of the replication that contains the missing value
T’ = total of available values of the treatment that contains the missing value
G’ = grand total of all the available values

The analysis is then carried out as usual after substituting the estimated value for the missing value, with the following changes.

i). The d.f. for error and total are corrected by subtracting 1 from the actual d.f.
ii).
Treatment Sum of Squares is to be corrected by subtracting the bias,

$$B = \frac{(B' + tT' - G')^2}{t(t-1)(r-1)^2}$$

iii). Standard error for testing the significance of the difference between treatment means:

(a). Standard error of the difference between two treatment means not involving the missing value:

$$SE(d) = \sqrt{\frac{2S_e^2}{r}}$$

where $S_e^2$ is the Error Mean Square (EMS).

(b). Standard error of the difference between two treatment means one of which involves the missing value:

$$SE(d) = \sqrt{\frac{EMS}{r}\left[2 + \frac{t}{(r-1)(t-1)}\right]}$$

Problem-25. To find out the best source of nitrogen at 60 kg/ha, an experiment was conducted on paddy with 5 sources of nitrogen in 4 blocks. The yield data for the different treatments are given below.

Yield of grain (kg/plot)

                                      Blocks
Treatment                             I       II      III     IV
Ammonium Sulphate           S1        25.4    17.3    22.4    30.5
Ammonium Chloride           S2        32.5    --      28.4    33.4
Urea                        S3        37.5    25.4    30.1    34.5
Chilean nitrate             S4        22.5    14.7    23.5    22.4
Ammonium Sulphate Nitrate   S5        20.5    21.5    23.5    28.5

The observation relating to the application of Ammonium Chloride in the second block is missing. Estimate the missing value and analyse the data.

Solution:

Step-1. Prepare the following two-way table between treatments and blocks, treating the yield corresponding to S2 in the second block as missing.

Treatment × Block table

            Treatments
Blocks      1       2       3        4       5       Total
I           25.4    32.5    37.5     22.5    20.5    138.4
II          17.3    --      25.4     14.7    21.5    78.9
III         22.4    28.4    30.1     23.5    23.5    127.9
IV          30.5    33.4    34.5     22.4    28.5    149.3
Total       95.6    94.3    127.5    83.1    94.0    494.5

Step-2. Estimate the missing value, x,

$$x = \frac{rB' + tT' - G'}{(r-1)(t-1)} = \frac{(4 \times 78.9) + (5 \times 94.3) - 494.5}{3 \times 4} = 24.4$$

Step-3. Insert the estimated missing value and carry out the analysis of variance according to the usual procedure of RBD, except for subtracting 1 d.f. from the d.f. for total S.S. as well as from the d.f. for error S.S.

Step-4.
Calculation of sum of squares

C.F. = $\frac{(GT)^2}{rt} = \frac{(518.9)^2}{20} = 13462.86$

Total S.S. = TSS = $\sum y_{ij}^2 - C.F. = 14121.21 - 13462.86 = 658.35$

Block S.S. = BSS = $\frac{\sum_i B_i^2}{t} - C.F. = 13694.87 - 13462.86 = 232.01$

Treatment S.S. = TrSS = $\frac{\sum_j T_j^2}{r} - C.F. = 13806.73 - 13462.86 = 343.87$

Error S.S. = ESS = TSS – BSS – TrSS = 658.35 – 232.01 – 343.87 = 82.47

While the error mean square is an unbiased estimate of the error variance, the treatment S.S. is an overestimate and has to be corrected by subtracting from it a bias, B:

$$B = \frac{(B' + tT' - G')^2}{t(t-1)(r-1)^2} = \frac{(78.9 + 5 \times 94.3 - 494.5)^2}{5 \times 4 \times 9} = 17.36$$

Corrected Treatment S.S. = 343.87 – 17.36 = 326.51

Step-5. ANOVA Table

Sources                   d.f.    SS        MSS      F
Blocks                    3       232.01    77.34    10.31 **
Treatments                4       343.87    85.97    11.46 **
Error                     11      82.47     7.50     --
Total                     18      658.35    --       --
Treatments (Corrected)    4       326.51    81.63    8.99 **
Error                     11      99.83     9.08

** Significant at 1% level.

Step-6. Calculation of Standard Error

(a). Standard error of the difference between two treatment means not involving the missing value:

$$SE(d) = \sqrt{\frac{2S_e^2}{r}} = \sqrt{\frac{2 \times 7.50}{4}} = 1.936 \text{ kg/plot}$$

(b). Standard error of the difference between two treatment means one of which has a missing value:

$$SE(d) = \sqrt{\frac{EMS}{r}\left[2 + \frac{t}{(r-1)(t-1)}\right]} = \sqrt{\frac{7.50}{4}\left[2 + \frac{5}{3 \times 4}\right]} = 2.13 \text{ kg/plot}$$

Exercise: In an experiment under RCBD for comparing the fodder yield of 7 sorghum varieties, the following data were obtained:

Fodder yield (t/ha)

            Replication
Variety     I       II      III
V1          14.5    14.0    14.0
V2          16.5    16.9    16.7
V3          x       16.7    17.4
V4          17.6    16.9    17.5
V5          18.5    17.9    17.6
V6          19.3    18.3    18.8
V7          19.5    19.0    20.0

Here the observation on V3 in R-I is missing. Analyse the data and draw your conclusion.

2.7. Analysis of data in LSD with one missing observation

Procedure:

Step-1. Estimate the missing value, x,

$$x = \frac{t(R' + C' + T') - 2G'}{(t-1)(t-2)}$$

where,
t = no. of treatments
R’ = total of available values of the row containing the missing value
C’ = total of available values of the column containing the missing value
T’ = total of available values of the treatment containing the missing value
G’ = grand total of all available values

Step-2. The estimated missing value, x, is then inserted and the analysis is carried out according to the usual procedure for LSD, except for subtracting 1 d.f. from the d.f. for total S.S. and error S.S. and computing the corrected treatment S.S. by adjusting for the bias, B, given by

$$B = \frac{[G' - R' - C' - (t-1)T']^2}{[(t-1)(t-2)]^2}$$

Step-3. The standard error for testing the significance of the difference between two treatment means is obtained as follows:

a. SE of the difference between two treatment means not involving the missing value,

$$SE(d) = \sqrt{\frac{2S_e^2}{t}}$$

where $S_e^2$ is the error mean square.

b. SE of the difference between two treatment means one of which has a missing value,

$$SE(d) = \sqrt{S_e^2\left[\frac{2}{t} + \frac{1}{(t-1)(t-2)}\right]}$$

Problem-26. The data on grain yield of paddy from a varietal trial in a 5 × 5 Latin square design are shown in the following table. The yield of variety C is missing from the second row.

Grain yield of paddy (kg/plot)

                                                  Total
          E 26    C 42    D 39    B 37    A 24    168
          A 24    D 33    E 21    C x     B 38    116
          D 47    B 45    A 31    E 29    C 31    183
          B 38    A 24    C 36    D 41    E 34    173
          C 41    E 24    B 44    A 26    D 30    165
Total     176     168     171     133     157     805

Analyse the data and draw your conclusion.

Solution:

Step-1. We first estimate the missing value, x, as

$$x = \frac{t(R' + C' + T') - 2G'}{(t-1)(t-2)} = \frac{5(116 + 133 + 150) - 2(805)}{(5-1)(5-2)} = \frac{385}{12} \approx 32$$

Step-2. On substitution of the estimated value in the missing place, we get the corrected totals as follows:
Total of second row = 148; Total of 4th column = 165
Total of treatment C = 182; Grand total = 837

Step-3.
Calculate the various sums of squares as in a normal LSD:

CF = (GT)²/t² = 28022.76
Total SS = TSS = 29399.00 – CF = 1376.24
Row SS = RSS = 28154.20 – CF = 131.44
Column SS = CSS = 28063.00 – CF = 40.24
Treatment SS = TrSS = 28925.00 – CF = 902.24
Error SS = ESS = TSS – RSS – CSS – TrSS = 302.32

Step-4. Upward bias,

$$B = \frac{[G' - R' - C' - (t-1)T']^2}{[(t-1)(t-2)]^2} = \frac{[805 - 116 - 133 - 4(150)]^2}{(4 \times 3)^2} = 13.44$$

Corrected treatment SS = TrSS (Adj.) = 902.24 – 13.44 = 888.80

Step-5. Construction of ANOVA Table:

Sources of variation    d.f.    SS         MS         F
Row                     4       131.44     32.86      1.196
Column                  4       40.24      10.06      <1
Treatment (Adj.)        4       888.80     222.20     8.085
Error                   11      302.32     27.4836
Total                   23      1362.80

Step-6. Estimation of Standard errors (SE):

a. SE of the difference between two treatment means not involving the missing value

$$SE(d) = \sqrt{\frac{2S_e^2}{t}} = \sqrt{\frac{2 \times 27.4836}{5}} = 3.3156$$

CD = (2.201) × (3.3156) = 7.2976

b. SE of the difference between two treatment means one of which involves the missing value:

$$SE(d) = \sqrt{S_e^2\left[\frac{2}{t} + \frac{1}{(t-1)(t-2)}\right]} = \sqrt{27.4836\left[\frac{2}{5} + \frac{1}{4 \times 3}\right]} = \sqrt{13.2839} = 3.6447$$

CD = (2.201) × (3.6447) = 8.0220

Step-7. Arrange the variety means in descending order of value and prepare the bar chart as:

Variety:   B       D       C       E       A
Mean:      40.4    38.0    36.4    26.8    25.8

Conclusion: For yield performance, varieties B, D & C are at par and the best, followed by E & A, which are at par with each other.

Exercise: Estimate the missing value in the following LSD layout having 4 treatments A, B, C & D and analyse the data to draw conclusions.

A 12    C 19    B 10    D 8
C 18    B 12    D 6     A --
B 22    D 10    A 5     C 21
D 12    A 7     C 27    B 17

III. SAMPLING TECHNIQUES

Essentially, sampling consists of obtaining information from only a part of a large group or population so as to infer about the whole population. The object of sampling is thus to secure a sample which will represent the population and reproduce the important characteristics of the population under study as closely as possible.
The principal advantages of sampling as compared to complete enumeration of the population are reduced cost, greater speed, greater scope and improved accuracy. The smaller size of the sample makes the supervision more effective. Moreover, it is important to note that the precision of the estimates obtained from certain types of samples can be estimated from the sample itself. The most ‘convenient’ method of sampling is that in which the investigator selects a number of sampling units which he considers ‘representative’ of the whole population.

When sampling is performed so that every unit in the population has some chance of being selected in the sample and the probability of selection of every unit is known, the method of sampling is called probability sampling. An example of probability sampling is random selection, which implies a strict process of selection equivalent to that of drawing lots and should be clearly distinguished from haphazard selection. In this manual, any reference to sampling, unless otherwise stated, will relate to some form of probability sampling. The probability that any sampling unit will be selected in the sample depends on the sampling procedure used. The important point to note is that the precision and reliability of the estimates obtained from a sample can be evaluated only for a probability sample.

The object of designing a sample survey is to minimise the error in the final estimates. Even if the sample is a probability sample, the sample, being based on observations on a part of the population, cannot in general exactly represent the population. The average magnitude of the sampling errors of most probability samples can be estimated from the data collected. The magnitude of the sampling errors depends on the size of the sample, the variability within the population and the sampling method adopted.
Thus, if a probability sample is used, it is possible to predetermine the size of the sample needed to obtain desired and specified degree of precision. A sampling scheme is determined by the size of sampling units, number of sampling units to be used, the distribution of the sampling units over the entire area to be sampled, the type and method of measurement in the selected units and the statistical procedures for analysing the survey data. A variety of sampling methods and estimating techniques developed to meet the varying demands of the survey statistician accord the user a wide selection for specific situations. One can choose the method or combination of methods that will yield a desired degree of precision at minimum cost. 3.1. Principal steps in a sample survey In any sample survey, we must first decide on the type of data to be collected and determine how adequate the results should be. Secondly, we must formulate the sampling plan for each of the characters for which data are to be collected. We must also know how to combine the sampling procedures for the various characters so that no duplication of field work occurs. Thirdly, the field work must be efficiently organised with adequate provision for supervising the work of the field staff. Lastly, the analysis of the data collected should be carried out using appropriate statistical techniques and the report should be drafted giving full details of the basic assumptions made, the sampling plan and the results of the statistical analysis. (i) Specification of the objectives of the survey: Careful consideration must be given at the outset to the purposes for which the survey is to be undertaken. The characteristics on which information is to be collected and the degree of detail to be attempted should be fixed. 
If it is a survey of trees, it must be decided which species of trees are to be enumerated, and whether only the number of trees under specified diameter classes is to be estimated or whether the volume of trees is to be estimated in addition. It must also be decided at the outset what accuracy is desired for the estimates.

(ii) Construction of a frame of units: The first requirement of a probability sample of any nature is the establishment of a frame. A frame is a list of sampling units which may be unambiguously defined and identified in the population. The sampling units may be compartments, topographical sections, strips of a fixed width or plots of a definite shape and size. The sampling frame is compiled from secondary sources such as the revenue department or other related offices, books, journals or records.

(iii) Choice of a sampling design: If it is agreed that the sampling design should provide a statistically meaningful measure of the precision of the final estimates, then the sample should be a probability sample, in that every unit in the population should have a known probability of being selected in the sample. The choice of units to be enumerated from the frame of units should be based on some objective rule which leaves nothing to the opinion of the field worker. The determination of the number of units to be included in the sample and the method of selection is also governed by the allowable cost of the survey and the accuracy required in the final estimates.

(iv) Organisation of the field work: The entire success of a sampling survey depends on the reliability of the field work. Proper selection of the personnel, intensive training, clear instructions and proper supervision of the fieldwork are essential to obtain satisfactory results.
The field parties should correctly locate the selected units and record the necessary measurements according to the specific instructions given. The supervising staff should check a part of their work in the field and satisfy themselves that the survey is carried out in its entirety as planned.

(v) Analysis of the data: Depending on the sampling design used and the information collected, proper formulae should be used in obtaining the estimates, and the precision of the estimates should be computed. A double check of the computations is desirable to safeguard accuracy in the analysis.

(vi) Preliminary survey (pilot trials): The design of a sampling scheme for a survey requires both knowledge of the statistical theory and experience with data regarding the nature of the study area, the pattern of variability and operational cost. If prior knowledge in these matters is not available, a statistically planned small scale ‘pilot survey’ may have to be conducted before undertaking any large scale survey in that area. Such exploratory or pilot surveys will provide adequate knowledge regarding the variability of the material and will afford opportunities to test and improve field procedures, train field workers and study the operational efficiency of a design. A pilot survey will also provide data for estimating the various components of the cost of operations in a survey, such as time of travel, time of location and enumeration of sampling units, etc. The above information will be of great help in deciding the proper type of design and intensity of sampling that will be appropriate for achieving the objects of the survey.

Sampling terminology

Population: The word population is defined as the aggregate of units from which a sample is chosen. Example: all the plots, trees, plants, insects, blocks, villages or people etc. of the study area.
Sampling units: Sampling units are all the well defined units of the population from which a sample is to be collected.

Sampling frame: A list of the sampling units of a population.

Sample: One or more sampling units selected from a population according to some specified procedure constitute a sample.

Sampling intensity or sampling fraction: It is the ratio of the number of units in the sample to the number of units in the population (n/N).

Population total: Suppose a finite population consists of units $U_1, U_2, \dots, U_N$. Let the value of the characteristic for the i-th unit $U_i$ be denoted by $y_i$. The population total of the values $y_i$ (i = 1, 2, …, N) is:

$$Y = \sum_{i=1}^{N} y_i$$

Population mean: The arithmetic mean or average of the $y_i$ values,

$$\bar{Y} = \frac{Y}{N}$$

Population variance: A measure of the variation between units of the population, denoted $S_y^2$ and based on the squared deviations of the $y_i$ values from the population mean. Large values indicate large variation between units and small values indicate that the values of the characteristic for the units are close to the population mean. The square root of the variance is known as the standard deviation.

Coefficient of variation: The ratio of the standard deviation to the mean, expressed in percentage, is:

$$C.V. = \frac{S_y}{\bar{Y}} \times 100$$

Being unitless, it is used to compare the variability of two or more populations or sets of observations.

Parameter: A function of the values of the units in the population. Example: the population mean, variance, C.V., correlation etc. are population parameters. The problem in sampling theory is to estimate the parameters from a sample by a procedure that makes it possible to measure the precision of the estimates.

Estimator and estimate: Let the sample observations be $y_1, y_2, \dots, y_n$, the sample being of size n. Any function of the sample observations will be called a statistic. When a statistic is used to estimate a population parameter, the statistic will be called an estimator.
Example: the sample mean is an estimator of the population mean. Any particular value of an estimator computed from an observed sample will be called an estimate.

Bias in estimation: A statistic t is said to be an unbiased estimator of a population parameter θ if its expected value, E(t), is equal to θ. A sampling procedure based on a probability scheme gives rise to a number of possible samples by repetition of the sampling procedure. If the values of the statistic t are computed for each of the possible samples and if the average of the values is equal to the population value θ, then t is said to be an unbiased estimator of θ. In case E(t) is not equal to θ, the statistic t is said to be a biased estimator of θ and the bias is given by bias = E(t) – θ.

Sampling variance: It is defined as the average magnitude, over all possible samples, of the squares of deviations of the estimator from its expected value and is given by V(t) = E[t – E(t)]². The larger the sample and the smaller the variability between units in the population, the smaller will be the sampling error and the greater will be the confidence in the results.

Standard error of an estimator: The square root of the sampling variance of an estimator is known as the standard error of the estimator. The standard error of an estimate divided by the value of the estimate is called the relative standard error, which is usually expressed in percentage.

Accuracy and precision: Precision of an estimate is the inverse of the standard error or the sampling variance. Accuracy usually refers to the size of the deviations of the sample estimate from the mean m = E(t) obtained by repeated application of the sampling procedure, the bias thus being measured by m – θ. It is the accuracy of the sample estimate in which we are chiefly interested, and it is the precision which we are able to measure in most instances. We strive to design the survey and to analyse the data using appropriate statistical methods in such a way that the precision is increased to the maximum and the bias is reduced to the minimum.

Confidence limits: If the estimator t is normally distributed (generally valid for large samples), a confidence interval defined by a lower and an upper limit can be expected to include the population parameter θ with a specified probability level. The limits are given by

Lower limit = $t - Z_{\alpha}\sqrt{\hat{V}(t)}$
Upper limit = $t + Z_{\alpha}\sqrt{\hat{V}(t)}$

where $\hat{V}(t)$ is the estimate of the variance of t and $Z_{\alpha}$ is the value of the normal deviate corresponding to the desired confidence probability. When $Z_{\alpha}$ is taken as 1.96, we say that the chance of the true value of θ being contained in the random interval is 95 per cent.

Some general remarks: Capital letters will be used to denote population values and small letters to denote sample values. The symbol ‘cap’ (^) above a symbol for a population value denotes its estimate based on sample observations. While describing the different sampling methods, the formulae for estimating only the population mean and its sampling variance are given. Two related parameters are the population total and the ratio of the character under study (y) to some auxiliary variable (x). These related statistics can always be obtained from the mean by using the following general relations:

$$\hat{Y} = N\hat{\bar{Y}}, \qquad \hat{R} = \frac{\hat{Y}}{X}$$

where
$\hat{Y}$ = Estimate of the population total
N = Total number of units in the population
$\hat{R}$ = Estimate of the population ratio
X = Population total of the auxiliary variable

3.2. Simple random sampling (SRS)

A sampling procedure such that each possible combination of sampling units out of the population has the same chance of being selected is referred to as simple random sampling. From theoretical considerations, simple random sampling is the simplest form of sampling and is the basis for many other sampling methods.
Simple random sampling is most applicable for the initial survey in an investigation and for studies which involve sampling from a small area where the sample size is relatively small. The irregular distribution of the sampling units in an area in simple random sampling may be of great disadvantage where accessibility is poor and the costs of travel and locating the plots are considerably higher than the cost of enumerating the plot.

Selection of sampling units from a Population

In practice, a random sample is selected unit by unit. Two methods of random selection for simple random sampling without replacement (WOR) are explained in this section.

i). Lottery method: The units in the population are numbered 1 to N. Then N identical paper chits numbered 1 to N are prepared and one chit is chosen at random after shuffling the chits. The process is repeated n times without replacing the chits selected. The units which correspond to the numbers on the chosen chits form a simple random sample of size n from the population of N units. In this way, the probability of selecting any chit is the same for all the N chits.

ii). Selection based on random number tables: The procedure of selection using the lottery method obviously becomes rather inconvenient when N is large. To overcome this difficulty, we may use a table of random numbers such as those published by Fisher and Yates, a sample of which is given in the Appendix. The tables of random numbers have been developed in such a way that the digits 0 to 9 appear independently of each other and approximately an equal number of times in the table. The simplest way of selecting a random sample of the required size consists in selecting a set of n random numbers, one by one, from 1 to N in the random number table and then taking the units bearing those numbers.
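Both selection methods described above can be imitated in a short Python sketch. The function names `lottery_sample` and `select_from_table` and the list of two-digit entries are hypothetical and serve only as an illustration; in practice the entries would be read from a printed random number table.

```python
import random


def lottery_sample(N, n, rng=random):
    """Lottery method: draw n distinct chits from chits numbered 1..N."""
    return rng.sample(range(1, N + 1), n)


def select_from_table(random_numbers, N, n):
    """Read table entries in order and keep those lying in 1..N.

    Entries outside 1..N and entries already chosen are rejected
    (selection without replacement). If the entries run out before
    n units are found, fewer than n units are returned.
    """
    chosen = []
    for r in random_numbers:
        if len(chosen) == n:
            break
        if 1 <= r <= N and r not in chosen:
            chosen.append(r)
    return chosen


# Hypothetical two-digit entries, standing in for a column of the table
entries = [12, 57, 3, 88, 12, 40, 25, 61, 9]
print(select_from_table(entries, N=40, n=5))   # units 12, 3, 40, 25 and 9
print(lottery_sample(N=40, n=5))               # differs from run to run
```

In the hypothetical list, four of the nine entries are discarded (57, 88 and 61 exceed N, and 12 is repeated), which shows how quickly rejections accumulate when N is well below the largest number the table can produce.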
This procedure may involve a number of rejections, since numbers greater than N appearing in the table are not considered for selection. To reduce the number of rejections, the procedure is modified as follows. If N is a d-digit number, we first determine the highest d-digit multiple of N, say N’. Then a random number r is chosen from 1 to N’, and the unit whose serial number equals the remainder obtained on dividing r by N is selected. If the remainder is zero, the last unit (unit N) is selected.

Problem-27: Select a simple random sample of n=5 units from a population of size N=40.

Solution:
i). Serially number the population units from 1 to 40 (here 40 is a 2-digit number).
ii). The highest two-digit number divisible by 40 is 80, so N’ = 80.
iii). Let us select the 5th column of the random number table (Table-5 of the Appendix).
iv). The first value, 39 (which does not exceed N=40), is selected as the 1st member of the sample.
v). The next values in the column, 92 and 90, are rejected as they are greater than 80.
vi). 27 (which lies in 1-40) is selected as the 2nd sample unit.
vii). 00 is read as a remainder of zero, so the 40th unit is selected as the 3rd sample unit.
viii). The next value is 74. Dividing it by 40, the remainder is 34, so the 34th unit is the 4th sample unit.
ix). Next comes 07, and the 7th unit is selected as the 5th sample unit.
So the 5 sample units selected from the population of 40 are: 39, 27, 40, 34 and 7.

Exercise: Select a random sample of 11 cows from a list of 112 milch cows of a herd by using the random number table.

3.3. Parameter estimation in SRS

a). SRS WOR (without replacement)
Let y1, y2, …, yn be the measurements of a particular characteristic on the n selected units of a sample from a population of N sampling units. It can be shown that, in simple random sampling without replacement, the sample mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

is an unbiased estimator of the population mean $\bar{Y}$. An unbiased estimate of the sampling variance of $\bar{y}$ is given by

$$\hat{V}(\bar{y}) = \frac{N-n}{Nn}\, s_y^2 \quad\text{where}\quad s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2$$

Assuming that the estimate $\bar{y}$ is normally distributed, a confidence interval on the population mean can be set with the lower and upper confidence limits defined by

$$\text{Lower limit: } \hat{\bar{Y}}_L = \bar{y} - z\, s_y\sqrt{\frac{N-n}{Nn}} \qquad \text{Upper limit: } \hat{\bar{Y}}_U = \bar{y} + z\, s_y\sqrt{\frac{N-n}{Nn}}$$

where z is the table value, which depends on the confidence level and on the number of observations in the sample. If there are 30 or more observations, the value can be read from the table of the normal distribution. If there are fewer than 30 observations, the value should be read from the table of the t distribution using n - 1 degrees of freedom.

b). SRS WR (with replacement)
Let y1, y2, …, yn be the measurements of a particular characteristic on the n units selected with replacement from a population of N sampling units. Then,
1. Estimate of the population mean: $\hat{\bar{Y}} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
2. Estimate of the variance of the sample mean: $\hat{V}(\bar{y}) = \frac{N-1}{Nn}\, s_y^2$, where $s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2$
3. Estimate of the population total: $\hat{Y} = N\bar{y}$
4. C.I. of the population mean:
Lower limit, $\hat{\bar{Y}}_L = \bar{y} - z\, s_y\sqrt{\frac{N-1}{Nn}}$
Upper limit, $\hat{\bar{Y}}_U = \bar{y} + z\, s_y\sqrt{\frac{N-1}{Nn}}$

Problem-28: A forest has been divided into 1000 plots of 0.1 hectare each and a simple random sample of 25 plots has been selected. For each of these sample plots the wood volume in m³ was recorded as:

Sample obs.   Wood volume     Sample obs.   Wood volume
1             7               14            11
2             8               15            8
3             2               16            4
4             6               17            7
5             7               18            7
6             10              19            8
7             8               20            7
8             6               21            7
9             7               22            5
10            3               23            8
11            7               24            8
12            8               25            7
13            9

Estimate the population mean, the 95% C.I. of the mean, the C.V. and the total volume of wood in the forest by SRSWOR and SRSWR. Compare the efficiency of the two methods.

Solution:
a). SRSWOR
Let the wood volume of the i-th sampling unit (i = 1, 2, 3, …, 25) be denoted by yi. An unbiased estimate of the population mean is obtained as

$$\bar{y} = \frac{1}{25}\sum_{i=1}^{25} y_i = \frac{175}{25} = 7 \text{ m}^3$$

which is the mean wood volume per plot of 0.1 ha in the forest area.
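As a numerical check on the formulas of section 3.3, the quantities asked for in Problem-28 can be computed directly from the 25 observations. The following is a minimal Python sketch and not part of the original manual: the function name srs_summary and its dictionary keys are illustrative, the SRSWR variance uses the manual's factor (N-1)/(Nn), and 2.064 is the t value for 24 degrees of freedom at the 5% level taken from Table-1(a).

```python
import math

# Wood volumes (m3 per 0.1 ha plot) of the 25 sampled plots in Problem-28
volumes = [7, 8, 2, 6, 7, 10, 8, 6, 7, 3, 7, 8, 9,
           11, 8, 4, 7, 7, 8, 7, 7, 5, 8, 8, 7]
N = 1000       # number of plots in the forest
T_24 = 2.064   # t value for 24 d.f., 5% two-tailed (Table-1(a))

def srs_summary(y, N, t, with_replacement=False):
    """Mean, variance of the mean, S.E., C.V., C.I. and total under SRS."""
    n = len(y)
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / (n - 1)
    # (N-n)/(Nn) for SRSWOR, (N-1)/(Nn) for SRSWR as defined in the manual
    factor = (N - 1) / (N * n) if with_replacement else (N - n) / (N * n)
    var_mean = factor * s2
    se = math.sqrt(var_mean)
    return {
        "mean": mean,
        "s2": s2,
        "var_mean": var_mean,
        "se": se,
        "cv_percent": 100 * se / mean,
        "ci": (mean - t * se, mean + t * se),
        "total": N * mean,
    }

wor = srs_summary(volumes, N, T_24)
wr = srs_summary(volumes, N, T_24, with_replacement=True)
# Variance under SRSWOR as a percentage of the variance under SRSWR
efficiency = 100 * wor["var_mean"] / wr["var_mean"]
```

Because the code keeps s² unrounded (92/24 rather than 3.833), the SRSWR variance comes out as about 0.15318 rather than the hand-rounded 0.153167. The confidence limits and the efficiency agree with the hand computation to the decimal places reported.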
An estimate ($s_y^2$) of the variance of the individual values of y is obtained as

$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2 = \frac{92}{24} = 3.833$$

Then the unbiased estimate of the sampling variance of the mean is

$$\hat{V}(\bar{y}) = \frac{1000-25}{1000\times 25}\times 3.833 = 0.1495 \quad\text{and}\quad SE(\bar{y}) = \sqrt{0.1495} = 0.3867 \text{ m}^3$$

The relative standard error, C.V. = $\frac{SE(\bar{y})}{\bar{y}}\times 100 = \frac{0.3867}{7}\times 100$ = 5.52 %
The confidence limits on the population mean, using the t value for 24 degrees of freedom (2.064), are:
Lower limit = 7 - 2.064 × 0.3867 = 6.20
Upper limit = 7 + 2.064 × 0.3867 = 7.80
The 95% confidence interval for the population mean is (6.20, 7.80) m³. Thus, we are 95% confident that the interval (6.20, 7.80) m³ includes the population mean.
An estimate of the total wood volume in the forest area sampled is obtained by multiplying the estimate of the mean by the total number of plots in the population. Thus, $\hat{Y} = N\bar{y} = 1000 \times 7 = 7000$ m³, with a confidence interval of (6200, 7800) m³ obtained by multiplying the confidence limits on the mean by N = 1000.

b). SRSWR
The estimate of the population mean is again obtained as

$$\bar{y} = \frac{175}{25} = 7 \text{ m}^3$$

which is the mean wood volume per plot of 0.1 ha in the forest area.
The estimate ($s_y^2$) of the variance of the individual values of y is also unchanged:

$$s_y^2 = \frac{92}{24} = 3.833$$

Now, the estimate of the sampling variance of the mean is

$$\hat{V}(\bar{y}) = \frac{1000-1}{1000\times 25}\times 3.833 = 0.153167 \quad\text{and}\quad SE(\bar{y}) = s_y\sqrt{\frac{N-1}{Nn}} = 0.391365 \text{ m}^3$$

The relative standard error, C.V. = 0.3914 × 100/7 = 5.59 %
The confidence limits on the population mean are:
Lower limit, $\hat{\bar{Y}}_L = \bar{y} - z\, s_y\sqrt{\frac{N-1}{Nn}}$ = 7 - 2.064 × 0.3914 = 6.19
Upper limit, $\hat{\bar{Y}}_U = \bar{y} + z\, s_y\sqrt{\frac{N-1}{Nn}}$ = 7 + 2.064 × 0.3914 = 7.81
The 95% confidence interval for the population mean is (6.19, 7.81) m³. Thus, we are 95% confident that the interval (6.19, 7.81) m³ includes the population mean.
An estimate of the total wood volume in the forest area sampled is obtained by multiplying the estimate of the mean by the total number of plots in the population.
Thus, $\hat{Y} = N\bar{y} = 1000 \times 7 = 7000$ m³, with a confidence interval of (6190, 7810) m³ obtained by multiplying the confidence limits on the mean by N = 1000.
The efficiency of SRSWOR w.r.t. SRSWR = (0.1495/0.1532) × 100 = 97.58%, i.e. the sampling variance of the mean under SRSWOR is about 97.6% of that under SRSWR.

Exercise: In an agricultural survey the following data were recorded on the holding size of land (in acres):

Sl. No.   Holding size     Sl. No.   Holding size     Sl. No.   Holding size
1         21.04            13        8.29             25        22.13
2         12.59            14        7.27             26        1.68
3         20.30            15        1.47             27        49.58
4         16.16            16        1.12             28        1.68
5         23.82            17        10.67            29        4.80
6         1.79             18        5.94             30        12.72
7         26.91            19        3.15             31        6.31
8         7.41             20        4.84             32        14.18
9         7.68             21        9.07             33        22.19
10        66.55            22        3.69             34        2.50
11        141.80           23        14.61            35        25.29
12        28.12            24        1.10             36        20.99

Q.1. Draw a random sample of size n=10 from these 36 observations.
Q.2. Estimate the population mean, variance, total, C.V. and the C.I. of the mean at 95% confidence by SRSWOR and SRSWR.
Q.3. Compare the relative precision of SRSWOR with SRSWR.

3.4. Stratified sampling

The basic idea in stratified random sampling is to divide a heterogeneous population into sub-populations, usually known as strata, each of which is internally homogeneous. A precise estimate of any stratum mean can then be obtained from a small sample in that stratum, and by combining such estimates a precise estimate for the whole population can be obtained. Stratified sampling provides a better cross-section of the population than simple random sampling. It may also simplify the organisation of the field work. Geographical proximity is sometimes taken as the basis of stratification. The assumption here is that geographically contiguous areas are often more alike than areas that are far apart. Administrative convenience may also dictate the basis on which the stratification is made.
A fairly effective method of stratification is to conduct a quick reconnaissance survey of the area, or to pool the information already at hand, and to stratify the area according to some characteristic such as land, slope, breed, age, plant type, stand density or site quality. If the characteristic under study is known to be correlated with a supplementary variable for which actual data, or at least good estimates, are available for the units in the population, the stratification may be done using the information on the supplementary variable. For instance, the rainfall estimates obtained at a previous inventory of an area may be used for stratification of the population.

In stratified sampling, the variance of the estimator consists of only the ‘within strata’ variation. Thus the larger the number of strata into which a population is divided, the higher, in general, the precision, since the units within a stratum are then likely to be more homogeneous. For estimating the variance within strata, there should be a minimum of 2 units in each stratum. However, the larger the number of strata, the higher will, in general, be the cost of enumeration. So the number of strata has to be decided depending on administrative convenience, the cost of the survey and the variability of the characteristic under study in the area.

Allocation and selection of the sample within strata

Suppose that the population is divided into k strata of N1, N2, …, Nk units respectively, and that a sample of n units is to be drawn from the population. The problem of allocation concerns the choice of the sample sizes in the respective strata, i.e., how many units should be taken from each stratum such that the total sample is n. Other things being equal, a larger sample may be taken from a stratum with a larger variance so that the variance of the estimates of strata means gets reduced.
The application of the above principle requires advance estimates of the variation within each stratum. These may be available from a previous survey or may be based on pilot surveys of a restricted nature. If this information is available, the sampling fraction (ni/Ni) in each stratum may be taken proportional to the standard deviation of that stratum. If the cost per unit of conducting the survey is known and varies from stratum to stratum, an efficient method of allocation for minimum cost is to take larger samples from the strata where sampling is cheaper and variability is higher. To apply this procedure one needs information on the variability and the cost of observation per unit in the different strata.

Where information regarding the relative variances within strata and the costs of operations is not available, the allocation to the different strata may be made in proportion to the number of units in them or to the total area of each stratum. This method is usually known as ‘proportional allocation’.

For the selection of units within strata, in general, any method based on a probability selection of units can be adopted, but the selection should be independent in each stratum. If independent random samples are taken from each stratum, the sampling procedure is known as ‘stratified random sampling’. Other modes of selection, such as systematic sampling, can also be adopted within the different strata.

Estimation of mean and variance

We shall assume that the population of N units is first divided into k strata of N1, N2, …, Nk units respectively. These strata are non-overlapping and together they comprise the whole population, so that
N1 + N2 + ….. + Nk = N
When the strata have been determined, a sample is drawn from each stratum, the selection being made independently in each stratum.
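The two allocation rules described above, allocation proportional to stratum size and a sampling fraction proportional to the stratum standard deviation (so that the stratum sample size is proportional to Ni × Si), can be sketched in Python. The function names, the rounding rule (leftover units go to the strata with the largest fractional parts) and the three-stratum figures are assumptions made for illustration only; they do not come from the manual.

```python
def allocate(n, weights):
    """Split a total sample of n units in proportion to the given weights,
    rounding so that the stratum sample sizes still add up to n."""
    total = sum(weights)
    raw = [n * w / total for w in weights]
    alloc = [int(r) for r in raw]                       # truncate first
    # Give the units lost by truncation to the largest fractional parts
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:
        alloc[i] += 1
    return alloc

def proportional_allocation(n, sizes):
    """n_i proportional to the stratum size N_i."""
    return allocate(n, sizes)

def sd_allocation(n, sizes, sds):
    """n_i / N_i proportional to S_i, i.e. n_i proportional to N_i * S_i."""
    return allocate(n, [N_i * S_i for N_i, S_i in zip(sizes, sds)])

# Hypothetical strata: sizes and advance estimates of standard deviation
sizes = [500, 300, 200]
sds = [4.0, 10.0, 20.0]

by_size = proportional_allocation(100, sizes)
by_sd = sd_allocation(100, sizes, sds)
```

In this hypothetical case proportional allocation gives 50, 30 and 20 units, while allocation by standard deviation shifts the sample towards the small but highly variable third stratum (22, 33 and 45 units). The sketch does not enforce the minimum of 2 units per stratum needed to estimate the within-stratum variance.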
The sample sizes within the strata are denoted by n1, n2, …, nk respectively, so that
n1 + n2 + ….. + nk = n
Let ytj (j = 1, 2, …, Nt; t = 1, 2, …, k) be the value of the characteristic under study for the j-th unit in the t-th stratum. Then,
i). the population mean in the t-th stratum is given by

$$\bar{Y}_t = \frac{1}{N_t}\sum_{j=1}^{N_t} y_{tj}$$

The overall population mean is given by

$$\bar{Y} = \frac{1}{N}\sum_{t=1}^{k} N_t \bar{Y}_t$$

The estimate of the population mean, $\bar{Y}$, in this case will be obtained by

$$\bar{y}_{st} = \frac{1}{N}\sum_{t=1}^{k} N_t \bar{y}_t \quad\text{where}\quad \bar{y}_t = \frac{1}{n_t}\sum_{j=1}^{n_t} y_{tj}$$

ii). The estimate of the variance of $\bar{y}_{st}$ is given by

$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{t=1}^{k} N_t\left(N_t-n_t\right)\frac{s_t^2}{n_t} \quad\text{where}\quad s_t^2 = \frac{1}{n_t-1}\sum_{j=1}^{n_t}\left(y_{tj}-\bar{y}_t\right)^2$$

Stratification, if properly done as explained in the previous sections, will usually give a lower variance for the estimated population total or mean than a simple random sample of the same size. However, a stratified sample taken without due care and planning may not be better than a simple random sample.

Problem-29: A forest area consisting of 69 compartments was divided into three strata containing compartments 1-29, compartments 30-45 and compartments 46-69, and samples of 10, 5 and 8 compartments respectively were chosen at random from the three strata. The serial numbers of the selected compartments in each stratum are given in column (4) of the following Table. The corresponding observed volume of the particular species in each selected compartment in m³/ha is shown in column (5).

Table-20. Estimation of parameters under stratified sampling

Stratum   Total number of    Number of units   Selected sampling   Volume (m³/ha)   Square of volume
number    units (Nt)         sampled (nt)      unit number         (ytj)            (ytj²)
(1)       (2)                (3)               (4)                 (5)              (6)
I         29                 10                1                   5.40             29.16
                                               18                  4.87             23.72
                                               28                  4.61             21.25
                                               12                  3.26             10.63
                                               20                  4.96             24.60
                                               19                  4.73             22.37
                                               9                   4.39             19.27
                                               6                   2.34             5.48
                                               17                  4.74             22.47
                                               7                   2.85             8.12
Total                                          ..                  42.15            187.07
II        16                 5                 43                  4.79             22.94
                                               42                  4.57             20.88
                                               36                  4.89             23.91
                                               45                  4.42             19.54
                                               39                  3.44             11.83
Total                                          ..                  22.11            99.10
III       24                 8                 59                  7.41             54.91
                                               50                  3.70             13.69
                                               49                  5.45             29.70
                                               58                  7.01             49.14
                                               54                  3.83             14.67
                                               69                  5.25             27.56
                                               52                  4.50             20.25
                                               47                  6.51             42.38
Total                                          ..                  43.66            252.30

Step-1. Compute the following quantities.
N = (29 + 16 + 24) = 69
n = (10 + 5 + 8) = 23
Stratum means: $\bar{y}_{I}$ = 42.15/10 = 4.215, $\bar{y}_{II}$ = 22.11/5 = 4.422, $\bar{y}_{III}$ = 43.66/8 = 5.458

Step-2. Estimation of the population mean

$$\bar{y}_{st} = \frac{1}{N}\sum_{t} N_t\bar{y}_t = \frac{29\times 4.215 + 16\times 4.422 + 24\times 5.458}{69} = \frac{323.98}{69} = 4.70 \text{ m}^3/\text{ha}$$

Step-3. Estimation of the variance of $\bar{y}_{st}$
The within-stratum variances are obtained from the totals of columns (5) and (6) as

$$s_t^2 = \frac{1}{n_t-1}\left[\sum_{j} y_{tj}^2 - \frac{\left(\sum_{j} y_{tj}\right)^2}{n_t}\right]$$

and are substituted in

$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{t} N_t\left(N_t-n_t\right)\frac{s_t^2}{n_t}, \qquad SE(\bar{y}_{st}) = \sqrt{\hat{V}(\bar{y}_{st})}, \qquad \text{C.V.} = \frac{SE(\bar{y}_{st})}{\bar{y}_{st}}\times 100$$

Step-4. If we ignore the strata and assume that the same sample of size n = 23 formed a simple random sample (WOR) from the population of N = 69, the estimate of the population mean would reduce to

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{107.92}{23} = 4.69 \text{ m}^3/\text{ha}$$

The estimate of the variance of the mean is

$$\hat{V}(\bar{y}) = \frac{N-n}{Nn}\, s^2 \quad\text{where}\quad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2$$

so that $SE(\bar{y}) = \sqrt{\hat{V}(\bar{y})}$ and C.V. = $\frac{SE(\bar{y})}{\bar{y}}\times 100$.
The gain in precision due to stratification over SRSWOR is computed by

$$\frac{\hat{V}(\bar{y})}{\hat{V}(\bar{y}_{st})}\times 100 = 121.8$$

Thus the gain in precision is 21.8%.

Exercise: 2000 wheat cultivators’ holdings in a GP were stratified according to their sizes, and the results of the stratification are given below.

Stratum No.   No. of holdings (Nt)   Mean area per holding ($\bar{Y}_t$)   S.D. of area per holding (St)
1             394                    5.4                                   8.3
2             461                    16.3                                  13.3
3             381                    24.3                                  15.1
4             334                    34.5                                  19.8
5             169                    42.1                                  24.5
6             113                    50.1                                  26.0
7             148                    63.8                                  35.2

Estimate:
1. Mean wheat area of the GP
2. Variance of the mean wheat area of the GP
3. C.V. of the area of the GP
4. Mean area, variance of the mean and C.V. of the GP if the sample is considered as SRSWOR
5. Gain in precision of stratification over SRSWOR

3.5. Systematic sampling

Systematic sampling employs a simple rule of selecting every k-th unit starting with a number chosen at random from 1 to k (k=N/n) as the random start.
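The selection rule just stated can be written in a few lines of Python. This is an illustrative sketch only: the function name systematic_sample is hypothetical, and the population of N = 60 units with interval k = 12 is an assumed example in which N is an exact multiple of k.

```python
import random

def systematic_sample(N, k, start):
    """Serial numbers start, start + k, start + 2k, ... not exceeding N."""
    if not 1 <= start <= k:
        raise ValueError("the random start must lie between 1 and k")
    return list(range(start, N + 1, k))

N, n = 60, 5
k = N // n                      # sampling interval, here 12

fixed = systematic_sample(N, k, 7)                        # start chosen by hand
rng = random.Random(2009)                                 # seeded for repeatability
random_start = rng.randint(1, k)                          # random start in 1..k
drawn = systematic_sample(N, k, random_start)
```

With a start of 7 the selected units are 7, 19, 31, 43 and 55. Only the start is random, so each of the k possible samples is drawn with probability 1/k.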
Let there be N sampling units in the population, numbered 1 to N. A systematic sample of n units is then selected by taking the unit at the random start and every k-th unit after it, where k is called the sampling interval. This type of sampling is often convenient in exercising control over field work. Apart from these operational considerations, systematic sampling is often observed to provide estimators more efficient than those of simple random sampling under normal conditions. The property of the systematic sample of spreading the sampling units evenly over the population can be exploited by listing the units so that homogeneous units are put together, or so that the values of the characteristic for the units are in ascending or descending order of magnitude. For example, knowing the fertility trend of a forest area, the units (for example strips) may be listed along the fertility trend.

Selection of a systematic sample

Consider a population of N=48 units from which a sample of n=4 units is needed. Here, k = (48/4) = 12. If the random number selected from the set of numbers 1 to 12 is 11, then the units with serial numbers 11, 23, 35 and 47 will be selected. This technique generates k possible systematic samples, each with equal probability. Where N is not exactly divisible by n, k is taken as the integer nearest to N/n. In this situation the sample size is not necessarily n; in some cases it may be n - 1, so the possible samples are of unequal size.

Parameter estimation

The estimate of the population mean per unit is given by the sample mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where n is the number of units in the sample.

One-dimensional systematic sampling

In the case of systematic strip surveys or, in general, any one-dimensional systematic sampling, an approximation to the standard error may be obtained from the differences between pairs of successive units.
If there are n units enumerated in the systematic sample, there will be (n-1) differences. The variance per unit is therefore given by the sum of squares of the differences divided by twice the number of differences. Thus, if y1, y2, …, yn are the observed values (say volume) for the n units in the systematic sample, and the first difference d(yi) is defined as

$$d(y_i) = y_{i+1} - y_i; \quad (i = 1, 2, \ldots, n-1)$$

the approximate variance of the mean per unit is estimated as

$$\hat{V}(\bar{y}) = \frac{1}{2n(n-1)}\sum_{i=1}^{n-1}\left[d(y_i)\right]^2$$

Problem-30: The following Table gives the observed diameters of 10 trees selected by systematic selection of 1 in 20 trees from a stand containing 195 trees in rows of 15 trees. The first tree was selected as the 8th tree from one of the outside edges of the stand, starting from one corner, and the remaining trees were selected systematically by taking every 20th tree, switching to the nearest tree of the next row after the last tree in any row is encountered.

Table-21. Tree diameter on a systematic sample of 10 trees from a plot

Tree No.   DBH (cm), yi   First difference, d(yi)
8          14.8
28         12.0           -2.8
48         13.6           +1.6
68         14.2           +0.6
88         11.8           -2.4
108        14.1           +2.3
128        11.6           -2.5
148        9.0            -2.6
168        10.1           +1.1
188        9.5            -0.6

Solution:
Average diameter, $\bar{y} = \frac{120.7}{10} = 12.07$ cm
The nine first differences are obtained as shown in column (3) of the Table; the sum of their squares is 36.39. The error variance of the mean per unit is thus

$$\hat{V}(\bar{y}) = \frac{36.39}{2\times 10\times 9} = 0.202167$$

k-Independent systematic sampling of equal sample size

A difficulty with systematic sampling is that one systematic sample by itself does not furnish a valid assessment of the precision of the estimates. To obtain valid estimates of the precision, one may resort to partially systematic samples. A theoretically valid method that uses the idea of systematic samples and at the same time leads to unbiased estimates of the sampling error is to draw a minimum of two systematic samples with independent random starts.
If $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_m$ are m estimates of the population mean based on m independent systematic samples, the combined estimate of the population mean is

$$\bar{y} = \frac{1}{m}\sum_{i=1}^{m}\bar{y}_i$$

The estimate of the variance of $\bar{y}$ is given by

$$\hat{V}(\bar{y}) = \frac{1}{m(m-1)}\sum_{i=1}^{m}\left(\bar{y}_i-\bar{y}\right)^2$$

Notice that the precision increases with the number of independent systematic samples.

Problem-31: The data given in the following Table consist of the systematic sample of Problem-30 along with another systematic sample selected with an independent random start. In the second sample, the first tree was selected as the 10th tree.

Table-22. Tree diameter on two independent systematic samples of 10 trees from a plot

Sample-1                     Sample-2
Tree No.   DBH (cm), yi      Tree No.   DBH (cm), yi
8          14.8              10         13.6
28         12.0              30         10.0
48         13.6              50         14.8
68         14.2              70         14.2
88         11.8              90         13.8
108        14.1              110        14.5
128        11.6              130        12.0
148        9.0               150        10.0
168        10.1              170        10.5
188        9.5               190        8.5

Solution: Here, n=10, k=20 and N=200.
The average diameter for the first sample is $\bar{y}_1$ = 120.7/10 = 12.07 cm and for the second sample it is $\bar{y}_2$ = 121.9/10 = 12.19 cm.
The combined estimate of the population mean ($\bar{y}$) is

$$\bar{y} = \frac{12.07 + 12.19}{2} = 12.13 \text{ cm}$$

The estimate of the variance of the mean is obtained as

$$\hat{V}(\bar{y}) = \frac{1}{2\times 1}\left[(12.07-12.13)^2 + (12.19-12.13)^2\right] = 0.0036$$

so that $SE(\bar{y}) = \sqrt{0.0036}$ = 0.06 cm and C.V. = 0.06 × 100/12.13 = 0.49%
Total = 200 × 12.13 = 2426 cm

Exercise: Given below are data for 10 systematic samples of size 4 from a population of 40 units.

Systematic sample numbers
1    2    3    4    5    6    7    8    9    10
0    1    2    1    4    5    6    7    7    9
7    8    9    10   12   13   15   6    16   17
18   18   19   20   21   20   24   13   28   29
29   30   31   31   30   32   35   37   38   63

Work out the estimates of the population mean, total, variance and C.V., and the relative precision of systematic sampling with respect to SRSWOR.
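The hand computations of Problem-30 and Problem-31 can be checked with a short Python sketch. It is an illustration and not part of the original manual: the function names are hypothetical, the first function applies the successive-difference approximation to the variance of the mean, and the second combines the means of independent systematic samples.

```python
import math

def successive_difference_variance(y):
    """Approximate variance of the mean of one systematic sample:
    sum of squared first differences divided by 2 n (n - 1)."""
    n = len(y)
    d = [y[i + 1] - y[i] for i in range(n - 1)]
    return sum(v * v for v in d) / (2 * n * (n - 1))

def combine_systematic_samples(samples):
    """Combined mean of m independent systematic samples and the
    estimate of its variance, sum((mean_i - mean)^2) / (m (m - 1))."""
    means = [sum(s) / len(s) for s in samples]
    m = len(means)
    combined = sum(means) / m
    variance = sum((mu - combined) ** 2 for mu in means) / (m * (m - 1))
    return combined, variance

# DBH (cm) of the two systematic samples in Table-21 and Table-22
sample_1 = [14.8, 12.0, 13.6, 14.2, 11.8, 14.1, 11.6, 9.0, 10.1, 9.5]
sample_2 = [13.6, 10.0, 14.8, 14.2, 13.8, 14.5, 12.0, 10.0, 10.5, 8.5]

var_one_sample = successive_difference_variance(sample_1)      # Problem-30
combined_mean, combined_var = combine_systematic_samples([sample_1, sample_2])
se = math.sqrt(combined_var)                                   # Problem-31
cv_percent = 100 * se / combined_mean
```

The values agree with the solutions above: about 0.2022 for the single-sample variance, and a combined mean of 12.13 cm with variance 0.0036, standard error 0.06 cm and C.V. of about 0.49% for the two samples together.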
*****************XXX******************

APPENDIX
STATISTICAL TABLES (t, F, χ2, r, Z, random number)

Table-1(a): Critical values for t-distribution (two-tailed probability)

DF    0.01     0.05       DF    0.01    0.05       DF    0.01    0.05
1     63.657   12.706     41    2.701   2.020      81    2.637   1.990
2     9.925    4.303      42    2.698   2.018      82    2.637   1.989
3     5.841    3.182      43    2.695   2.017      83    2.636   1.989
4     4.604    2.776      44    2.692   2.016      84    2.635   1.989
5     4.032    2.571      45    2.689   2.014      85    2.634   1.988
6     3.707    2.447      46    2.687   2.013      86    2.634   1.987
7     3.499    2.365      47    2.684   2.012      87    2.633   1.987
8     3.355    2.306      48    2.682   2.011      88    2.632   1.987
9     3.250    2.262      49    2.679   2.010      89    2.632   1.987
10    3.169    2.228      50    2.677   2.008      90    2.631   1.987
11    3.106    2.201      51    2.675   2.007      91    2.630   1.986
12    3.055    2.179      52    2.673   2.006      92    2.630   1.986
13    3.012    2.160      53    2.671   2.005      93    2.629   1.986
14    2.977    2.145      54    2.670   2.004      94    2.629   1.986
15    2.947    2.131      55    2.668   2.004      95    2.628   1.986
16    2.921    2.120      56    2.666   2.003      96    2.628   1.985
17    2.898    2.110      57    2.664   2.002      97    2.627   1.985
18    2.878    2.101      58    2.663   2.002      98    2.626   1.984
19    2.861    2.093      59    2.661   2.001      99    2.626   1.984
20    2.845    2.086      60    2.660   2.000      100   2.625   1.984
21    2.831    2.080      61    2.658   1.999      105   2.623   1.983
22    2.819    2.074      62    2.657   1.998      110   2.621   1.982
23    2.807    2.069      63    2.656   1.998      115   2.619   1.981
24    2.797    2.064      64    2.654   1.997      120   2.617   1.980
25    2.787    2.060      65    2.653   1.996      125   2.616   1.979
26    2.779    2.056      66    2.652   1.996      130   2.614   1.978
27    2.771    2.052      67    2.651   1.995      135   2.613   1.978
28    2.763    2.048      68    2.650   1.995      140   2.611   1.977
29    2.756    2.045      69    2.649   1.994      145   2.610   1.976
30    2.750    2.042      70    2.647   1.994      150   2.609   1.976
31    2.744    2.040      71    2.646   1.993      160   2.607   1.975
32    2.738    2.037      72    2.645   1.993      170   2.605   1.974
33    2.733    2.035      73    2.644   1.993      180   2.603   1.973
34    2.728    2.033      74    2.643   1.993      190   2.602   1.973
35    2.723    2.030      75    2.643   1.992      200   2.601   1.972
36    2.719    2.028      76    2.642   1.992      250   2.596   1.969
37    2.715    2.026      77    2.641   1.991      300   2.592   1.968
38    2.711    2.024      78    2.640   1.991      350   2.590   1.967
39    2.707    2.022      79    2.639   1.991      400   2.588   1.966
40    2.704    2.021      80    2.638   1.990      ∞     2.576   1.960

Table-1(b): Critical values for t-distribution (One & Two-tailed)

Degree of         One-tailed        Two-tailed
freedom (v)       5%      1%        5%      1%
1                 6.31    31.8      12.7    63.7
2                 2.92    6.96      4.30    9.92
3                 2.35    4.54      3.18    5.84
4                 2.13    3.75      2.78    4.60
5                 2.02    3.36      2.57    4.03
6                 1.94    3.14      2.45    3.71
7                 1.89    3.00      2.36    3.50
8                 1.86    2.90      2.31    3.36
9                 1.83    2.82      2.26    3.25
10                1.81    2.76      2.23    3.17
11                1.80    2.72      2.20    3.11
12                1.78    2.68      2.18    3.05
13                1.77    2.65      2.16    3.01
14                1.76    2.62      2.14    2.98
15                1.75    2.60      2.13    2.95
16                1.75    2.58      2.12    2.92
17                1.74    2.57      2.11    2.90
18                1.73    2.55      2.10    2.88
19                1.73    2.54      2.09    2.86
20                1.72    2.53      2.09    2.85
22                1.72    2.51      2.07    2.82
24                1.71    2.49      2.06    2.80
26                1.71    2.48      2.06    2.78
28                1.70    2.47      2.05    2.76
30                1.70    2.46      2.04    2.75
35                1.69    2.44      2.03    2.72
40                1.68    2.42      2.02    2.70
45                1.68    2.41      2.01    2.69
50                1.68    2.40      2.01    2.68
55                1.67    2.40      2.00    2.67
60                1.67    2.39      2.00    2.66
∞                 1.64    2.33      1.96    2.58

Table-2: Critical values for F-distribution Smaller MS (n2) Degrees of freedom for greater mean square (n1) 1 2 3 4 5 6 7 8 9 10 1 5% 161.00 200.00 216.00 225.00 230.00 234.00 237.00 239.00 241.00 242.00 1% 4052.00 4999.00 5403.00 5625.00 5764.00 5859.00 5928.00 5981.00 6022.00 6056.00 2 5% 1% 18.51 19.00 19.16 19.25 19.30 19.33 19.36 19.37 19.38 19.39 98.49 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39 99.40 3 5% 1% 10.13 9.55 9.28 9.12 9.01 8.94 8.88 8.84 8.81 8.78 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.34 27.23 4 5% 1% 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.54 5 5% 1% 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.78 4.74 16.26 13.27 12.06 11.39 10.97 10.67 10.45 10.29 10.15 10.05 6 5% 1% 5.99 5.14 13.74 10.92 4.76 9.78 4.53 9.15 4.39 8.75 4.28
8.47 4.21 8.26 4.15 8.10 4.10 7.98 4.06 7.87 7 5% 1% 5.59 12.25 4.74 9.55 4.35 8.45 4.12 7.85 3.97 7.46 3.87 7.19 3.79 7.00 3.73 6.84 3.68 6.71 3.63 6.62 8 5% 1% 5.32 11.26 4.46 8.65 4.07 7.59 3.84 7.01 3.69 6.63 3.58 6.37 3.50 6.19 3.44 6.03 3.39 5.91 3.34 5.82 9 5% 1% 5.12 10.56 4.26 8.02 3.86 6.99 3.63 6.42 3.48 6.06 3.37 5.80 3.29 5.62 3.23 5.47 3.18 5.35 3.13 5.26 10 5% 4.96 10.04 4.10 7.56 3.71 6.55 3.48 5.99 3.33 5.64 3.22 5.39 3.14 5.21 3.07 5.06 3.02 4.95 2.97 4.85 4.84 9.65 3.98 7.20 3.59 6.22 3.36 5.67 3.20 5.32 3.09 5.07 3.01 4.88 2.95 4.74 2.90 4.63 2.80 4.54 4.75 9.33 3.88 6.93 3.49 5.95 3.26 5.41 3.11 5.06 3.00 4.82 2.92 4.65 2.85 4.50 2.80 4.39 2.76 4.30 4.67 9.07 3.80 6.70 3.41 5.74 3.18 5.20 3.02 4.86 2.92 4.62 2.84 4.44 2.77 4.30 2.72 4.19 2.67 4.10 4.60 8.86 3.74 6.51 3.34 5.56 3.11 5.03 2.96 4.69 2.85 4.46 2.77 4.28 2.70 4.14 2.65 4.03 2.60 3.94 4.54 8.68 3.68 6.36 3.29 5.42 3.06 4.89 2.90 4.56 2.79 4.32 2.70 4.14 2.64 4.00 2.59 3.89 2.55 3.80 4.49 8.53 3.83 6.23 3.24 5.29 3.01 4.77 2.85 4.44 2.74 4.20 2.66 4.03 2.59 3.89 2.54 3.78 2.49 3.69 4.45 8.40 3.59 6.11 3.20 5.18 2.96 4.67 2.81 4.34 2.70 4.10 2.62 3.93 2.55 3.79 2.50 3.68 2.45 3.59 1% 11 5% 1% 12 5% 1% 13 5% 1% 14 5% 1% 15 5% 1% 16 5% 1% 17 5% 1% Table-2 (Continued…) Critical values for F-distribution Department of Agricultural Statistics, OUAT Page-95 UG Practical Manual on Statistics Smaller MS Degrees of freedom for greater mean square (n1) (n2) 1 18 5% 4.41 1% 8.28 2 3.55 6.01 3 3.16 5.09 4 2.93 4.58 5 2.77 4.25 6 2.66 4.01 7 2.58 3.85 8 2.51 3.71 9 2.46 3.60 10 2.41 3.51 19 5% 4.38 1% 8.18 3.52 5.93 3.13 5.01 2.90 4.50 2.74 4.17 2.63 3.94 2.55 3.77 2.48 3.63 2.43 3.52 2.38 3.43 20 5% 4.35 1% 8.10 3.49 5.85 3.10 4.94 2.87 4.43 2.71 4.10 2.60 3.87 2.52 3.71 2.45 3.56 2.40 3.45 2.35 3.37 21 5% 4.32 1% 8.02 3.47 5.78 3.07 4.87 2.84 4.37 2.68 4.04 2.57 3.81 2.49 3.65 2.42 3.51 2.37 3.40 2.32 3.31 22 4.30 7.94 3.44 5.72 3.05 4.82 2.82 4.31 2.66 3.99 2.55 3.76 2.47 3.59 2.40 3.45 2.35 
3.35 2.30 3.26 23 5% 4.28 1% 7.88 3.42 5.66 3.03 4.76 2.80 4.26 2.64 3.94 2.53 3.71 2.45 3.54 2.38 3.41 2.32 3.30 2.28 3.21 24 4.26 7.82 3.40 5.61 3.01 4.72 2.78 4.22 2.62 3.90 2.51 3.67 2.43 3.50 2.36 3.36 2.30 3.25 2.26 3.17 4.24 7.77 3.38 5.57 2.99 4.68 2.76 4.18 2.60 3.86 2.49 3.63 2.41 3.46 2.34 3.32 2.28 3.21 2.24 3.13 26 5% 4.22 1% 7.72 3.37 5.53 2.98 4.64 2.74 4.14 2.59 3.82 2.47 3.59 2.39 3.42 2.32 3.29 2.28 3.17 2.22 3.09 27 5% 4.21 1% 7.68 3.50 5.49 2.96 4.60 2.73 4.11 2.57 3.79 2.46 3.56 2.37 3.39 2.30 3.26 2.25 3.14 2.20 3.06 28 5% 4.20 1% 7.64 3.34 5.45 2.95 4.57 2.71 4.07 2.56 3.76 2.44 3.53 2.36 3.36 2.29 3.23 2.24 3.11 2.19 3.03 29 5% 4.18 1% 7.60 3.33 5.42 2.95 4.54 2.70 4.04 2.54 3.73 2.43 3.50 2.35 3.33 2.28 3.20 2.22 3.08 2.18 3.00 30 5% 4.17 1% 7.56 3.32 5.39 2.92 4.51 2.69 4.02 2.53 3.70 2.42 3.47 2.34 3.30 2.27 3.17 2.21 3.06 2.16 2.98 31 5% 4.16 1% 7.53 3.31 5.37 2.91 4.49 2.68 4.00 2.52 3.68 2.41 3.45 2.33 3.28 2.26 3.15 2.20 3.04 2.15 2.96 32 5% 4.15 1% 7.50 3.30 5.34 2.90 4.46 2.67 3.97 2.51 3.66 2.40 3.42 2.32 3.25 2.25 3.12 2.19 3.01 2.14 2.94 33 5% 4.14 1% 7.47 3.29 5.32 2.89 4.44 2.66 3.95 2.50 3.64 2.39 3.40 2.31 3.23 2.24 3.10 2.18 2.99 2.13 2.92 34 5% 4.13 1% 7.44 3.28 5.29 2.88 4.42 2.65 3.93 2.49 3.61 2.38 3.38 2.30 3.21 2.23 3.08 2.17 2.97 2.12 2.89 5% 1% 5% 1% 25 5% 1% Table-2 (Continued…) Critical values for F-distribution Smaller MS Degrees of freedom for greater mean square (n1) (n2) 1 35 5% 4.12 1% 7.42 2 3.27 5.27 3 2.87 4.40 4 2.64 3.91 5 2.49 3.60 6 2.37 3.37 7 2.29 3.20 8 2.22 3.06 9 2.16 2.96 10 2.11 2.88 36 5% 4.11 1% 7.39 3.26 5.25 2.86 4.38 2.63 3.89 2.48 3.58 2.36 3.35 2.28 3.18 2.21 3.04 2.15 2.94 2.10 2.86 37 5% 4.11 1% 7.37 3.26 5.23 2.86 4.36 2.63 3.88 2.47 3.56 2.36 3.34 2.27 3.17 2.20 3.03 2.15 2.93 2.10 2.84 38 39 1% 4.10 7.35 3.25 5.21 2.85 4.34 2.62 3.86 2.46 3.54 2.35 3.32 2.26 3.15 2.19 3.02 2.14 2.91 2.09 2.82 5% 4.09 3.24 2.85 2.62 2.46 2.35 2.26 2.19 2.13 2.08 5% Department of Agricultural 
Statistics, OUAT Page-96 UG Practical Manual on Statistics 1% 7.33 5.20 4.33 3.85 3.53 3.31 3.14 3.01 2.90 2.81 40 5% 4.08 1% 7.31 3.23 5.18 2.84 4.31 2.61 3.83 2.45 3.51 2.34 3.29 2.25 3.12 2.18 2.99 2.12 2.88 2.07 2.80 41 5% 4.08 1% 7.29 3.23 5.17 2.84 4.30 2.61 3.82 2.45 3.50 2.33 3.28 2.25 3.11 2.18 2.98 2.12 2.87 2.07 2.79 42 5% 4.07 1% 7.27 3.22 5.15 2.83 4.29 2.60 3.80 2.44 3.49 2.32 3.26 2.24 3.10 2.17 2.96 2.11 2.86 2.06 2.77 43 5% 4.07 1% 7.26 3.22 5.14 2.83 4.28 2.60 3.79 2.44 3.48 2.32 3.25 2.24 3.09 2.17 2.95 2.11 2.85 2.06 2.76 44 4.06 7.24 3.21 5.12 2.82 4.26 2.59 3.78 2.43 3.46 2.31 3.24 2.23 3.07 2.16 2.94 2.10 2.84 2.05 2.75 4.06 7.23 3.21 5.11 2.82 4.25 2.59 3.77 2.43 3.45 2.31 3.23 2.23 3.06 2.15 2.93 2.10 2.83 2.05 2.74 4.05 7.21 3.20 5.10 2.81 4.24 2.58 3.76 2.42 3.44 2.30 3.22 2.22 3.05 2.14 2.92 2.09 2.82 2.04 2.73 4.05 7.20 3.20 5.09 2.81 4.23 2.58 3.75 2.42 3.43 2.30 3.21 2.22 3.05 2.14 2.91 2.09 2.81 2.04 2.72 4.04 7.19 3.19 5.08 2.80 4.22 2.57 3.74 2.41 3.42 2.30 3.20 2.21 3.04 2.14 2.90 2.08 2.80 2.03 2.71 4.04 7.18 3.19 5.07 2.80 4.21 2.57 3.73 2.41 3.42 2.30 3.19 2.21 3.03 2.14 2.89 2.08 2.79 2.03 2.71 4.03 7.17 3.18 5.06 2.79 4.20 2.56 3.72 2.40 3.41 2.29 3.18 2.20 3.02 2.13 2.88 2.07 2.78 2.02 2.70 5% 1% 45 5% 1% 46 5% 1% 47 5% 1% 48 5% 1% 49 5% 1% 50 5% 1% Table-2 (Continued…) Critical values for F-distribution Smaller MS Degrees of freedom for greater mean square (n1) (n2) 1 55 5% 4.02 1% 7.12 2 3.17 5.01 3 2.78 4.16 4 2.54 3.68 5 2.38 3.37 6 2.27 3.15 7 2.18 2.98 8 2.11 2.85 9 2.05 2.75 10 2.00 2.66 60 5% 1% 4.00 7.08 3.15 4.98 2.76 4.13 2.52 3.65 2.37 3.34 2.25 3.12 2.17 2.95 2.10 2.82 2.04 2.72 1.99 2.63 65 5% 1% 3.99 7.04 3.14 4.95 2.75 4.10 2.51 3.62 2.36 3.31 2.24 3.09 2.15 2.93 2.08 2.79 2.02 2.70 1.98 2.61 70 5% 1% 3.98 7.01 3.13 4.92 2.74 4.08 2.50 3.60 2.35 3.29 2.23 3.07 2.14 2.91 2.07 2.77 2.01 2.67 1.97 2.59 80 5% 1% 3.96 6.96 3.11 4.88 2.72 4.04 2.48 3.56 2.33 3.25 2.21 3.04 2.12 2.87 2.05 2.74 1.99 2.64 1.95 2.55 
(n2)      1     2     3     4     5     6     7     8     9     10
100   5%  3.94  3.09  2.70  2.46  2.30  2.19  2.10  2.03  1.97  1.92
      1%  6.90  4.82  3.98  3.51  3.20  2.99  2.82  2.69  2.59  2.51
125   5%  3.92  3.07  2.68  2.44  2.29  2.17  2.08  2.01  1.95  1.90
      1%  6.84  4.78  3.94  3.47  3.17  2.95  2.79  2.65  2.56  2.47
150   5%  3.91  3.06  2.67  2.43  2.27  2.16  2.07  2.00  1.94  1.89
      1%  6.81  4.75  3.91  3.44  3.14  2.92  2.76  2.62  2.53  2.44
200   5%  3.89  3.04  2.65  2.41  2.26  2.14  2.05  1.98  1.92  1.87
      1%  6.76  4.71  3.88  3.41  3.11  2.90  2.73  2.60  2.50  2.41
400   5%  3.86  3.02  2.62  2.39  2.23  2.12  2.03  1.96  1.90  1.85
      1%  6.70  4.66  3.83  3.36  3.06  2.85  2.69  2.55  2.46  2.37
1000  5%  3.85  3.00  2.61  2.38  2.22  2.10  2.02  1.95  1.89  1.84
      1%  6.66  4.62  3.80  3.34  3.04  2.82  2.66  2.53  2.43  2.34
∞     5%  3.84  2.99  2.60  2.37  2.21  2.09  2.01  1.94  1.88  1.83
      1%  6.64  4.60  3.78  3.32  3.02  2.80  2.64  2.51  2.41  2.32

Table-2 (Continued…) Critical values for F-distribution

Smaller MS        Degrees of freedom for greater mean square (n1)
(n2)      11      12      13      14      15      16      17      18      19      20
1     5%  243.00  244.00  244.50  245.00  245.50  246.00  246.50  247.00  247.50  248.00
      1%  6082.00 6106.00 6124.00 6142.00 6156.00 6169.00 6177.00 6186.00 6194.00 6208.00
2     5%  19.40   19.41   19.42   19.42   19.43   19.43   19.43   19.44   19.44   19.44
      1%  99.41   99.42   99.42   99.43   99.43   99.44   99.44   99.45   99.45   99.45
3     5%  8.76    8.74    8.73    8.71    8.70    8.69    8.68    8.68    8.67    8.66
      1%  27.13   27.05   26.99   26.92   26.88   26.83   26.80   26.76   26.73   26.69
4     5%  5.93    5.91    5.89    5.87    8.86    5.84    5.83    5.82    5.81    5.80
      1%  14.45   14.37   14.31   14.24   14.20   14.15   14.11   14.07   14.04   14.02
5     5%  4.70    4.68    4.66    4.64    4.62    4.60    4.59    4.58    4.57    4.56
      1%  9.96    9.89    9.81    9.77    9.73    9.68    9.65    9.62    9.58    9.55
6     5%  4.03    4.00    3.98    3.96    3.94    3.92    3.91    3.90    3.88    3.87
      1%  7.79    7.72    7.66    7.60    7.56    7.52    7.79    7.46    7.42    7.39
7     5%  3.60    3.57    3.55    3.52    3.51    3.49    3.48    3.47    3.45    3.44
      1%  6.54    6.47    6.41    6.35    6.31    6.27    6.24    6.21    6.18    6.15
8     5%  3.31    3.28    3.26    3.23    3.22    3.20    3.19    3.18    3.16    3.15
      1%  5.74    5.67    5.62    5.56    5.52    5.48    5.46    5.42    5.39    5.36
9     5%  3.10    3.07    3.05    3.02    3.00    2.98    2.97    2.96    2.94    2.93
      1%  5.18    5.11    5.06    5.00    4.96    4.92    4.89    4.86    4.83    4.80
10    5%  2.94    2.91    2.89    2.86    2.84    2.82    2.81    2.80    2.78    2.77
      1%  4.78    4.71    5.66    4.60    4.56    4.52    4.49    4.47    4.44    4.41
11    5%  2.82    2.79    2.77    2.74    2.72    2.70    2.69    2.68    2.66    2.65
      1%  4.46    4.40    4.35    4.29    4.25    4.21    4.18    4.16    4.13    4.10
12    5%  2.72    2.69    2.67    2.64    2.62    2.60    2.59    2.57    2.56    2.54
      1%  4.22    4.16    4.11    4.05    4.02    3.98    3.95    3.92    3.89    3.86
13    5%  2.63    2.60    2.58    2.55    2.53    2.51    2.50    2.49    2.73    2.46
      1%  4.02    3.96    3.92    3.85    3.82    3.78    3.75    3.73    3.70    3.67
14    5%  2.56    2.53    2.51    2.48    2.46    2.44    2.43    2.42    2.40    2.39
      1%  3.86    3.80    3.75    3.70    3.66    3.62    3.59    3.57    3.54    3.51
15    5%  2.51    2.48    2.46    2.43    2.41    2.39    2.38    2.36    2.35    2.33
      1%  3.73    3.67    3.66    3.56    3.52    3.48    3.45    3.42    3.39    3.36
16    5%  2.45    2.42    2.40    2.37    2.35    2.33    2.32    2.31    2.29    2.28
      1%  3.61    3.55    3.50    3.45    3.41    3.37    3.34    3.31    3.28    3.25
17    5%  2.41    2.38    2.36    2.33    2.31    2.29    2.28    2.26    2.25    2.23
      1%  3.52    3.45    3.40    3.35    3.31    3.27    3.24    3.22    3.19    3.16
18    5%  2.37    2.34    2.32    2.29    2.27    2.25    2.24    2.22    2.21    2.19
      1%  3.44    3.37    3.32    3.27    3.23    3.19    3.16    3.13    3.10    3.07
19    5%  2.34    2.31    2.29    2.26    2.24    2.21    2.20    2.18    2.17    2.15
      1%  3.36    3.30    3.25    3.19    3.16    3.12    3.09    3.06    3.03    3.00
20    5%  2.31    2.28    2.26    2.23    2.21    2.18    2.17    2.15    2.14    2.12
      1%  3.30    3.23    3.18    3.13    3.09    3.05    3.02    3.00    2.97    2.94
21    5%  2.28    2.25    2.23    2.20    2.18    2.15    2.14    2.12    2.12    2.09
      1%  3.24    3.17    3.12    3.07    3.03    2.99    2.96    2.94    2.91    2.88
22    5%  2.26    2.23    2.21    2.18    2.16    2.13    2.12    2.10    2.09    2.07
      1%  3.18    3.12    3.07    3.02    2.98    2.94    2.91    2.89    2.86    2.83
23    5%  2.24    2.20    2.17    2.14    2.12    2.10    2.09    2.07    2.06    2.04
      1%  3.14    3.07    3.02    2.97    2.93    2.89    2.86    2.84    2.81    2.78
24    5%  2.22    2.18    2.16    2.13    2.11    2.09    2.07    2.06    2.04    2.02
      1%  3.09    3.03    2.98    2.93    2.89    2.85    2.82    2.80    2.87    2.74
25    5%  2.20    2.16    2.14    2.11    2.09    2.07    2.05    2.04    2.02    2.00
      1%  3.05    2.99    2.94    2.89    2.85    2.81    2.78    2.76    2.73    2.70
26    5%  2.18    2.15    2.13    2.10    2.08    2.05    2.04    2.02    2.01    1.99
      1%  3.02    2.96    2.91    2.86    2.82    2.77    2.74    2.72    2.69    2.66
27    5%  2.16    2.13    2.11    2.08    2.06    2.03    2.02    2.00    1.99    1.97
      1%  2.98    2.93    2.88    2.83    2.79    2.74    2.71    2.69    2.66    2.63
28    5%  2.15    2.12    2.09    2.06    2.04    2.02    2.01    1.99    1.98    1.96
      1%  2.95    2.90    2.85    2.80    2.76    2.71    2.68    2.66    2.63    2.60
29    5%  2.14    2.10    2.08    2.05    2.03    2.00    1.99    1.97    1.96    1.94
      1%  2.92    2.87    2.82    2.77    2.73    2.68    2.65    2.63    2.60    2.57
30    5%  2.12    2.09    2.05    2.04    2.02    1.99    1.98    1.96    1.95    1.93
      1%  2.90    2.84    2.79    2.74    2.70    2.66    2.63    2.61    2.58    2.55
31    5%  2.11    2.08    2.05    2.03    2.01    1.98    1.97    1.95    1.94    1.92
      1%  2.88    2.82    2.77    2.72    2.68    2.64    2.61    2.59    2.56    2.53
32    5%  2.10    2.07    2.05    2.02    2.00    1.97    1.96    1.94    1.93    1.91
      1%  2.86    2.80    2.75    2.70    2.66    2.62    2.59    2.57    2.54    2.51
33    5%  2.09    2.06    2.04    2.01    1.99    1.96    1.95    1.93    1.92    1.90
      1%  2.84    2.78    2.73    2.68    2.64    2.60    2.57    2.55    2.52    2.49
34    5%  2.08    2.05    2.03    2.00    1.98    1.95    1.94    1.92    1.91    1.89
      1%  2.82    2.76    2.71    2.66    2.62    2.58    2.55    2.53    2.50    2.47
35    5%  2.07    2.04    2.02    1.99    1.97    1.94    1.93    1.91    1.90    1.88
      1%  2.80    2.74    2.69    2.64    2.60    2.56    2.53    2.51    2.48    2.45
36    5%  2.06    2.03    2.01    1.98    1.96    1.93    1.92    1.90    1.89    1.87
      1%  2.78    2.72    2.67    2.62    2.58    2.54    2.51    2.49    2.46    2.43
37    5%  2.06    2.03    2.00    1.97    1.95    1.93    1.91    1.89    1.88    1.86
      1%  2.77    2.71    2.66    2.61    2.57    2.53    2.50    2.47    2.44    2.41
38    5%  2.05    2.02    1.99    1.96    1.94    1.92    1.90    1.89    1.87    1.85
      1%  2.75    2.69    2.64    2.59    2.55    2.51    2.48    2.46    2.43    2.40
39    5%  2.05    2.01    1.99    1.96    1.93    1.91    1.89    1.88    1.86    1.85
      1%  2.74    2.68    2.63    2.58    2.54    2.50    2.48    2.45    2.42    2.38
40    5%  2.04    2.00    1.98    1.95    1.93    1.90    1.89    1.87    1.86    1.84
      1%  2.73    2.66    2.61    2.56    2.53    2.49    2.46    2.43    2.40    2.37
41    5%  2.01    2.00    1.98    1.95    1.92    1.90    1.88    1.86    1.85    1.83
      1%  2.72    2.65    2.60    2.55    2.51    2.48    2.45    2.42    2.39    2.36
42    5%  2.02    1.99    1.97    1.94    1.92    1.89    1.87    1.86    1.84    1.82
      1%  2.70    2.64    2.59    2.54    2.50    2.46    2.43    2.41    2.38    2.35
43    5%  2.02    1.99    1.96    1.93    1.91    1.89    1.87    1.85    1.83    1.82
      1%  2.69    2.63    2.58    2.53    2.49    2.45    2.42    2.39    2.36    2.33
44    5%  2.01    1.98    1.95    1.92    1.90    1.88    1.86    1.85    1.83    1.81
      1%  2.68    2.62    2.57    2.52    2.48    2.44    2.41    2.38    2.35    2.32
45    5%  2.01    1.98    1.95    1.92    1.90    1.88    1.86    1.84    1.82    1.81
      1%  2.67    2.61    2.56    2.51    2.47    2.43    2.40    2.37    2.34    2.31
46    5%  2.00    1.97    1.94    1.91    1.89    1.87    1.84    1.84    1.82    1.80
      1%  2.66    2.60    2.55    2.50    2.46    2.42    2.39    2.36    2.33    2.30
47    5%  2.00    1.97    1.94    1.91    1.89    1.87    1.85    1.83    1.81    1.80
      1%  2.65    2.59    2.54    2.51    2.45    2.41    2.38    2.35    2.32    2.29
48    5%  1.99    1.96    1.93    1.90    1.88    1.86    1.85    1.83    1.81    1.79
      1%  2.64    2.58    2.53    2.48    2.44    2.40    2.37    2.34    2.31    2.28
49    5%  1.99    1.96    1.93    1.90    1.88    1.86    1.84    1.82    1.80    1.79
      1%  2.63    2.57    2.52    2.47    2.43    2.40    2.36    2.33    2.30    2.27
50    5%  1.98    1.95    1.92    1.89    1.87    1.85    1.83    1.82    1.80    1.78
      1%  2.62    2.56    2.51    2.46    2.43    2.39    2.36    2.33    2.29    2.26

Table-2 (Continued…) Critical values for F-distribution

Smaller MS        Degrees of freedom for greater mean square (n1)
(n2)      11    12    14    16    20    24    30    40    50    75
55    5%  1.97  1.93  1.88  1.83  1.76  1.72  1.67  1.61  1.58  1.52
      1%  2.59  2.53  2.43  2.35  2.23  2.15  2.06  1.96  1.90  1.82
60    5%  1.95  1.92  1.86  1.81  1.75  1.70  1.65  1.59  1.56  1.50
      1%  2.56  2.50  2.40  2.32  2.20  2.12  2.03  1.93  1.87  1.79
65    5%  1.94  1.90  1.85  1.80  1.73  1.68  1.63  1.57  1.54  1.49
      1%  2.54  2.47  2.37  2.30  2.18  2.09  2.00  1.90  1.84  1.76
70    5%  1.93  1.89  1.84  1.79  1.72  1.67  1.62  1.56  1.53  1.47
      1%  2.51  2.45  2.35  2.28  2.15  2.07  1.98  1.88  1.82  1.74
80    5%  1.91  1.88  1.82  1.77  1.70  1.65  1.60  1.54  1.51  1.45
      1%  2.48  2.41  2.32  2.24  2.11  2.03  1.94  1.84  1.78  1.70
100   5%  1.88  1.85  1.79  1.75  1.68  1.63  1.57  1.51  1.48  1.42
      1%  2.43  2.36  2.26  2.19  2.06  1.98  1.89  1.79  1.73  1.64
125   5%  1.86  1.83  1.77  1.72  1.65  1.60  1.55  1.49  1.45  1.39
      1%  2.40  2.33  2.23  2.15  2.03  1.94  1.85  1.75  1.68  1.59
150   5%  1.85  1.82  1.76  1.71  1.64  1.59  1.54  1.47  1.44  1.37
      1%  2.37  2.30  2.20  2.12  2.00  1.91  1.83  1.72  1.66  1.56
200   5%  1.83  1.80  1.74  1.69  1.62  1.57  1.52  1.45  1.42  1.35
      1%  2.34  2.28  2.17  2.09  1.97  1.88  1.79  1.69  1.62  1.53
400   5%  1.81  1.78  1.72  1.67  1.60  1.54  1.49  1.42  1.38  1.32
      1%  2.29  2.23  2.12  2.04  1.92  1.84  1.74  1.64  1.57  1.47
1000  5%  1.80  1.76  1.70  1.65  1.58  1.53  1.47  1.41  1.36  1.30
      1%  2.26  2.20  2.09  2.01  1.89  1.81  1.71  1.61  1.54  1.44
∞     5%  1.79  1.75  1.69  1.64  1.57  1.52  1.46  1.40  1.35  1.28
      1%  2.24  2.18  2.07  1.99  1.87  1.79  1.69  1.59  1.52  1.41

Table-3: χ² (Chi-Squared) Distribution: Critical Values of χ²

Table-4: Critical value for Correlation coefficients (Simple or Partial)

      Probability %          Probability %          Probability %
DF    0.01   0.05      DF    0.01   0.05      DF    0.01   0.05
1     1.000  0.997     41    0.389  0.301     130   0.223  0.171
2     0.990  0.950     42    0.384  0.297     135   0.219  0.168
3     0.959  0.878     43    0.380  0.294     140   0.215  0.165
4     0.917  0.811     44    0.376  0.291     145   0.212  0.162
5     0.874  0.754     45    0.372  0.288     150   0.208  0.159
6     0.834  0.707     46    0.368  0.285     160   0.202  0.154
7     0.798  0.666     47    0.365  0.282     170   0.196  0.150
8     0.765  0.632     48    0.361  0.279     180   0.190  0.145
9     0.735  0.602     49    0.358  0.276     190   0.185  0.142
10    0.708  0.576     50    0.354  0.273     200   0.181  0.138
11    0.684  0.553     52    0.348  0.268     250   0.162  0.124
12    0.661  0.532     54    0.341  0.263     300   0.148  0.113
13    0.641  0.514     56    0.336  0.259     350   0.137  0.105
14    0.623  0.497     58    0.330  0.254     400   0.128  0.098
15    0.606  0.482     60    0.325  0.250     450   0.121  0.092
16    0.590  0.468     62    0.320  0.246     500   0.115  0.088
17    0.575  0.456     64    0.315  0.242     600   0.105  0.080
18    0.561  0.444     66    0.310  0.239     700   0.097  0.074
19    0.549  0.433     68    0.306  0.235     800   0.091  0.069
20    0.537  0.423     70    0.302  0.232     900   0.086  0.065
21    0.526  0.413     72    0.298  0.229     1000  0.081  0.062
22    0.515  0.404     74    0.294  0.226
23    0.505  0.396     76    0.290  0.223
24    0.496  0.388     78    0.286  0.220
25    0.487  0.381     80    0.283  0.217
26    0.478  0.374     82    0.280  0.215
27    0.470  0.367     84    0.276  0.212
28    0.463  0.361     86    0.273  0.210
29    0.456  0.355     88    0.270  0.207
30    0.449  0.349     90    0.267  0.205
31    0.442  0.344     92    0.264  0.203
32    0.436  0.339     94    0.262  0.201
33    0.430  0.334     96    0.259  0.199
34    0.424  0.329     98    0.256  0.197
35    0.418  0.325     100   0.254  0.195
36    0.413  0.320     105   0.248  0.190
37    0.408  0.316     110   0.242  0.186
38    0.403  0.312     115   0.237  0.182
39    0.398  0.308     120   0.232  0.178
40    0.393  0.304     125   0.228  0.174

Table-5: Percentage points of the normal distribution, Z

This table gives percentage points of the standard normal distribution. These are the values of z for which a given percentage, P, of the standard normal distribution lies outside the range from -z to +z.

P (%)    Z
90       0.1257
80       0.2533
70       0.3853
60       0.5244
50       0.6745
40       0.8416
30       1.0364
20       1.2816
15       1.4395
10       1.6449
5        1.9600
2        2.3263
1        2.5758
0.50     2.8070
0.25     3.0233
0.10     3.2905
0.01     3.8906

Table-6: Random numbers

Each digit in the following table is independent and has a probability of (1/10). The table was computed from a population in which the digits 0 to 9 were equally likely.
77 21 24 33 39 07 83 00 02 77 28 11 37 33 77 10 41 31 90 76 35 00 25 78 80 18 77 32 78 85 75 57 59 76 96 63 65 37 58 79 87 96 72 67 25 72 59 21 96 16 02 21 05 19 59 96 90 61 02 16 29 68 92 86 20 61 09 14 93 48 32 85 65 57 14 77 47 73 76 36 65 64 55 43 56 39 60 97 03 78 34 85 49 53 38 89 19 98 98 88 82 80 25 00 59 00 91 03 14 37 43 75 37 56 79 65 92 27 00 74 07 44 74 48 45 80 57 06 74 67 48 73 83 39 34 91 42 11 90 08 64 82 41 25 19 50 97 06 73 63 30 35 08 55 82 54 27 43 71 36
07 70 53 07 38 72 81 26 17 62 78 26 83 64 36 47 60 75 07 50 79 08 13 32 01 22 12 27 28 71 84 11 43 10 39 09 92 97 26 77 66 71 69 14 11 14 50 42 06 21 61 16 12 62 28 26 85 62 58 25 81 55 15 58 52 86 95 58 80 89 09 90 91 08 19 88 99 83 99 36 99 65 96 59 63 96 39 60 58 81 01 12 19 22 95 25 59 59 91 94 11 46 15 67 51 71 14 14 45 40 88 83 88 37 80 76 02 65 27 54 77 48 73 86 30 67 05 73 50 31 04 18 64 41 74 16 44 69 47 91 79 12 93 25 34 54 47 41 77 15 74 55 49 51 55 55 21 56 13 67 31 75 18 53 89 31 98 13 87 35 72 56 29 61 91 15
64 28 96 90 23 12 98 92 28 94 57 41 99 11 42 86 68 06 36 25 82 26 85 49 76 15 90 13 60 00 26 02 65 28 59 87 94 79 48 98 85 87 54 49 64 95 47 55 75 54 53 43 38 30 80 03 36 62 87 21 77 15 78 57 87 75 71 59 16 96 51 15 61 53 14 36 49 69 97 93 77 32 77 27 15 53 67 34 75 46 51 63 15 39 53 90 35 05 63 32 53 23 30 33 02 31 23 10 37 05 74 59
83 31 23 10 32 06 20 61 08 18 80 86 09 64 42 28 68 82 81 22 17 25 71 51 13 12 32 25 63 38 51 82 10 29 02 92 26 28 60 83 06 33 08 88 98 82 83 23 30 31 06 17 63 70 30 07 01 14 60 48 03 81 32 16 25 65 59 50 91 03 89 97 59 71 97 14 78 44 87 43 75 30 55 08 18 80 02 02 24 20 44 02 48 22 89 25 92 55 53 33 33 39 37 91 79 10 97 06 73 65 33 58
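
As a rough illustration of how a table like Table-6 can be produced and used, the Python sketch below draws two-digit groups from independent, equally likely digits and then reads a list of such groups to pick a sample of units. The function names random_digit_pairs and sample_units, the population size of 50 and the sample size of 4 are assumptions made only for this illustration. The rule of skipping 00, out-of-range values and repeats is a common convention and is not necessarily the procedure prescribed elsewhere in this manual.

```python
import random


def random_digit_pairs(n_pairs, seed=None):
    """Return n_pairs two-digit strings; each digit is drawn independently
    with probability 1/10 for every value from 0 to 9."""
    rng = random.Random(seed)
    return [f"{rng.randrange(10)}{rng.randrange(10)}" for _ in range(n_pairs)]


def sample_units(pairs, population_size, sample_size):
    """Read the pairs in order and keep distinct unit numbers in the range
    1..population_size, skipping 00, values that are too large and repeats
    (assumed convention, see the text above)."""
    chosen = []
    for pair in pairs:
        unit = int(pair)
        if 1 <= unit <= population_size and unit not in chosen:
            chosen.append(unit)
            if len(chosen) == sample_size:
                break
    return chosen


# The first ten pairs of Table-6, read from left to right.
first_pairs = ["77", "21", "24", "33", "39", "07", "83", "00", "02", "77"]

# Hypothetical population of 50 numbered units, sample of 4 units.
print(sample_units(first_pairs, population_size=50, sample_size=4))
# prints [21, 24, 33, 39]
```

Here 77 is skipped because it exceeds the assumed population size of 50, and reading stops as soon as four distinct units have been found.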
