Download practical manual on statistics - College of Agriculture, OUAT

Dr. A. K. Parida, Ph.D.
Associate Professor
Department of Agricultural Statistics
College of Agriculture (OUAT), Bhubaneswar-3
PREFACE
The subject of statistics is of great importance for teaching, research and extension in agriculture and allied sciences. Knowledge of and expertise in the subject are immensely helpful to teachers, scientists, students and research scholars in their areas of study and application. We collect data from different sources, by different methods and for different purposes. As these data are random in nature, they are subjected to various manipulations to draw valid conclusions for further efficient use and correct decisions. No doubt, we can handle the voluminous data so generated by using computers and software. But the fundamental concepts of statistics, and knowledge of and expertise in its procedures, principles and techniques, play a vital role in arriving at a valid and meaningful conclusion.
This practical manual has been conceived and prepared to acquaint students and teachers alike with the basic concepts of statistical principles and procedures of calculation, as per the syllabi of the 4th Dean's Committee of ICAR for undergraduate courses in agriculture and allied sciences. The manuscript has been prepared on the basis of my long years of teaching experience and at the persuasion of students and teachers of the university. The contents have been drawn from many textbooks, journals, manuals and the internet. I acknowledge the help of those sources. I welcome comments from the users of this manual on any addition, deletion or improvement for the future. I hope the practical manual will be very useful for students and research workers.
I also thank the authorities for providing funds from the XIth ICAR development grant for printing the manual.
Date: March 25, 2009
Amulya Kumar Parida
CONTENTS
I. STATISTICAL METHODS
1.1 Construction of Frequency Table
1.2 Graphical representation of frequency distribution
1.3 Measures of central tendency or central value - Arithmetic Mean, Geometric Mean, Harmonic Mean, Median, Mode, Quartile, Decile and Percentiles
1.4 Measures of dispersion of a frequency distribution - Mean deviation, Standard Deviation, Variance, and Coefficient of Variation (C.V.)
1.4 Moments and Measure of skewness and kurtosis
1.5 Testing of Hypothesis or Test of Significance or decision rule
1.6 Standard normal deviate (SND) or Z tests or Large Sample Tests - for single mean and difference of two means
1.7 Small Sample Tests - test of 2 variances, test for single mean, two independent means and two dependent means
1.8 Chi-square test (χ2) - Goodness-of-fit and independence or association of attributes
1.9 Correlation and regression - Pearson's correlation coefficient and its test, Spearman's Rank correlation coefficient; fitting of regression equations of two variables Y and X
II. DESIGN AND ANALYSIS OF EXPERIMENTS
2.1 Basic concepts on design of experiments - Analysis of variance: one-way and two-way classification
2.2 Analysis of data in completely randomized design (CRD): unequal replications, equal replications
2.3 Analysis of data in randomised complete block design (RCBD)
2.4 Analysis of data in Latin square design (LSD)
2.5 Missing plot technique in design of Experiments
2.6 Analysis of data in RCBD with one missing observation
2.7 Analysis of data in LSD with one missing observation
III. SAMPLING TECHNIQUES
3.1 Principal steps in a sample survey
3.2 Simple random sampling (SRS): Selection of sampling units from a Population
3.3 Parameter estimation in SRS: SRSWOR, SRSWR
3.4 Stratified sampling
3.5 Systematic sampling
APPENDIX
STATISTICAL TABLES (t, F, χ2, r, Z, random number)
Table-1(a): Critical values for t-distribution
Table-1(b): Critical values for t-distribution (One & Two-tailed)
Table-2: Critical values for F-distribution
Table-3: χ2 (Chi-Squared) Distribution: Critical Values of χ2
Table-4: Critical value for Correlation coefficients (Simple or Partial)
Table-5: Percentage points of the normal distribution, Z
Table-6: Random numbers
UG Practical Manual on Statistics
PRACTICAL MANUAL ON STATISTICS
Two major practical aspects of scientific investigations are the collection of data and the interpretation of the collected data. The data may be generated through a sample survey of a naturally existing population or a designed experiment on a hypothetical population. The collected data are condensed and useful information is extracted through techniques of statistical inference. This manual essentially deals with the statistical methods and techniques used for objectively tabulating data, computing results step by step and drawing valid inferences from them, which will be useful for undergraduate students.
General Objective: To impart knowledge to the students on the basic concepts and statistical techniques applied in agriculture and allied sciences.
Specific objectives:
By the end of the practical exercises, the students will be able to:
1. Become acquainted with the practical applications of statistical techniques in agriculture.
2. Become self-sufficient in applying statistical techniques and drawing valid conclusions from them.
I.
STATISTICAL METHODS
1.1. Construction of frequency table
A frequency table is a technique which meaningfully summarizes a set of observations in tabular form so as to bring out the essential information contained in it. A tabular arrangement of data by classes together with the corresponding class frequencies is called a frequency distribution or frequency table.
There are two types of frequency table:
i. Exclusive type
ii. Inclusive type
The exclusive type (the lower limit of a class is included and the upper limit is excluded) is used when the data are continuous, and the result is called a continuous distribution. The inclusive type (both lower and upper limits are included) is used when the data are discrete, and the result is called a discrete (or discontinuous) distribution.
Department of Agricultural Statistics, OUAT
Procedure:
The following steps are to be considered for constructing a
frequency table from a set of data.
Step-1. Determination of number of classes
Usually the number of classes should be between 5 and 15; otherwise the information contained in the data may be lost. One may use Sturges' rule to determine the number of classes, K:
$K = 1 + 3.322 \log_{10} N$
where N = number of observations.
Step-2. Determination of magnitude of class interval (CI)
From the given set of observations, locate the maximum (Max) and minimum (Min) values.
Then, Range = Max - Min
and the CI or class width (d) will be: $d = \dfrac{\text{Max} - \text{Min}}{K}$
If d has a decimal part, round it to an integral class width; in Problem-1, d = 4.16 is taken as 5 so that the six classes cover the whole range.
Step-3. Choice of class limits or class boundaries
First, check whether the variable is continuous or discrete: measurements such as height, weight and volume are continuous variables, whereas counts such as the number of trees or the number of students are discrete variables. Use the exclusive method of frequency distribution if the variable is continuous and the inclusive method if it is discrete.
Step-4. Formation of classes:
a. Exclusive method: Starting from the first class, the subsequent classes are made by adding d to both the lower and upper limits, e.g. if the first class is L to L+d, then the second class is L+d to L+2d, and so on.
Example: 10 to 15, 15 to 20, 20 to 25 etc.
b. Inclusive method: Starting from the first class, the subsequent classes are made by adding (d+1) instead of d to both the lower and upper limits, e.g. if the first class is L to L+d, then the second class is [L+(d+1)] to [L+(2d+1)], and so on.
Example: 10 to 15, 16 to 21, 22 to 27 etc.
Step-5. Determination of Class frequency
It is the number of observed values of the variable that fall in a class. The class frequencies are determined with the help of tally marks (|).
Step-6. Construction of frequency distribution table
The frequency table has the following headings.
Classes (1)    Tally marks (2)    Frequency (3)
The classes are formed starting with the minimum value of the set of observations, each class having the class width d. Then a tally mark is made against the appropriate class for each observation, taken in sequence from the first observation to the end of the data. When a 5th tally mark is required in a class, either a slash (/) or an overhead mark (¯) is drawn across the group of 4 tally marks. Finally, the tally marks of each class are counted and entered as the class frequency in the last column.
Problem-1. Construct the frequency distribution table with the following 30 observations.
10(Min), 15, 17, 20, 21, 16, 17, 18, 20, 31, 35(Max), 13, 12, 15, 14, 12, 15, 17, 14, 13, 15, 14, 13, 14, 20, 19, 18, 28, 24, 25.
Solution:
(i). No. of classes, $K = 1 + 3.322 \log_{10} N$, where N = 30
$K = 1 + 3.322 \times \log_{10} 30 = 1 + 3.322 \times 1.4771 = 1 + 4.90 = 5.90 \approx 6$
(ii). Class size, $d = \dfrac{\text{Max} - \text{Min}}{K} = \dfrac{35 - 10}{6} = \dfrac{25}{6} = 4.16 \approx 5$
a. Exclusive method:
Table-1. Construction of frequency distribution table with CI=5
Class    Tally marks      Frequency
10-15    IIII IIII        10
15-20    IIII IIII I      11
20-25    IIII             5
25-30    II               2
30-35    I                1
35-40    I                1
Total                     30
b. Inclusive method:
Table-2. Construction of frequency distribution table with CI=5
Class    Tally marks      Frequency
10-14    IIII IIII        10
15-19    IIII IIII I      11
20-24    IIII             5
25-29    II               2
30-34    I                1
35-39    I                1
Total                     30
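The steps of Problem-1 can also be checked programmatically. The following is a minimal Python sketch, not part of the original manual: the helper name `exclusive_table` is hypothetical, K is rounded to the nearest integer and d is rounded up, which is an assumption chosen to match the worked solution (5.90 taken as 6, 4.16 taken as 5).

```python
import math

# The 30 observations of Problem-1.
data = [10, 15, 17, 20, 21, 16, 17, 18, 20, 31, 35, 13, 12, 15, 14,
        12, 15, 17, 14, 13, 15, 14, 13, 14, 20, 19, 18, 28, 24, 25]

n = len(data)
# Sturges' rule, rounded to the nearest whole number of classes (assumption).
k = round(1 + 3.322 * math.log10(n))
# Class width, rounded up so that k classes cover the whole range (assumption).
d = math.ceil((max(data) - min(data)) / k)


def exclusive_table(values, start, width, classes):
    """Count values in exclusive classes [lower, upper) starting at `start`."""
    counts = [0] * classes
    for v in values:
        counts[(v - start) // width] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


table = exclusive_table(data, min(data), d, k)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

Because every observation here is a whole number, the same counts apply to the inclusive classes 10-14, 15-19 and so on of Table-2.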
1.2. Graphical representation of frequency distribution
Graphical representation of the observations facilitates a better and deeper understanding of how the observations are distributed. The frequency distribution can be represented in the form of a Histogram, Frequency polygon, Frequency curve and Ogive.
Procedure:
a. Histogram: A histogram is a set of vertical bars in a 2-dimensional graph whose areas are proportional to the frequencies of the classes. It is drawn by taking the classes on the X-axis and drawing bars of the corresponding class frequencies on the Y-axis.
b. Frequency polygon: It is made by joining, with straight lines, the mid-points of the tops of the bars of the histogram.
c. Frequency curve: A frequency curve is a graphical representation of frequencies against their variate values by a smooth freehand curve. It is made when the CI of each class is small enough to allow a smooth curve, and is drawn by smoothly joining, freehand, the points of the frequency polygon.
d. Ogive: It is a graph of the variate values against their corresponding cumulative frequencies. Its shape is like an elongated "S". An ogive is prepared by using cumulative frequencies of the 'more than' type, the 'less than' type, or both.
These graphs are easily made from a frequency table of exclusive type. If a frequency table is of inclusive type, it is first converted into exclusive type and then the graphs are drawn.
Cumulative frequency is the running sum of the class frequencies, accumulated downward through the classes of the frequency table (less than type) or upward (more than type).
Problem-2. Construct the Histogram, Frequency Polygon, Frequency curve and Ogive of the following frequency distribution on the length of 60 sorghum ear heads (cm).
Class (Length)   : 18-20  21-23  24-26  27-29  30-32  33-35  36-38
No. of ear heads :   4     10     14     16     10      4      2
Solution:
As the given frequency table is of inclusive type, it is first converted to exclusive-type classes for continuity of classes, and then both types of cumulative frequency are computed.
Table-3. Cumulative frequency table
Class    Exclusive class   Mid value   Frequency   Cumulative frequency
                                                   Less than   Greater than
18-20    17.5-20.5         19          4           4           60
21-23    20.5-23.5         22          10          14          56
24-26    23.5-26.5         25          14          28          46
27-29    26.5-29.5         28          16          44          32
30-32    29.5-32.5         31          10          54          16
33-35    32.5-35.5         34          4           58          6
36-38    35.5-38.5         37          2           60          2
Fig. 1. HISTOGRAM
Fig. 2. FREQUENCY POLYGON
Fig. 3. FREQUENCY CURVE
Fig. 4. OGIVE(1-less type, 2-more type)
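The columns of Table-3 can be reproduced with a few lines of Python. This is only an illustrative sketch: the variable names are hypothetical, and the conversion to exclusive classes assumes that each limit is moved by half the gap between neighbouring classes (0.5 here).

```python
from itertools import accumulate

# Inclusive classes and frequencies of Problem-2.
lower = [18, 21, 24, 27, 30, 33, 36]
upper = [20, 23, 26, 29, 32, 35, 38]
freq = [4, 10, 14, 16, 10, 4, 2]

# Inclusive -> exclusive: shift each limit by half the gap between classes.
gap = lower[1] - upper[0]                      # 1 for these classes
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in zip(lower, upper)]
mid_values = [(lo + hi) / 2 for lo, hi in boundaries]

# 'Less than' type: running total from the first class downward.
less_than = list(accumulate(freq))
# 'More than' type: running total from the last class upward.
more_than = list(accumulate(freq[::-1]))[::-1]

for b, m, f, lt, mt in zip(boundaries, mid_values, freq, less_than, more_than):
    print(f"{b[0]}-{b[1]}  mid={m}  f={f}  less-than={lt}  more-than={mt}")
```

The less-than values are the heights plotted against the upper class boundaries for the less-than ogive, and the more-than values are plotted against the lower boundaries for the more-than ogive.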
Exercise: Construct a frequency distribution table, histogram, frequency
polygon, frequency curve and ogive for the following data and interpret
the results.
25, 32, 45, 8, 24, 42, 22, 12, 9, 15, 26, 35, 23, 41, 47, 18, 44, 37, 27,
46, 38, 24, 43,46, 10, 21, 36, 45, 22, 18.
1.3. Measures of central tendency or central value
Central tendency or central value is the property of a distribution by which a single central value can be computed to represent all the values. It is commonly measured by the Arithmetic Mean (or Mean), Geometric Mean, Harmonic Mean, Median and Mode.
Procedure:
Mean or Arithmetic Mean (A.M.)
The arithmetic mean is the sum of the observations divided by the total number of observations.
i. For a series of data: If the series has n values of a variable X, i.e. $x_1, x_2, \ldots, x_n$, the Arithmetic Mean (A.M.) is given by:
$\text{A.M.} = \bar{X} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{\text{Sum of values}}{\text{No. of values}}$
ii. For ungrouped frequency distribution:
Suppose the values $x_1, x_2, \ldots, x_n$ occur with frequencies $f_1, f_2, \ldots, f_n$; then the A.M. is given by:
$\bar{X} = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i, \quad N = \sum_{i=1}^{n} f_i$
iii. For grouped frequency distribution:
If data are grouped according to different class intervals, the mid value of each class is taken as an approximation to the value of the variable representing that class. If $m_1, m_2, \ldots, m_n$ represent the mid values of n classes of the variable X and $f_1, f_2, \ldots, f_n$ represent the corresponding frequencies, the Arithmetic Mean of X is
$\bar{X} = \dfrac{\sum_{i=1}^{n} f_i m_i}{\sum_{i=1}^{n} f_i}$
a). Short-cut method (or change of origin):
If $d_i = x_i - A$, where A = any arbitrary value (called the origin), then
$\bar{X} = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$
b). Step-deviation method (or change of origin and scale):
If $u_i = \dfrac{x_i - A}{h}$, where A = any arbitrary value (called the origin) and h = magnitude of class interval (or scale), then
$\bar{X} = A + \dfrac{h}{N}\sum_{i=1}^{n} f_i u_i$
Geometric Mean (G.M.)
The geometric mean is the n-th root of the product of all n values.
i. For a series of data: If the values of the variable are $x_1, x_2, \ldots, x_n$, then the geometric mean of x is:
$G = (x_1 \cdot x_2 \cdots x_n)^{1/n}$
Alternatively, $\log_{10} G = \dfrac{1}{n}\sum_{i=1}^{n} \log_{10} x_i$ or $G = \text{Antilog}\left[\dfrac{1}{n}\sum_{i=1}^{n} \log_{10} x_i\right]$
ii. For ungrouped frequency distribution:
If the values $x_1, x_2, \ldots, x_n$ occur with frequencies $f_1, f_2, \ldots, f_n$ respectively, then
$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{\frac{1}{f_1 + f_2 + \cdots + f_n}}$ or $G = \text{Antilog}\left[\dfrac{1}{N}\sum_{i=1}^{n} f_i \log_{10} x_i\right]$
where $N = f_1 + f_2 + \cdots + f_n$
iii. For grouped frequency distribution:
$G = \left(m_1^{f_1} m_2^{f_2} \cdots m_n^{f_n}\right)^{\frac{1}{f_1 + f_2 + \cdots + f_n}}$ or $G = \text{Antilog}\left[\dfrac{1}{N}\sum_{i=1}^{n} f_i \log_{10} m_i\right]$
where $N = f_1 + f_2 + \cdots + f_n$ and $m_1, m_2, \ldots, m_n$ are the mid-values of the classes.
Harmonic mean (H.M.)
The Harmonic Mean is the reciprocal of the arithmetic mean of the reciprocals of the observations.
i. For a series of data: If $x_1, x_2, \ldots, x_n$ are values of a given variable, then the Harmonic Mean is:
$\text{H.M.} = \dfrac{1}{\dfrac{1}{n}\left(\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}\right)} = \dfrac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}$
ii. For ungrouped frequency distribution:
If $x_1, x_2, \ldots, x_n$ occur with the frequencies $f_1, f_2, \ldots, f_n$ respectively, then
$\text{H.M.} = \dfrac{\sum_{i=1}^{n} f_i}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \cdots + \dfrac{f_n}{x_n}} = \dfrac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} (f_i / x_i)}$
iii. For grouped frequency distribution:
$\text{H.M.} = \dfrac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} (f_i / m_i)}$, where $m_1, m_2, \ldots, m_n$ are the mid-values of the classes.
Problem-3. The frequency distribution of weight (g) of 180 sorghum ear-heads is given in the following table. Calculate the A.M., G.M. and H.M.
Table-4. Frequency distribution of sorghum ear heads
Weight of ear head in g (X)   No. of ear heads (f)
40-60                         6
60-80                         28
80-100                        35
100-120                       50
120-140                       30
140-160                       10
160-180                       12
180-200                       9
Total                         180
Solution:
Table-5. Computation of mean (A.M.) by direct method, short-cut method and step-deviation method (A = 110, h = 20)
Class (X)   Mid value (mi)   fi      fi mi    di = mi - A   fi di    ui = (mi - A)/h   fi ui
40-60       50               6       300      -60           -360     -3                -18
60-80       70               28      1960     -40           -1120    -2                -56
80-100      90               35      3150     -20           -700     -1                -35
100-120     110              50      5500     0             0        0                 0
120-140     130              30      3900     20            600      1                 30
140-160     150              10      1500     40            400      2                 20
160-180     170              12      2040     60            720      3                 36
180-200     190              9       1710     80            720      4                 36
Total       -                N=180   20060    -             260      -                 13
The mean weight of ear head is given by:
i. Direct method: $\bar{X} = \dfrac{\sum_{i=1}^{n} f_i m_i}{N} = \dfrac{20060}{180} = 111.44$ g
ii. Short-cut method: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{N} = 110 + \dfrac{260}{180} = 110 + 1.44 = 111.44$ g
iii. Step-deviation method: $\bar{X} = A + \dfrac{h}{N}\sum_{i=1}^{n} f_i u_i = 110 + \dfrac{20}{180} \times 13 = 110 + 1.44 = 111.44$ g
Table-6. Computation of Geometric mean (G.M.)
Class (x)   Mid value (mi)   Frequency (fi)   log10 mi   fi × log10 mi
40-60       50               6                1.69       10.14
60-80       70               28               1.84       51.52
80-100      90               35               1.95       68.25
100-120     110              50               2.04       102.00
120-140     130              30               2.11       63.30
140-160     150              10               2.17       21.70
160-180     170              12               2.23       26.76
180-200     190              9                2.27       20.43
Total       -                180              -          364.1
$\log G = \dfrac{\sum_{i=1}^{n} f_i \log m_i}{\sum_{i=1}^{n} f_i} = \dfrac{364.1}{180} = 2.02$; $G = \text{Antilog}(2.02) = 104.71$ g
Table-7. Computation of Harmonic Mean (H.M.)
Class (x)   Mid value (mi)   Frequency (fi)   fi/mi
40-60       50               6                0.12
60-80       70               28               0.4
80-100      90               35               0.38
100-120     110              50               0.45
120-140     130              30               0.23
140-160     150              10               0.06
160-180     170              12               0.07
180-200     190              9                0.04
Total       -                N=180            Σ(fi/mi) = 1.75
Harmonic mean (H.M.) $= \dfrac{\sum f_i}{\sum (f_i / m_i)} = \dfrac{180}{1.75} = 102.85$ g
Conclusion: From the above calculation, the Arithmetic Mean (A.M.), Geometric Mean (G.M.) and Harmonic Mean (H.M.) of the weight of sorghum ear-heads are 111.44 g, 104.71 g and 102.85 g respectively, and the relation obtained is A.M. > G.M. > H.M.
Note: In general, A.M. ≥ G.M. ≥ H.M. for positive values, with equality only when all the values are equal.
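The three means of Problem-3 can be cross-checked with a short Python sketch (illustrative only, with hypothetical variable names). It works with unrounded logarithms and reciprocals, so its G.M. and H.M. come out near 106.3 g and 101.2 g, slightly different from the hand-computed figures, which are based on two-decimal table entries. The ordering A.M. > G.M. > H.M. is unaffected.

```python
import math

mid = [50, 70, 90, 110, 130, 150, 170, 190]   # class mid-values m_i
freq = [6, 28, 35, 50, 30, 10, 12, 9]          # frequencies f_i
N = sum(freq)

# Arithmetic mean by the three methods (A = 110 and h = 20 as in Table-5).
A, h = 110, 20
am_direct = sum(f * m for f, m in zip(freq, mid)) / N
am_shortcut = A + sum(f * (m - A) for f, m in zip(freq, mid)) / N
am_step = A + h * sum(f * (m - A) / h for f, m in zip(freq, mid)) / N

# Geometric mean through base-10 logarithms (antilog = 10 ** value).
gm = 10 ** (sum(f * math.log10(m) for f, m in zip(freq, mid)) / N)

# Harmonic mean: total frequency over the sum of f_i / m_i.
hm = N / sum(f / m for f, m in zip(freq, mid))

print(f"A.M. = {am_direct:.2f} g, G.M. = {gm:.2f} g, H.M. = {hm:.2f} g")
```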
Median, Quartile, Decile and Percentiles
In a frequency distribution (arranged in increasing or decreasing order), the median is the value such that half of the observations lie above it and half below it. Similarly, Quartiles, Deciles and Percentiles are the values of the variate which divide the total frequency into 4, 10 and 100 equal parts respectively.
Procedure:
Prepare a cumulative frequency table and then calculate i.N/4, i.N/10 or i.N/100 to find the ith Quartile class, ith Decile class or ith Percentile class respectively. For Quartiles, i=1,2,3; for Deciles, i=1,2,...,9; and for Percentiles, i=1,2,...,99.
Formula: $C_T = L_0 + \dfrac{h}{f_i}\left(\dfrac{i \cdot N}{x} - c.f.\right)$
where, L0 = Lower limit of the ith Quartile class, ith Decile class or ith Percentile class, as the case may be
h = Width of the frequency distribution class
fi = Frequency of the ith Quartile or ith Decile or ith Percentile class
N = Total frequency = Σ fi
c.f. = Less than cumulative frequency of the class preceding the ith Quartile or ith Decile or ith Percentile class
x = 4 or 10 or 100 for Quartiles, Deciles and Percentiles, respectively.
How to find a quartile/decile/percentile class?
In a frequency table, to find the ith Quartile class, ith Decile class or ith Percentile class, compute i.N/4, i.N/10 or i.N/100 respectively. Then locate the first class in the table whose less than cumulative frequency is more than this value.
Problem-4. Find the Median (2nd Quartile), lower Quartile (1st Quartile), 7th Decile and 85th Percentile of the frequency distribution given below:
Marks in statistics : below 10  10-20  20-30  30-40  40-50  50-60  60-70  above 70
No. of students     : 8         12     20     32     30     28     12     4
Solution:
Table-8. Computation of Median, Quartile, Decile and Percentile
Marks in Statistics (X)   No. of Students (fi)   Less than Cumulative frequency (c.f.)
<10                       8                      8
10-20                     12                     20
20-30                     20                     40
30-40                     32                     72
40-50                     30                     102
50-60                     28                     130
60-70                     12                     142
>70                       4                      146 = N
(i) Median = 2nd quartile, denoted by Q2, i.e. i=2
So, for i=2, $\dfrac{i \cdot N}{4} = \dfrac{2 \times 146}{4} = \dfrac{146}{2} = 73$
Hence the Median class is 40-50, corresponding to c.f. = 102, which is > 73.
Median $= L_0 + \dfrac{h}{f_i}\left(\dfrac{N}{2} - c.f.\right) = 40 + \dfrac{10}{30}(73 - 72) = 40 + 0.33 = 40.33$
(ii) First Quartile = Q1. Here, i=1
So, for i=1, $\dfrac{i \cdot N}{4} = \dfrac{1 \times 146}{4} = 36.5$
Hence the Q1 class is 20-30, corresponding to c.f. = 40, which is > 36.5.
$Q_1 = L_0 + \dfrac{h}{f_i}\left(\dfrac{N}{4} - c.f.\right) = 20 + \dfrac{10}{20}(36.5 - 20) = 20 + 8.25 = 28.25$
(iii) Seventh Decile = D7. Here, i=7
So, for i=7, $\dfrac{i \cdot N}{10} = \dfrac{7 \times 146}{10} = 102.2$
And the 7th Decile class is 50-60, corresponding to c.f. = 130, which is > 102.2.
$D_7 = L_0 + \dfrac{h}{f_i}\left(\dfrac{7N}{10} - c.f.\right) = 50 + \dfrac{10}{28}(102.2 - 102.0) = 50 + 0.07 = 50.07$
(iv) 85th Percentile = P85. Here, i=85
So, for i=85, $\dfrac{i \cdot N}{100} = \dfrac{85 \times 146}{100} = 124.1$
And the 85th Percentile class is 50-60, corresponding to c.f. = 130, which is > 124.1.
$P_{85} = L_0 + \dfrac{h}{f_i}\left(\dfrac{85N}{100} - c.f.\right) = 50 + \dfrac{10}{28}(124.1 - 102) = 50 + 7.89 = 57.89$
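The single formula used for all four answers of Problem-4 can be sketched as one Python function. The function name `grouped_quantile` and its parameters are hypothetical, and the sketch assumes equal class widths; the open-ended classes "below 10" and "above 70" are given lower limits 0 and 70 purely so that the list lines up, since neither class contains any of the required quantiles.

```python
def grouped_quantile(lower_limits, freqs, width, i, parts):
    """i-th quantile when the total frequency is divided into `parts` parts.

    Implements L0 + (h / f) * (i*N/parts - c.f.), where c.f. is the
    less-than cumulative frequency of the class preceding the quantile class.
    """
    total = sum(freqs)
    target = i * total / parts
    cum = 0                          # c.f. of the classes already passed
    for low, f in zip(lower_limits, freqs):
        if cum + f > target:         # first class whose c.f. exceeds the target
            return low + width / f * (target - cum)
        cum += f
    raise ValueError("target lies beyond the last class")


lower = [0, 10, 20, 30, 40, 50, 60, 70]    # assumed limits for the open classes
freq = [8, 12, 20, 32, 30, 28, 12, 4]

median = grouped_quantile(lower, freq, 10, 2, 4)
q1 = grouped_quantile(lower, freq, 10, 1, 4)
d7 = grouped_quantile(lower, freq, 10, 7, 10)
p85 = grouped_quantile(lower, freq, 10, 85, 100)
print(round(median, 2), round(q1, 2), round(d7, 2), round(p85, 2))
```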
Mode of a frequency distribution
The Mode is the value of the variate which occurs most frequently in the data set. In a frequency table the modal class is the class which has the greatest frequency.
Procedure:
i. For a series or ungrouped data: The observation with the highest frequency, i.e. the value which occurs the maximum number of times, is the mode.
ii. For grouped data:
Formula: $\text{Mode}\ (M_O) = L_0 + \dfrac{f - f_p}{2f - f_p - f_s} \times h$
Where, L0 = Lower limit of the modal class
f = frequency of the modal class
fp = frequency of the class preceding the modal class
fs = frequency of the class succeeding the modal class
h = width of the frequency distribution class
Note: The class which has the highest frequency is the modal class.
Problem-5. Compute the Modal value of the wages of workers in a farm from the following frequency distribution.
Wages (Rs.)   No. of workers
30-35         12
35-40         18
40-45         22
45-50         27
50-55         17
55-60         23
60-65         29
65-70         8
Solution:
Modal class = class with the maximum frequency (=29), i.e. 60-65
Mode $= L_0 + \dfrac{f - f_p}{2f - f_p - f_s} \times h$
L0 = lower limit of modal class = 60
f = frequency of modal class = 29
fp = frequency of the class preceding the modal class = 23
fs = frequency of the class succeeding the modal class = 8
h = class size = 5
Mode $= 60 + \dfrac{(29 - 23) \times 5}{2 \times 29 - 23 - 8} = 60 + \dfrac{6}{27} \times 5 = 60 + 1.11 = 61.11$
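As a quick check of Problem-5, the grouped-mode formula can be written as a small Python function. This is a sketch with a hypothetical name (`grouped_mode`); treating a missing neighbouring class as having frequency 0 is an assumption made here for illustration and is not stated in the manual.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of a grouped distribution with equal class widths."""
    k = max(range(len(freqs)), key=freqs.__getitem__)    # index of modal class
    f = freqs[k]
    f_prev = freqs[k - 1] if k > 0 else 0                # assumed 0 if absent
    f_next = freqs[k + 1] if k + 1 < len(freqs) else 0   # assumed 0 if absent
    return lower_limits[k] + (f - f_prev) / (2 * f - f_prev - f_next) * width


lower = [30, 35, 40, 45, 50, 55, 60, 65]
workers = [12, 18, 22, 27, 17, 23, 29, 8]
mode = grouped_mode(lower, workers, 5)
print(round(mode, 2))
```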
1.4. Measures of dispersion of a frequency distribution
The literal meaning of dispersion is scatteredness. We study dispersion to get an idea of the homogeneity or heterogeneity of a distribution, i.e. the scatter of the observations about a central value. There are several measures of dispersion and each provides specific information about the scatter of values in a distribution. The mean taken together with a measure of dispersion gives more information about the data than the mean alone. The measures of dispersion are Range, Quartile Deviation, Mean Deviation, Standard Deviation, Variance and Coefficient of Variation.
Mean deviation from a particular value 'A' (Mean or Median or Mode) of a frequency distribution
Procedure:
Mean deviation is defined as the arithmetic mean of the absolute deviations of the variate values from a particular measure of location. It may be taken about the Mean, the Median or the Mode.
In a frequency distribution,
$\text{M.D.} = \dfrac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - A|$
where, $x_1, x_2, \ldots, x_n$ are the values, or the mid-values of the classes, with frequencies $f_1, f_2, \ldots, f_n$,
N = Total frequency = $\sum_{i=1}^{n} f_i$
A = either Mean or Median or Mode
Problem-6. Compute the Mean Deviation from the Mean from the following data.
Wages (Rs.)   Number of labourers
60-70         5
50-60         10
40-50         20
30-40         8
20-30         3
Solution:
Table-9. Computation of Mean Deviation from Mean
Wages (Rs.)   Mid value (xi)   Number of labourers (fi)   fi xi   |d| = |x - mean|   f |d|
60-70         65               5                          325     18.70              93.50
50-60         55               10                         550     8.70               87.00
40-50         45               20                         900     1.30               26.00
30-40         35               8                          280     11.30              90.40
20-30         25               3                          75      21.30              63.90
Total         -                46                         2130    -                  360.80
Mean $= \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{2130}{46} = 46.30$
Mean Deviation from mean $= \dfrac{\sum f\,|d|}{\sum f} = \dfrac{360.80}{46} = 7.843$
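The computation in Table-9 can be verified with the following Python sketch (variable names are illustrative). It keeps the mean unrounded, so the result is about 7.845 rather than 7.843; the small difference comes from rounding the mean to 46.30 in the hand calculation.

```python
mid = [65, 55, 45, 35, 25]      # class mid-values x_i
freq = [5, 10, 20, 8, 3]        # number of labourers f_i
N = sum(freq)

# Mean of the grouped data, then the mean of absolute deviations about it.
mean = sum(f * x for f, x in zip(freq, mid)) / N
mean_deviation = sum(f * abs(x - mean) for f, x in zip(freq, mid)) / N

print(f"Mean = {mean:.2f}, M.D. about mean = {mean_deviation:.3f}")
```

Replacing `mean` with the median or the mode in the last sum gives the mean deviation about those values instead.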
Standard Deviation, Variance and Coefficient of Variation (C.V.)
Procedure:
The arithmetic mean of the squares of the deviations of the variate values from their arithmetic mean is defined as the Variance. The positive square root of the Variance is called the Standard Deviation (S.D.). The Coefficient of Variation (C.V.) measures the variation of the observations relative to the magnitude of their arithmetic mean. It is defined as the ratio of the standard deviation to the arithmetic mean, expressed as a percentage.
There are two methods for calculation of Standard deviation:
i). Direct method
ii). Short-cut method (by change of origin and scale)
i. Direct method:
Step 1 : Calculate the mid value (xi) for grouped data
Step 2 : Calculate fi.xi for each class and finally Σ fi.xi
Step 3 : Calculate xi² and fi.xi² and finally Σ fi.xi²
Step 4 : Calculate S.D. (σ) by using the formula
$\text{S.D.} = \sigma = +\sqrt{\dfrac{\sum f_i x_i^2}{N} - \left(\dfrac{\sum f_i x_i}{N}\right)^2}$, where $N = \sum f_i$
and Variance, $\sigma^2 = \dfrac{\sum f_i x_i^2}{N} - \left(\dfrac{\sum f_i x_i}{N}\right)^2$
ii. Short-cut Method or Step deviation method:
Step 1 : Calculate the mid value (xi) for grouped data
Step 2 : Calculate the deviation value (di), where $d_i = \dfrac{x_i - A}{c}$, A = any arbitrary value or mean, c = class size
Step 3 : Calculate fi.di and fi.di² and finally Σ fi.di and Σ fi.di²
Step 4 : Calculate S.D. by using the formula
$\text{S.D.} = \sigma = c\sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2}$
So, Variance $= \sigma^2 = c^2\left[\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2\right]$
Coefficient of Variation, $\text{C.V.} = \dfrac{\text{S.D.}}{\text{Mean}} \times 100 = \dfrac{\sigma}{\bar{X}} \times 100$
Standard deviation is an absolute measure of dispersion whereas C.V. is a relative measure of dispersion, expressed as a percentage, for comparing two or more data sets.
Problem-7. Compute the Standard Deviation, Variance and C.V. from the following data.
Size of the holding (ha)   No. of farmers
2.5-3.5                    1000
3.5-4.5                    2300
4.5-5.5                    3600
5.5-6.5                    2400
6.5-7.5                    1700
7.5-8.5                    3000
8.5-9.5                    500
Solution:
Table-10. Calculation table for Standard Deviation
Size of holding (ha)   Mid value (xi)   fi      fi.xi   fi.xi²   di = (xi - A) for A=6   fi.di   fi.di²
2.5-3.5                3                1000    3000    9000     -3                      -3000   9000
3.5-4.5                4                2300    9200    36800    -2                      -4600   9200
4.5-5.5                5                3600    18000   90000    -1                      -3600   3600
5.5-6.5                6                2400    14400   86400    0                       0       0
6.5-7.5                7                1700    11900   83300    1                       1700    1700
7.5-8.5                8                3000    24000   192000   2                       6000    12000
8.5-9.5                9                500     4500    40500    3                       1500    4500
Total                  -                14500   85000   538000   -                       -2000   40000
a). Direct method:
$\text{S.D.} = \sqrt{\dfrac{\sum f_i x_i^2}{N} - \left(\dfrac{\sum f_i x_i}{N}\right)^2} = \sqrt{\dfrac{538000}{14500} - \left(\dfrac{85000}{14500}\right)^2} = \sqrt{37.103 - 34.364} = \sqrt{2.739} = 1.655$
b). Step Deviation Method:
i. $\text{S.D.} = c\sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2} = 1 \times \sqrt{\dfrac{40000}{14500} - \left(\dfrac{-2000}{14500}\right)^2} = \sqrt{2.758 - 0.019} = \sqrt{2.739} = 1.655$
ii. Variance $= (\text{S.D.})^2 = (1.655)^2 = 2.739$
iii. Coefficient of Variation, $\text{C.V.} = \dfrac{\text{S.D.}}{\text{Mean}} \times 100$
Here, Mean $= \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{85000}{14500} = 5.862$
$\therefore \text{C.V.} = \dfrac{\text{S.D.}}{\text{Mean}} \times 100 = \dfrac{1.655}{5.862} \times 100 = 28.23\%$
Moments, skewness and kurtosis
First four moments about mean of a frequency distribution
Procedure:
Generally there are two types of moments.
1). Moments about mean ($\mu_r$):
$\mu_r = \dfrac{\sum f_i (x_i - \bar{x})^r}{\sum f_i}$
2). Moments about origin ($\mu'_r$):
$\mu'_r = \dfrac{\sum f_i d_i^r}{\sum f_i}$, where $d_i = x_i - A$ and A = any arbitrary value
By step deviation method,
$\mu'_r = h^r\,\dfrac{\sum f_i d_i^r}{\sum f_i}$, where $d_i = \dfrac{x_i - A}{h}$
Moments about mean are:
$\mu_1 = 0$
$\mu_2 = \mu'_2 - (\mu'_1)^2$
$\mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^3$
$\mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^2 - 3(\mu'_1)^4$
Measure of Skewness and Kurtosis for a frequency distribution
Skewness is defined as lack of symmetry about the central value. Measures of skewness indicate the direction (skewed to the left or right) and the extent of skewness. There are two methods to find a measure of skewness from a given frequency table.
First method - Karl Pearson coefficient of Skewness
Step-1. Find out Mean, Mode and S.D.
Step-2. Calculate the measure of skewness by using the formula given by Karl Pearson,
$S_k = \dfrac{\text{Mean} - \text{Mode}}{\text{S.D.}}$
Second method - For wide class of frequency distribution
Step-1. Find the 2nd and 3rd moments about the mean
Step-2. Calculate the measure of skewness,
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$, $\gamma_1 = \pm\sqrt{\beta_1} = \dfrac{\mu_3}{\mu_2^{3/2}}$
Where, $\mu_2 = \dfrac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}$, $\mu_3 = \dfrac{\sum f_i (x_i - \bar{x})^3}{\sum f_i}$
If $\beta_1 = 0$ or $\gamma_1 = 0$, the distribution is symmetrical; otherwise it is skewed to the left or right according as the sign of $\mu_3$ is -ve or +ve.
Kurtosis is a measure of the peakedness or flatness of the curve of a distribution. Kurtosis is of three types - Platykurtic, Leptokurtic and Mesokurtic. Kurtosis can be computed by the following steps.
Step-1. Find out the 2nd and 4th moments about the mean of the distribution
Step-2. Calculate Kurtosis as,
$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ or $\gamma_2 = \beta_2 - 3$
Where, $\mu_4$ = 4th central moment (about the mean) and $\mu_2$ = 2nd central moment (about the mean)
If $\beta_2 = 3$ or $\gamma_2 = 0$, the distribution is normal, i.e. mesokurtic
$\beta_2 > 3$ or $\gamma_2 > 0$, the distribution is more peaked, i.e. leptokurtic
$\beta_2 < 3$ or $\gamma_2 < 0$, the distribution is more flattened, i.e. platykurtic
Problem-8. Calculate the four moments about mean and find out the measures of Skewness & Kurtosis from the following table.
Class Interval : 10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency      : 3      7      4      14     8      6      3
Solution:
Table-11. Calculation of moments (A = 45, h = 10)
Class interval   Frequency (fi)   Mid value (xi)   di = (xi - A)/h   fi di   fi di²   fi di³   fi di⁴
10-20            3                15               -3                -9      27       -81      243
20-30            7                25               -2                -14     28       -56      112
30-40            4                35               -1                -4      4        -4       4
40-50            14               45               0                 0       0        0        0
50-60            8                55               1                 8       8        8        8
60-70            6                65               2                 12      24       48       96
70-80            3                75               3                 9       27       81       243
Total            45               -                -                 2       118      -4       706
From the table, with assumed mean A = 45 and class width h = 10, the moments about A are:
$\mu'_1 = h \cdot \frac{\sum f_i d_i}{\sum f_i} = 10 \times \frac{2}{45} = 0.44$
$\mu'_2 = h^2 \cdot \frac{\sum f_i d_i^2}{\sum f_i} = 100 \times \frac{118}{45} = 262.22$
$\mu'_3 = h^3 \cdot \frac{\sum f_i d_i^3}{\sum f_i} = 1000 \times \frac{-4}{45} = -88.89$
$\mu'_4 = h^4 \cdot \frac{\sum f_i d_i^4}{\sum f_i} = 10000 \times \frac{706}{45} = 156888.89$
The moments about the mean are then:
$\mu_2 = \mu'_2 - (\mu'_1)^2 = 262.22 - (0.44)^2 = 262.02$
$\mu_3 = \mu'_3 - 3\mu'_1\mu'_2 + 2(\mu'_1)^3 = -88.89 - 3(0.44)(262.22) + 2(0.44)^3 = -435.02 + 0.17 = -434.85$
$\mu_4 = \mu'_4 - 4\mu'_1\mu'_3 + 6(\mu'_1)^2\mu'_2 - 3(\mu'_1)^4 = 156888.89 - 4(0.44)(-88.89) + 6(0.44)^2(262.22) - 3(0.44)^4 = 156888.89 + 156.45 + 304.59 - 0.11 = 157349.82$
So,
Skewness: $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-434.85)^2}{(262.02)^3} = \frac{189094.52}{17988846.9} = 0.0105$
Kurtosis: $\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{157349.82}{(262.02)^2} = \frac{157349.82}{68654.48} = 2.29$
By the moment method, the skewness and kurtosis of the given distribution are 0.0105 and 2.29 respectively.
So, it is concluded that the distribution of the data is not perfectly symmetrical: it is slightly skewed to the left, as $\beta_1 = 0.0105$ and the sign of $\mu_3$ is negative. The distribution is also not normal: it is less peaked (platykurtic), as $\beta_2 = 2.29$ is less than 3.
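The hand calculation in Problem-8 can be cross-checked with a short script. This is a minimal sketch, not part of the manual: the helper name `central_moment` is hypothetical, and it computes the central moments directly from the class mid-values instead of going through the step-deviation moments. Because it does no intermediate rounding, its figures may differ slightly from the hand-worked ones.

```python
# Grouped data of Problem-8: class mid-values and frequencies.
mid_values = [15, 25, 35, 45, 55, 65, 75]
frequencies = [3, 7, 4, 14, 8, 6, 3]


def central_moment(xs, fs, r):
    """r-th moment about the mean for grouped data (hypothetical helper)."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n


mu2 = central_moment(mid_values, frequencies, 2)
mu3 = central_moment(mid_values, frequencies, 3)
mu4 = central_moment(mid_values, frequencies, 4)

beta1 = mu3 ** 2 / mu2 ** 3   # measure of skewness
beta2 = mu4 / mu2 ** 2        # measure of kurtosis

print(f"mu2={mu2:.2f} mu3={mu3:.2f} mu4={mu4:.2f}")
print(f"beta1={beta1:.4f} beta2={beta2:.2f}")
```

The sign of `mu3` gives the direction of skewness, and `beta2` is compared with 3 to judge peakedness.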
Exercise: The following are the heights of 405 soybean plants collected from a particular plot.
| Plant height (cm) | No. of plants ($f_i$) |
|---|---|
| 8-12 | 6 |
| 13-17 | 17 |
| 18-22 | 25 |
| 23-27 | 86 |
| 28-32 | 125 |
| 33-37 | 77 |
| 38-42 | 55 |
| 43-47 | 9 |
| 48-52 | 4 |
| 53-57 | 1 |
Compute:
i). A.M., G.M., H.M., Median, Mode
ii). Mean Deviation from mean, S.D., Variance, C.V.
iii). Coefficient of Skewness and Kurtosis
iv). Interpret the above results for soybean.
1.5. Testing of Hypothesis or Test of Significance or decision rule
An estimate based on sample values generally does not equal the true value in the population, because of the inherent variation in the population. Different samples give different estimates of the true value. It therefore has to be verified whether the difference between the sample estimate and the population value is due to sampling fluctuation or is a real difference. If the difference is due to sampling fluctuation only, it can safely be said that the sample belongs to the population in question; if the difference is real, we have reason to believe that the sample may not belong to that population.
Steps involved in test of hypothesis:
1) The null and alternative hypotheses are formulated.
2) The test statistic is constructed.
3) The level of significance is fixed.
4) The table (critical) value is found from the tables for the given level of significance.
5) The null hypothesis is rejected at the given level of significance if the value of the test statistic is greater than or equal to the critical value. Otherwise the null hypothesis is accepted.
6) In the case of rejection, the variation in the estimates is called "significant". In the case of acceptance, the variation in the estimates is called "not significant".
1.6. Standard normal deviate (SND) or Z tests or Large Sample
Tests
If the sample size n ≥ 30, the sample is considered a large sample, and if n < 30 it is considered a small sample; accordingly there are large sample tests and small sample tests.
SND Test or One Sample (Z-test) for single mean
Case-I: Population standard deviation (σ) is known
Assumptions:
1. Population is normally distributed
2. The sample is drawn at random
Conditions:
1. Population standard deviation σ is known
2. Size of the sample is large (n ≥ 30)
Procedure: Let x1, x2, ..., xn be a random sample of size n from a normal population with mean μ and variance σ². Let $\bar{x}$ be the mean of this sample.
Null Hypothesis is H0: μ = μ0 (a specified value)
and the alternative is H1: μ ≠ μ0 (two-tail)
Under H0, the test statistic is
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$
i.e. the above statistic follows the Normal Distribution with mean 0 and variance 1.
If |Zcal| ≤ Ztab at 5% level of significance, H0 is accepted and we conclude that there is no significant difference between the population mean and the value μ0 specified in H0; otherwise H0 is rejected.
Problem-9. A sample of 900 leaves has a mean of 3.4 cm and S.D. of 2.61 cm. Is the sample drawn from a large population with mean 3.25 cm?
Solution:
Here, the Null Hypothesis is H0: μ = μ0
and the alternative is H1: μ ≠ μ0 (two-tail)
Given $\bar{x}$ = 3.4, μ0 = 3.25, σ = 2.61 and n = 900
Putting the values in the formula, we get
$Z = \frac{3.4 - 3.25}{2.61/\sqrt{900}} = \frac{0.15}{0.087} = 1.72$
The tabulated value of Z at 5% is 1.96.
So, Z calculated is less than Z tabulated. Hence, H0 is accepted, i.e. the sample may be regarded as drawn from a large population with mean 3.25 cm.
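The single-mean Z statistic is simple enough to check by machine. The sketch below is illustrative only: the function name `one_sample_z` is an assumption of this example, and the critical value 1.96 is the two-tail 5% table value used in the text.

```python
import math


def one_sample_z(sample_mean, mu0, sd, n):
    """Z statistic for a single mean: (x-bar - mu0) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))


# Figures of Problem-9: n = 900 leaves, mean 3.4 cm, S.D. 2.61 cm, mu0 = 3.25 cm.
z_cal = one_sample_z(3.4, 3.25, 2.61, 900)
z_tab = 1.96                      # two-tail critical value at the 5% level
reject_h0 = abs(z_cal) > z_tab    # False means H0 is accepted

print(round(z_cal, 2), reject_h0)
```

The same function can be reused for the steer exercise that follows by passing the herd S.D. and the sample size of 29.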
Exercise: A herd of 1500 steers was fed a special high-protein grain for a month. A random sample of 29 was weighed and had gained an average of 6.7 kg. If the standard deviation of weight gain for the entire herd is 7.1 kg, test the hypothesis that the average weight gain per steer for the month was more than 5 kg. (Hints: H0: μ = 5, H1: μ > 5, Zcal = 1.289)
Case-II: If σ is not known
Null hypothesis (H0): μ = μ0
Under H0, the test statistic is
$Z = \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}} \sim N(0,1)$
where $s = \sqrt{\frac{1}{n}\sum x^2 - \left(\frac{\sum x}{n}\right)^2}$ and the x's are the sample observations.
If Zcal ≤ Ztab at 5% level of significance, H0 is accepted and we conclude that there is no significant difference between the population mean and the one specified in H0; otherwise we do not accept H0.
The table below gives some critical values of $Z_\alpha$:
| Level of significance | 10% | 5% | 1% |
|---|---|---|---|
| Two-tail | 1.645 | 1.96 | 2.58 |
| One-tail | 1.28 | 1.645 | 2.33 |
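The critical values in the table above come from the standard normal distribution and can be regenerated instead of looked up. The following is a small sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed=True):
    """Critical Z value for significance level alpha (e.g. 0.05)."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.10, 0.05, 0.01):
    print(alpha,
          round(z_critical(alpha), 3),                     # two-tail
          round(z_critical(alpha, two_tailed=False), 3))   # one-tail
```

For a two-tail test the level of significance is split equally between the two tails, which is why the 10% two-tail value equals the 5% one-tail value.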
SND test for two sample means or Z-test of significance for
difference of two means
Case-I: when σ is known
Procedure:
Let $\bar{x}_1$ be the mean of a random sample of size n1 from a population with mean μ1 and variance σ1², and let $\bar{x}_2$ be the mean of a random sample of size n2 from another population with mean μ2 and variance σ2².
The hypothesis is H0: μ1 = μ2 and H1: μ1 ≠ μ2 (two-tail),
i.e. the null hypothesis states that the means of the two populations are identical. Under the null hypothesis the test statistic becomes
$Z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$
i.e. under H0 the standardized difference of the two sample means follows the Normal Distribution with mean 0 and variance 1.
If σ1² = σ2² = σ² (say), i.e. both populations have the same standard deviation (or variance), then the test statistic becomes
$Z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0,1)$
If Zcal ≤ Ztab at 5% level of significance, H0 is accepted, otherwise it is rejected.
If H0 is accepted, there is no significant difference between the two population means, i.e. the means may be regarded as identical.
Problem-10. The average panicle length of 60 paddy plants in field No.1 is 18.5 cm and that of 70 paddy plants in field No.2 is 20.3 cm, with a common S.D. of 1.15 cm. Test whether there is a significant difference between the two paddy fields w.r.t. mean panicle length.
Solution:
Hypothesis, H0: There is no significant difference between the means of the two paddy fields w.r.t. panicle length, i.e. μ1 = μ2
Under H0, the test statistic becomes
$Z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0,1)$
where $\bar{x}_1$ = 18.5, $\bar{x}_2$ = 20.3, n1 = 60, n2 = 70, σ = 1.15
Substituting the given values in the formula, we get
$Z = \frac{|18.5 - 20.3|}{1.15\sqrt{\frac{1}{60} + \frac{1}{70}}} = \frac{1.8}{0.2023} = 8.90$
Conclusion: At 5% level of significance 8.90 > 1.96 (table value), hence H0 is rejected, i.e. there is a significant difference between the mean panicle lengths of the two paddy populations.
Example: The amount of a certain trace element in blood is known to vary with a standard deviation of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors. Random samples of 75 male and 50 female donors yield concentration means of 28 and 33 ppm, respectively. Test whether the population mean concentrations of the element are the same for men and women, assuming unequal variances.
(Hints: H0: μ1 = μ2, H1: μ1 ≠ μ2, Zcal = -2.37)
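The two-mean Z statistic can be sketched in a few lines. This is an illustration under stated assumptions: the function name `two_sample_z` is hypothetical, it keeps the sign of the difference (the text uses the absolute value), and a common S.D. is handled simply by passing the same value twice.

```python
import math


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """Signed Z statistic for the difference of two means with known S.D.s."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Problem-10: common S.D. of 1.15 cm in both paddy fields.
z_paddy = two_sample_z(18.5, 20.3, 1.15, 1.15, 60, 70)

# Trace-element example: unequal S.D.s for male and female donors.
z_blood = two_sample_z(28, 33, 14.1, 9.5, 75, 50)

for z in (z_paddy, z_blood):
    print(round(z, 2), "significant" if abs(z) > 1.96 else "not significant")
```

With a common σ, $\sqrt{\sigma^2/n_1 + \sigma^2/n_2}$ reduces to $\sigma\sqrt{1/n_1 + 1/n_2}$, so one function covers both forms of the formula.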
Case-II: when S.D. of both populations not known
The above methods are followed only after estimating the S.D. of the two populations from the sample observations as:
$S_1 = \sqrt{\frac{1}{n_1}\sum x_1^2 - \left(\frac{\sum x_1}{n_1}\right)^2}$
$S_2 = \sqrt{\frac{1}{n_2}\sum x_2^2 - \left(\frac{\sum x_2}{n_2}\right)^2}$
where x1 and x2 are the independent sample observations, with sizes n1 and n2, from the two normal populations respectively.
The pooled variance (S2) or S.D.(S) is computed as:
S2=
Problem-11. A breeder wants to investigate whether the number of filled grains per panicle is the same in a new variety of paddy ACM.5 and an old variety ADT.36. To verify this, random samples of 50 plants of ACM.5 and 60 plants of ADT.36 were selected from the experimental fields. The following results were obtained:
| | ACM.5 | ADT.36 |
|---|---|---|
| Mean | 139.4 | 112.9 |
| S.D. | S1 = 26.864 | S2 = 20.1096 |
| Sample size | n1 = 50 | n2 = 60 |
Test whether the claim of the breeder is correct.
Solution:
The hypothesis is H0: μ1 = μ2 and H1: μ1 ≠ μ2 (two-tail)
Assuming that the two population variances are unequal, and using S1 and S2 as estimates of σ1 and σ2, put the given values in the formula
$Z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{|139.4 - 112.9|}{\sqrt{\frac{(26.864)^2}{50} + \frac{(20.1096)^2}{60}}} = \frac{26.5}{4.60} = 5.76$
As the calculated value of Z > table value of Z at 5% level of significance (= 1.96), H0 is rejected. We conclude that the number of filled grains per panicle is significantly different in the two varieties ACM.5 and ADT.36.
1.7. Small Sample Tests
Small sample tests are applicable when the sample size n < 30.
Test of hypothesis on equality of two variances (Snedecor's F-test or variance ratio test)
Let x1, x2, ..., xn1 be a sample of size n1 drawn from a normal population with variance $\sigma_x^2$, and y1, y2, ..., yn2 be another sample of size n2 drawn independently from a normal population with variance $\sigma_y^2$, for the same variable under study. We want to know whether the two samples are drawn from two different normal populations or belong to the same normal population w.r.t. the variance (scatter) of the observations.
Procedure:
Step-1. The assumptions in the F-test:
i. Parent population must be normal.
ii. Samples are independent.
Step-2. Take the null hypothesis $H_0: \sigma_x^2 = \sigma_y^2$ against the alternative hypothesis $H_1: \sigma_x^2 \neq \sigma_y^2$
Step-3. Choose the level of significance, i.e. 5% or 1%.
Step-4. Choose the location of the critical region, i.e. one tailed or two tailed test.
Step-5. Compute the observed value of F as:
$F = \frac{S_x^2}{S_y^2}$ with $(n_1 - 1)$ and $(n_2 - 1)$ d.f., if $S_x^2 > S_y^2$ (the greater value is taken in the numerator),
where $S_x^2 = \frac{\sum (x_i - \bar{x})^2}{n_1 - 1}$ and $S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n_2 - 1}$
Step-6. Compare the observed value with the tabular value.
Step-7. If Fcal > Ftab, the null hypothesis is rejected and the result is significant.
If Fcal ≤ Ftab, the null hypothesis is accepted and the result is not significant.
Problem-12. Two independent samples on dry weight(g) of plants were
observed from two populations as:
Sample–1 (x): 39, 41, 43, 41, 45, 39, 42, 44
Sample–2 (y): 40, 42, 40, 44, 39, 38, 40
Does the estimate of the population variances differ significantly?
Solution:
The Hypothesis is:
$H_0: \sigma_x^2 = \sigma_y^2$ (i.e. the populations have the same variance)
$H_1: \sigma_x^2 \neq \sigma_y^2$
Level of significance, α = 0.05
Test Statistic, $F = \frac{S_x^2}{S_y^2}$, where $S_x^2 = \frac{\sum (x_i - \bar{x})^2}{n_1 - 1}$ and $S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n_2 - 1}$
Table-12. Calculation of variances
| Obs. No. | x | y | $(x - \bar{x})$ | $(y - \bar{y})$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ |
|---|---|---|---|---|---|---|
| 1 | 39 | 40 | -2.75 | -0.42 | 7.5625 | 0.1764 |
| 2 | 41 | 42 | -0.75 | 1.58 | 0.5625 | 2.4964 |
| 3 | 43 | 40 | 1.25 | -0.42 | 1.5625 | 0.1764 |
| 4 | 41 | 44 | -0.75 | 3.58 | 0.5625 | 12.8164 |
| 5 | 45 | 39 | 3.25 | -1.42 | 10.5625 | 2.0164 |
| 6 | 39 | 38 | -2.75 | -2.42 | 7.5625 | 5.8564 |
| 7 | 42 | 40 | 0.25 | -0.42 | 0.0625 | 0.1764 |
| 8 | 44 | - | 2.25 | - | 5.0625 | - |
| Total | 334 | 283 | - | - | 33.5 | 23.7148 |
$\bar{x} = \frac{\sum x_i}{n_1} = \frac{334}{8} = 41.75$, $\bar{y} = \frac{\sum y_i}{n_2} = \frac{283}{7} = 40.42$
$S_x^2 = \frac{\sum (x_i - \bar{x})^2}{n_1 - 1} = \frac{33.5}{7} = 4.7857$, $S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n_2 - 1} = \frac{23.7148}{6} = 3.9525$
$F = \frac{S_x^2}{S_y^2} = \frac{4.7857}{3.9525} = 1.21$
As n1 = 8 and n2 = 7, for 7 and 6 degrees of freedom at α = 0.05 the critical value of F is 4.21. Since the calculated value F = 1.21 is less than the critical value (= 4.21), H0 is accepted, i.e. the estimates of the population variances do not differ significantly. It is concluded that the variances of the two populations may be regarded as the same.
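The variance-ratio calculation of Problem-12 can be sketched as follows. This is a minimal illustration: `statistics.variance` uses the divisor n - 1, matching the formula for $S^2$ in the text, while the critical value still has to be read from an F table, since the Python standard library has no F distribution.

```python
from statistics import variance

# Dry weights (g) of Problem-12.
sample_x = [39, 41, 43, 41, 45, 39, 42, 44]
sample_y = [40, 42, 40, 44, 39, 38, 40]

var_x = variance(sample_x)   # sum of squared deviations / (n1 - 1)
var_y = variance(sample_y)   # sum of squared deviations / (n2 - 1)

# The larger variance goes in the numerator, so F is never below 1.
if var_x >= var_y:
    f_cal, df = var_x / var_y, (len(sample_x) - 1, len(sample_y) - 1)
else:
    f_cal, df = var_y / var_x, (len(sample_y) - 1, len(sample_x) - 1)

print(round(var_x, 4), round(var_y, 4), round(f_cal, 2), df)
```

The pair `df` gives the numerator and denominator degrees of freedom with which to enter the F table.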
Test for single mean (Student’s t-test)
This test is used to test whether the sample mean ($\bar{x}$) differs significantly from the hypothetical value of the population mean μ0.
Procedure:
Step-1. Let x1, x2, ..., xn be a random sample of size n drawn from a population, with the following assumptions:
i. Parent population must be normal.
ii. The sample is random.
iii. The population standard deviation is unknown.
iv. The sample size must be < 30.
Step-2. Take Null hypothesis $H_0: \mu = \mu_0$
Alternate hypothesis $H_1: \mu \neq \mu_0$
Step-3. Choose the level of significance as 5% or 1%.
Step-4. Choose the location of the critical region, i.e. one tailed or two tailed.
Step-5. Compute the sample statistic (observed) of Student's t-test:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with (n-1) degrees of freedom
where
$\bar{x}$ = sample mean
$\mu_0$ = specified population mean
s = sample standard deviation, i.e. $s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$
Step-6. Compare the sample statistic with the tabulated value.
Step-7. Decision Rule
i. If t(cal) > t(tab), the result is significant and the null hypothesis is rejected.
ii. If t(cal) ≤ t(tab), the result is not significant and the null hypothesis is accepted.
Problem-13. Ten animals are fed with an animal feed. The gain in
wt.(kg) of animals are given below. Negative value indicates loss in
weight. Test whether there is significant gain in weight as a result of
consumption of that particular animal feed.
| Animal No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Gain in Wt. (x) | 25 | 10 | 11 | 13 | 12 | 8 | 5 | 13 | 7 | -4 |
Solution:
Null hypothesis $H_0: \mu = 0$ (i.e. there is no gain in weight)
$H_1: \mu > 0$ (i.e. there is gain in weight)
This is a case of one tailed test.
Table-13. Calculation for t-Statistic
| Animal No. | Gain in wt. (x) | $(x - \bar{x})$ | $(x - \bar{x})^2$ |
|---|---|---|---|
| 1 | 25 | 15 | 225 |
| 2 | 10 | 0 | 0 |
| 3 | 11 | 1 | 1 |
| 4 | 13 | 3 | 9 |
| 5 | 12 | 2 | 4 |
| 6 | 8 | -2 | 4 |
| 7 | 5 | -5 | 25 |
| 8 | 13 | 3 | 9 |
| 9 | 7 | -3 | 9 |
| 10 | -4 | -14 | 196 |
| Total | $\sum x = 100$ | - | $\sum (x - \bar{x})^2 = 482$ |
Mean $\bar{x} = \frac{\sum x}{n} = \frac{100}{10} = 10$
and $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $\bar{x} = 10$, $\mu_0 = 0$, n = 10 and
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{482}{9}} = 7.32$
$t = \frac{10 - 0}{7.32/\sqrt{10}} = \frac{10}{2.31} = 4.32$
Since the calculated t-value of 4.32 is more than the table value of t = 1.833 at 5% level of significance for 9 d.f. for a one tail test, the null hypothesis is rejected and the alternate hypothesis is accepted. So, we can conclude that there is a positive gain in weight due to consumption of the particular feed.
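The single-mean t statistic can be reproduced with the standard library. This is a small sketch; the function name `one_sample_t` is hypothetical, and `statistics.stdev` uses the divisor n - 1 as in the formula for s.

```python
import math
from statistics import mean, stdev

# Weight gains (kg) of the ten animals in Problem-13.
gains = [25, 10, 11, 13, 12, 8, 5, 13, 7, -4]


def one_sample_t(data, mu0):
    """Student's t for a single mean, with len(data) - 1 degrees of freedom."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(n))


t_cal = one_sample_t(gains, 0)
t_tab = 1.833                   # one-tail table value, 9 d.f., 5% level
print(round(t_cal, 2), t_cal > t_tab)
```

Because no intermediate value is rounded, the result may differ in the second decimal place from a hand calculation that rounds s first.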
Exercise: A random sample of height (ft.) of 10 trees from a forest was
observed. Test whether the mean height of trees of that forest is 100ft. or
not at 5% level. (Hints: Calculated t=-0.62)
Test for difference of two means for Independent samples (Fisher’s
t-test)
This test is used to test the difference between two population means on the basis of two independent sample means, i.e. to test whether two samples have been drawn from populations having the same mean.
Procedure:
Let x1, x2, ..., xn1 be a random sample of size n1 drawn from a population with mean $\mu_x$ and y1, y2, ..., yn2 be another independent random sample of size n2 from a population with mean $\mu_y$, with the following assumptions:
i. Parent population must be normal.
ii. Samples are random and independent of each other.
Case-I: Population variances of the two samples are the same and unknown.
Step-1. Take Null hypothesis $H_0: \mu_x = \mu_y$
Step-2. Alternative hypothesis $H_1: \mu_x \neq \mu_y$
Step-3. Choose the level of significance, either 5% or 1%.
Step-4. Choose the location of the critical region, i.e. one tailed test or two tailed test.
Step-5. Compute the sample t value (calculated) by the following formula of Fisher's t-test:
$t = \frac{\bar{x} - \bar{y}}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ with (n1+n2-2) d.f.
Here, $s = \sqrt{\frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}}$ is the estimated standard deviation of the population,
where
$\bar{x}$ = sample mean of 1st sample, n1 = no. of observations in 1st sample
$\bar{y}$ = sample mean of 2nd sample, n2 = no. of observations in 2nd sample
Step-6. Compare the calculated value with the table value.
Step-7. If t(cal) > t(tab), the null hypothesis is rejected and the result is significant.
If t(cal) ≤ t(tab), the null hypothesis is accepted and the result is not significant.
Problem-14. The interest is to study the effect of two treatments A & B on the yield of a crop, each of the treatments being repeated in 5 plots, with the yield/plot noted below.
Yield (in kg/plot)
| Treatment | Plot 1 | Plot 2 | Plot 3 | Plot 4 | Plot 5 | Mean |
|---|---|---|---|---|---|---|
| Treatment-A (x) | 9 | 10 | 13 | 11 | 7 | $\bar{x} = 10$ |
| Treatment-B (y) | 15 | 10 | 14 | 15 | 11 | $\bar{y} = 13$ |
Test whether the mean yields obtained as a result of these two treatments differ significantly.
Solution:
Step-1. Null hypothesis,
$H_0: \mu_A = \mu_B$ (i.e. no significant difference between the two means)
Alternate Hypothesis,
$H_1: \mu_A \neq \mu_B$ (i.e. the two means differ significantly)
Step-2. This is a case of two-tailed test.
Step-3. The level of significance chosen is 5%.
Step-4.
Table-14. Calculation for Fisher's t Statistic
| Sl. No. | x | y | $(x - \bar{x})$ | $(y - \bar{y})$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ |
|---|---|---|---|---|---|---|
| 1 | 9 | 15 | -1 | 2 | 1 | 4 |
| 2 | 10 | 10 | 0 | -3 | 0 | 9 |
| 3 | 13 | 14 | 3 | 1 | 9 | 1 |
| 4 | 11 | 15 | 1 | 2 | 1 | 4 |
| 5 | 7 | 11 | -3 | -2 | 9 | 4 |
| Total | 50 | 65 | - | - | 20 | 22 |
So, $\bar{x} = \frac{50}{5} = 10$, $\bar{y} = \frac{65}{5} = 13$, $n_1 = n_2 = 5$ and
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}} = \sqrt{\frac{20 + 22}{8}} = \sqrt{\frac{42}{8}} = \sqrt{5.25} = 2.29$
Test Statistic, $t = \frac{\bar{x} - \bar{y}}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{10 - 13}{2.29\sqrt{\frac{1}{5} + \frac{1}{5}}} = \frac{-3}{2.29 \times 0.632} = \frac{-3}{1.449} = -2.07$
Step-5. The two tailed table value for t at 5% significance level with 8 d.f. is 2.306. The calculated |t| = 2.07 is less than the table value and hence the null hypothesis is accepted. It is concluded that the two treatments do not produce any significant difference in the mean yield.
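The pooled-variance t statistic for two independent samples can be checked with a short script. This is only a sketch; the function name `pooled_t` is hypothetical, and the function returns the signed t together with its degrees of freedom.

```python
import math
from statistics import mean

# Yields (kg/plot) of Problem-14.
treatment_a = [9, 10, 13, 11, 7]
treatment_b = [15, 10, 14, 15, 11]


def pooled_t(x, y):
    """Fisher's t for two independent samples assuming equal variances."""
    n1, n2 = len(x), len(y)
    mean_x, mean_y = mean(x), mean(y)
    sum_sq = (sum((v - mean_x) ** 2 for v in x)
              + sum((v - mean_y) ** 2 for v in y))
    s = math.sqrt(sum_sq / (n1 + n2 - 2))          # pooled S.D.
    t = (mean_x - mean_y) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


t_cal, d_f = pooled_t(treatment_a, treatment_b)
t_tab = 2.306                    # two-tail table value, 8 d.f., 5% level
print(round(t_cal, 2), d_f, abs(t_cal) > t_tab)
```

The absolute value of t is compared with the table value because the test is two-tailed.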
Exercise: To assess the effect of inoculation with mycorrhiza on the height growth of seedlings of a crop, 10 seedlings inoculated with mycorrhiza (Group-1) and another 10 seedlings without inoculation (Group-2) were collected from an experiment. The heights of seedlings obtained under the two groups were:
| Plot | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Group I | 23 | 17.4 | 17 | 20.5 | 22.7 | 24 | 22.5 | 22.7 | 19.4 | 18.8 |
| Group II | 8.5 | 9.6 | 7.7 | 10.1 | 9.7 | 13.2 | 10.3 | 9.1 | 10.5 | 7.4 |
Under the assumption of equality of variance of seedling height in the two groups, test the equality of means. (tcal = 11.75)
Exercise: Using the data of the F-test example, test the equality of the two means.
Test for difference of two dependent sample means(paired t-test)
Procedure:
Let (x1, y1), (x2, y2), ..., (xn, yn) be n paired observations of a sample from a population, with the following basic assumptions:
i. Parent population must be normal.
ii. Samples are dependent and occur pair-wise.
Step-1. Take Null hypothesis: $H_0: \mu_x = \mu_y$ or $H_0: \mu_d = 0$, i.e. no difference
Step-2. Alternate hypothesis: $H_1: \mu_x \neq \mu_y$ or $H_1: \mu_d \neq 0$ (or $\mu_d > 0$ or $\mu_d < 0$)
Step-3. Choose the level of significance, either 5% or 1%.
Step-4. Choose the location of the critical region, i.e. one tailed test or two tailed test.
Step-5. Compute the observed t statistic by the following formula of the paired t-test:
$t = \frac{\bar{d}}{s/\sqrt{n}}$ with (n-1) d.f.
where $d_i = x_i - y_i$, $\bar{d} = \frac{\sum d_i}{n}$ (i.e. the mean of the d variable) and
$s = \sqrt{\frac{1}{n - 1}\sum (d_i - \bar{d})^2}$
Step-6. Compare the observed value with the tabular value.
Step-7. If t-calculated > t-tabulated, the null hypothesis is rejected and the result is significant; otherwise the null hypothesis is accepted.
Problem-15. Memory capacity of 9 students was tested before and after training. Test at 5 per cent level of significance whether the training was effective from the following scores.
| Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Before (x) | 10 | 15 | 9 | 3 | 7 | 12 | 16 | 17 | 4 |
| After (y) | 12 | 17 | 8 | 5 | 6 | 11 | 18 | 20 | 3 |
Solution:
Here, the scores obtained by the same batch of students in the two tests are available. Hence, the scores are expected to be correlated, and the paired t-test is appropriate. Taking the null hypothesis that the mean of the differences is zero, we can write
$H_0: \mu_x = \mu_y$, which is equivalent to testing $H_0: \mu_d = 0$
$H_1: \mu_x \neq \mu_y$
As we have matched pairs, we use the paired t-test, which is given by
$t = \frac{\bar{d}}{s/\sqrt{n}}$ with (n-1) d.f.
Table-15. Calculation for paired-t
| Student | Score before ($x_i$) | Score after ($y_i$) | Difference $d_i = (x_i - y_i)$ | $d_i^2$ |
|---|---|---|---|---|
| 1 | 10 | 12 | -2 | 4 |
| 2 | 15 | 17 | -2 | 4 |
| 3 | 9 | 8 | 1 | 1 |
| 4 | 3 | 5 | -2 | 4 |
| 5 | 7 | 6 | 1 | 1 |
| 6 | 12 | 11 | 1 | 1 |
| 7 | 16 | 18 | -2 | 4 |
| 8 | 17 | 20 | -3 | 9 |
| 9 | 4 | 3 | 1 | 1 |
| Total | - | - | -7 | 29 |
Here $\bar{d} = \frac{\sum d_i}{n} = \frac{-7}{9} = -0.778$
$s = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d_i^2 - n\bar{d}^2}{n - 1}} = \sqrt{\frac{29 - 9(-0.778)^2}{9 - 1}} = \sqrt{2.944} = 1.716$
$t = \frac{\bar{d}}{s/\sqrt{n}} = \frac{-0.778}{1.716/\sqrt{9}} = \frac{-0.778}{0.572} = -1.36$
The table value of t at 5% level for 8 d.f. is 2.306. The calculated value |t| = 1.36 is less than the table value. Hence, it is not significant and the null hypothesis is accepted. We can conclude that the training was not effective.
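The paired t statistic reduces to a single-mean t-test on the differences, which the sketch below makes explicit. It is an illustration only; the variable names are assumptions of this example, and the differences are taken as before minus after, as in the text.

```python
import math
from statistics import mean, stdev

# Memory scores of Problem-15.
before = [10, 15, 9, 3, 7, 12, 16, 17, 4]
after = [12, 17, 8, 5, 6, 11, 18, 20, 3]

# d_i = x_i - y_i for each student.
differences = [x - y for x, y in zip(before, after)]

n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)              # divisor n - 1
t_cal = d_bar / (s_d / math.sqrt(n))

t_tab = 2.306                         # two-tail table value, 8 d.f., 5% level
print(round(d_bar, 3), round(s_d, 3), round(t_cal, 2), abs(t_cal) > t_tab)
```

Swapping the order of subtraction only changes the sign of t, not the decision in a two-tailed test.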
Exercise: Data pertaining to organic carbon (OC) content measured at two different layers of 10 soil pits in a natural forest were collected to study whether the OC content is the same or different:
Organic carbon (%)
| Soil pit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Layer 1 (x) | 1.59 | 1.39 | 1.64 | 1.17 | 1.27 | 1.58 | 1.64 | 1.53 | 1.21 | 1.48 |
| Layer 2 (y) | 1.21 | 0.92 | 1.31 | 1.52 | 1.62 | 0.91 | 1.23 | 1.21 | 1.58 | 1.18 |
Analyse the data and draw your conclusion.
(Hints: $s_d^2$ = 0.1486, tcal = 1.485)
1.8. Chi-square test (χ2)
The Chi-square test of significance is for testing the agreement between observation and hypothesis (expectation) where the data are purely qualitative or enumerative in character. Such enumerative data are characterized by the frequency of occurrence or non-occurrence of events, attributes or categories, expressed as counts, proportions or percentages. The expected frequency in each category should preferably be more than 5 and the total number of observations should be large, say, more than 50.
χ2-test for Goodness-of-fit
This involves testing the significance of the difference between observed frequencies and the frequencies expected on some prior hypothesis or rule. If Oi is a set of observed frequencies and Ei is the corresponding set of expected frequencies (i = 1,2,…,n), Karl Pearson's Chi-square (χ2) is given by:
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
Procedure:
Step-1. Check the following assumptions:
i. Sample observations should be independent.
ii. Constraints on cell frequencies should be linear, i.e. $\sum O_i = \sum E_i$
iii. The total frequency should be reasonably large.
iv. No theoretical (expected) cell frequency should be less than 5.
Step-2. Take the null hypothesis, $H_0: O_i = E_i$
Alternative hypothesis, $H_1: O_i \neq E_i$
Step-3. Choose the level of significance, either α = 5% or 1%.
Step-4. Choose the location of the critical region, i.e. one tailed or two tailed.
Step-5. Compute the Chi-square value as per the formula.
Step-6. Compare the observed value with the tabular value and take the decision as:
If χ2cal > χ2tab, the null hypothesis is rejected and the result is significant at α.
If χ2cal ≤ χ2tab, the null hypothesis is accepted and the result is not significant at α.
Problem-16. In a cross between parents of the genetic constitution AAbb and aaBB, the phenotypes in the F2 sample are classified as follows.
| Phenotype | AB | Ab | aB | ab | Total |
|---|---|---|---|---|---|
| Frequency | 87 | 29 | 32 | 12 | 160 |
They are expected to occur in a 9:3:3:1 ratio.
Does the segregation agree with the theoretical ratio?
Solution:
H0: The segregation agrees with the theoretical ratio.
H1: The segregation does not agree with the theoretical ratio.
Level of Significance α = 0.05
Test Statistic is $\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}$ with 3 d.f.
The expected frequencies are computed on the basis of the theoretical segregation ratio 9:3:3:1. The total is 9+3+3+1 = 16. We expect 9 out of 16 to belong to the AB group, that is, the probability of AB is 9/16.
The expected frequency of AB is therefore $\frac{9}{16} \times 160 = 90$
The expected frequency of Ab is $\frac{3}{16} \times 160 = 30$
The expected frequency of aB is $\frac{3}{16} \times 160 = 30$
And the expected frequency of ab is $\frac{1}{16} \times 160 = 10$
Table-16. Calculation for Chi-square value
| Phenotype | Observed frequency ($O_i$) | Expected frequency ($E_i$) | $(O_i - E_i)$ | $(O_i - E_i)^2$ | $\frac{(O_i - E_i)^2}{E_i}$ |
|---|---|---|---|---|---|
| AB | 87 | 90 | -3 | 9 | 0.100 |
| Ab | 29 | 30 | -1 | 1 | 0.033 |
| aB | 32 | 30 | 2 | 4 | 0.133 |
| ab | 12 | 10 | 2 | 4 | 0.400 |
| χ2 value | | | | | 0.667 |
The calculated χ2 value is 0.667, which is less than the critical value of χ2 (with 3 d.f. at α = 0.05 it is 7.815). Therefore, the calculated χ2 value is not significant. Hence we accept the null hypothesis and conclude that the observed phenotypic ratio conforms to the theoretical segregation ratio of 9:3:3:1.
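The goodness-of-fit calculation can be sketched in a few lines. This is an illustration, not the manual's own procedure: the function name `goodness_of_fit` is hypothetical, and it derives the expected frequencies from a theoretical ratio before summing $(O_i - E_i)^2 / E_i$.

```python
def goodness_of_fit(observed, ratio):
    """Return (chi-square, expected frequencies) for a theoretical ratio."""
    total = sum(observed)
    expected = [total * part / sum(ratio) for part in ratio]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_sq, expected


# Problem-16: F2 phenotypes AB, Ab, aB, ab against a 9:3:3:1 ratio.
chi_sq, expected = goodness_of_fit([87, 29, 32, 12], [9, 3, 3, 1])
chi_tab = 7.815                       # table value, 3 d.f., 5% level
print(expected, round(chi_sq, 3), chi_sq > chi_tab)
```

For a hypothesis of equal frequencies in k classes, such as the monthly insect counts in the exercise that follows, the ratio is simply a list of k ones.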
Exercise: Data were collected on the number of insect species from an undisturbed area of a Wildlife Sanctuary in different months, to test whether there are any significant differences between the numbers of insect species found in different months. (Hints: we may state the null hypothesis as: the diversity in terms of number of insect species is the same in all months, and derive the expected frequencies in different months accordingly.) Test the data. (Ans. χ2 = 134.84)
| Month | Jan. | Feb. | Mar. | Apr. | May | Jun. | Jul. | Aug. | Sep. | Oct. | Nov. | Dec. | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $O_i$ | 67 | 115 | 118 | 72 | 67 | 77 | 75 | 63 | 42 | 24 | 32 | 52 | 804 |
χ2-test of independence or association of attributes
When individuals are classified simultaneously on the basis of two variables, attributes or categories, the resulting table of frequencies is called an (r x c) contingency table, i.e. r rows and c columns. The χ2 test may be applied to a contingency table to find out whether the variables are independent or associated.
Procedure:
The χ2 value for this test may be obtained in two ways:
i. By estimating the values of Ei (expected frequency) from the values of Oi (observed frequency) and applying χ2 as in goodness-of-fit.
ii. For a 2x2 contingency table, by a direct formula.
2 x 2 Contingency table
| Group | Category I | Category II | Total |
|---|---|---|---|
| 1 | a | b | a+b |
| 2 | c | d | c+d |
| Total | a+c | b+d | N = a+b+c+d |
The simple formula to calculate χ2 is
$\chi^2 = \frac{(ad - bc)^2 N}{(a+b)(c+d)(a+c)(b+d)}$ with 1 d.f.
where a, b, c, d are the observed cell frequencies. If any of the expected cell frequencies is less than 5, a slightly modified formula is necessary. The corrected formula for a 2x2 contingency table, called Yates' correction for continuity, is:
$\chi^2 = \frac{\left(|ad - bc| - \frac{N}{2}\right)^2 N}{(a+b)(c+d)(a+c)(b+d)}$
Problem-17. In a survey of fertilizer practices in India each of 323 cotton
growing fields selected for survey was classified in the twin criteria of
irrigation practice (irrigated or non-irrigated) and the practice of manuring
(manured or un-manured) resulting in the following contingency table.
| | Irrigated | Non-irrigated | Total |
|---|---|---|---|
| Manured | 75 (a) | 35 (b) | 110 |
| Un-manured | 115 (c) | 98 (d) | 213 |
| Total | 190 | 133 | 323 |
It is required to test whether the practice of irrigation and the
practice of manuring are independent or related (associated).
Solution:
Ho: these two-factors irrigation and manuring are independent.
H1: these two-factors irrigation and manuring are dependent or
associated.
First Method: Goodness-of-fit
The expected frequencies of each cell are calculated as:
Cell (a): $\frac{(a+b)(a+c)}{N} = \frac{110 \times 190}{323} = 64.71$
Cell (b): $\frac{(a+b)(b+d)}{N} = \frac{110 \times 133}{323} = 45.29$
Cell (c): $\frac{(c+d)(a+c)}{N} = \frac{213 \times 190}{323} = 125.29$
Cell (d): $\frac{(c+d)(b+d)}{N} = \frac{213 \times 133}{323} = 87.71$
The χ2 is calculated using the formula
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, which follows the χ2 distribution with $(2-1) \times (2-1) = 1$ d.f.
Table-17. Calculation of chi-square value (observed frequency O with expected frequency E in each cell)
| | Irrigated | Non-irrigated | Total |
|---|---|---|---|
| Manured | 75 (O1), 64.7 (E1) | 35 (O2), 45.3 (E2) | 110 |
| Un-manured | 115 (O3), 125.3 (E3) | 98 (O4), 87.7 (E4) | 213 |
| Total | 190 | 133 | 323 |
The χ2 value computed for the above table is
$\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = \frac{(75 - 64.7)^2}{64.7} + \frac{(35 - 45.3)^2}{45.3} + \frac{(115 - 125.3)^2}{125.3} + \frac{(98 - 87.7)^2}{87.7} = 6.03 \approx 6.00$
Second Method: Independence of attributes
$\chi^2 = \frac{(ad - bc)^2 N}{(a+b)(c+d)(a+c)(b+d)} = \frac{(75 \times 98 - 35 \times 115)^2 \times 323}{(75+35)(115+98)(75+115)(35+98)} = \frac{(3325)^2 \times 323}{592076100} = 6.03 \approx 6.0$
The χ2 value computed by both methods is about 6.0. Since the table has two rows and two columns, the d.f. for the above contingency table is (2-1)(2-1) = 1. The table value of χ2 with 1 d.f. at 5% level of significance is 3.84. Here the calculated χ2 value is higher than the table value, so the null hypothesis of independence of the two factors irrigation and manuring is rejected, and it is concluded that they are mutually related or associated.
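Both routes to χ2 for a 2x2 table can be written side by side to confirm that they agree. The sketch below is illustrative: the function names `chi_sq_shortcut` and `chi_sq_from_expected` are hypothetical, and the `yates` flag applies the continuity correction described earlier (floored at zero, an added safeguard that is not in the text).

```python
def chi_sq_shortcut(a, b, c, d, yates=False):
    """Direct 2x2 formula: (ad - bc)^2 N / product of the marginal totals."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)   # Yates' correction for continuity
    return diff ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_sq_from_expected(a, b, c, d):
    """Goodness-of-fit route: expected = row total x column total / N."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows, cols = [a + b, c + d], [a + c, b + d]
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total


# Problem-17: manuring (rows) by irrigation (columns).
chi_direct = chi_sq_shortcut(75, 35, 115, 98)
chi_fit = chi_sq_from_expected(75, 35, 115, 98)
print(round(chi_direct, 2), round(chi_fit, 2), chi_direct > 3.84)
```

The two functions are algebraically equivalent for a 2x2 table, so any difference between them is only floating-point noise.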
Exercise: The following table shows the result of inoculation against cholera in a group of people. Examine the effect of inoculation in controlling susceptibility to cholera. (Hints: apply Yates' correction)
| | Not attacked | Attacked |
|---|---|---|
| Inoculated | 43 | 5 |
| Not-inoculated | 7 | 28 |
1.9. Correlation and regression
In many natural systems, changes in one attribute are accompanied
by changes in another attribute and that a definite relation exists between
the two. In other words, there is a correlation between the two variables.
For instance, several soil properties like nitrogen content, organic carbon
content or pH are correlated and exhibit simultaneous variation. Strong
correlation is found to occur between several morphometric features of a
tree. In such instances, an investigator may be interested in measuring
the strength of the relationship. Having made a set of paired observations
(xi,yi); i = 1, ..., n, from n independent sampling units, a measure of the
linear relationship between two variables can be obtained by a quantity
called Pearson’s product moment correlation coefficient or simply
correlation coefficient.
Correlation is the study of co-variation between two variables, to understand how closely the variables are related. In correlation analysis, both variables are assumed to be continuous and normally distributed. To measure the magnitude and direction of the relationship between two variables we use the statistical tool known as the correlation coefficient, whose range is -1 to +1. The + or - sign indicates the direction of the relationship and the value gives its magnitude or strength.
Regression is the functional relationship between two or more
variables and thereby provides a mechanism for prediction or forecasting.
When the relationship between two variables is a straight line it is called
simple linear regression.
Karl Pearson’s correlation coefficient and its test of significance
Procedure: Let (Xi, Yi); i = 1,2,3, ..., n, be paired observations on 2 quantitative variables from n independent sampling units.
a). Direct Method:
Step-1. Construct a table for finding the X², Y² and XY values
Step-2. Calculate $\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$, $\sum Y^2$
Step-3. Calculate Karl Pearson's correlation coefficient by
$r_{xy} = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2} \cdot \sqrt{n\sum Y^2 - (\sum Y)^2}}$
b). Step deviation method (change of origin and scale):
Step-1. Calculate U & V,
where $U = \frac{X - A}{h}$ and $V = \frac{Y - B}{k}$;
A, B are arbitrary values from X & Y and h, k are suitably chosen scales.
Step-2. Construct a table for finding U, V, UV, U², V²
Step-3. Calculate $\sum U$, $\sum V$, $\sum UV$, $\sum U^2$ & $\sum V^2$
Step-4. Calculate the correlation coefficient by
$r_{uv} = \frac{n\sum UV - \sum U \cdot \sum V}{\sqrt{n\sum U^2 - (\sum U)^2} \cdot \sqrt{n\sum V^2 - (\sum V)^2}}$
OR
$r_{uv} = \frac{\sum UV - n\bar{U}\bar{V}}{\sqrt{\left(\sum U^2 - n\bar{U}^2\right)\left(\sum V^2 - n\bar{V}^2\right)}}$
where $\bar{U} = \sum U / n$, $\bar{V} = \sum V / n$
Both methods give the same value, i.e. rxy = ruv
Test of correlation coefficient:
Null hypothesis, H0: ρ = 0 and Alternative, H1: ρ ≠ 0
Here ρ is the correlation in the population and r is the estimate of ρ from the sample observations.
Level of Significance, α = 0.05
And Test statistic, $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$ ~ Student's t distribution with (n-2) d.f.
The tcal is compared with ttab. If tcal ≤ ttab, H0 is accepted, i.e. the result is not significant and the two variables have no linear relationship (there may be some other, e.g. nonlinear, relationship). If tcal > ttab, H1 is accepted, i.e. the result is significant and we say the two variables are linearly related with the magnitude and direction of r.
Problem-18. The following data give the heights of fathers and their sons in 10 families. Compute the correlation coefficient of heights, test its significance and give your conclusion.
| Height of father (cm) | 63 | 69 | 65 | 67 | 68 | 69 | 69 | 70 | 71 | 71 |
|---|---|---|---|---|---|---|---|---|---|---|
| Height of son (cm) | 65 | 63 | 63 | 65 | 67 | 67 | 68 | 71 | 61 | 69 |
Solution:
Table-18. Calculation of correlation coefficient (A = 68, B = 65)
| Ht. of father (X) | Ht. of son (Y) | X² | Y² | XY | U = X-A | V = Y-B | UV | U² | V² |
|---|---|---|---|---|---|---|---|---|---|
| 63 | 65 | 3969 | 4225 | 4095 | -5 | 0 | 0 | 25 | 0 |
| 69 | 63 | 4761 | 3969 | 4347 | 1 | -2 | -2 | 1 | 4 |
| 65 | 63 | 4225 | 3969 | 4095 | -3 | -2 | 6 | 9 | 4 |
| 67 | 65 | 4489 | 4225 | 4355 | -1 | 0 | 0 | 1 | 0 |
| 68 | 67 | 4624 | 4489 | 4556 | 0 | 2 | 0 | 0 | 4 |
| 69 | 67 | 4761 | 4489 | 4623 | 1 | 2 | 2 | 1 | 4 |
| 69 | 68 | 4761 | 4624 | 4692 | 1 | 3 | 3 | 1 | 9 |
| 70 | 71 | 4900 | 5041 | 4970 | 2 | 6 | 12 | 4 | 36 |
| 71 | 61 | 5041 | 3721 | 4331 | 3 | -4 | -12 | 9 | 16 |
| 71 | 69 | 5041 | 4761 | 4899 | 3 | 4 | 12 | 9 | 16 |
| Total = 682 | 659 | 46572 | 43513 | 44963 | 2 | 9 | 21 | 60 | 93 |
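a). Direct Method:
$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
and putting values
$$r_{xy} = \frac{10 \times 44963 - 682 \times 659}{\sqrt{(10 \times 46572 - 465124)(10 \times 43513 - 434281)}} = \frac{192}{\sqrt{596 \times 849}} = \frac{192}{711.33} = 0.27$$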
b). Step Deviation method:
$$\bar{U} = \frac{\sum U}{n} = 0.2; \quad \bar{V} = \frac{\sum V}{n} = 0.9; \quad \bar{U}^2 = 0.04; \quad \bar{V}^2 = 0.81$$
$$r_{uv} = \frac{\sum UV - n\bar{U}\bar{V}}{\sqrt{\left(\sum U^2 - n\bar{U}^2\right)\left(\sum V^2 - n\bar{V}^2\right)}} = \frac{21 - 10(0.18)}{\sqrt{[60 - 10(0.04)][93 - 10(0.81)]}} = \frac{19.2}{(7.72)(9.21)} = \frac{19.2}{71.10} = 0.27$$
∴ The correlation coefficient between the heights of father and son is
0.27 by both methods.
Test of significance of r:
Putting the value of r in the formula of the t statistic,
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.27\sqrt{10-2}}{\sqrt{1-(0.27)^2}} = 0.79$$
The ttab = 2.31 with 8 d.f. at 5% level of significance.
So, tcal < ttab and H0 is not rejected, i.e. the correlation is not
significant. It is concluded that the data give no evidence of a linear
relationship between the heights of fathers and their sons, i.e. an
increase or decrease in the height of the father does not indicate an
increase or decrease in the height of the son.
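The arithmetic of Problem-18 can be checked with a short script. The sketch below is only an illustration: the function names `pearson_r` and `t_statistic` are hypothetical helpers, and the tabulated t value still has to be read from a t-table.

```python
import math

# Heights from Problem-18 (father = X, son = Y)
X = [63, 69, 65, 67, 68, 69, 69, 70, 71, 71]
Y = [65, 63, 63, 65, 67, 67, 68, 71, 61, 69]


def pearson_r(x, y):
    """Direct-method correlation coefficient computed from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with (n - 2) degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r_direct = pearson_r(X, Y)

# Step-deviation method: change of origin with A = 68, B = 65 (h = k = 1)
U = [x - 68 for x in X]
V = [y - 65 for y in Y]
r_step = pearson_r(U, V)

t_cal = t_statistic(r_direct, len(X))
print(round(r_direct, 2), round(r_step, 2), round(t_cal, 2))
```

Both methods return the same coefficient because shifting the origin does not change the numerator or the denominator of r.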
Exercise: The data on pH and organic carbon content were measured
from soil samples collected from 15 pits taken in natural forests as given:
Soil Pit   pH (x)   Organic carbon (y) (%)
   1        5.7        2.10
   2        6.1        2.17
   3        5.2        1.97
   4        5.7        1.39
   5        5.6        2.26
   6        5.1        1.29
   7        5.8        1.17
   8        5.5        1.14
   9        5.4        2.09
  10        5.9        1.01
  11        5.3        0.89
  12        5.4        1.60
  13        5.1        0.90
  14        5.1        1.01
  15        5.2        1.21
Compute a suitable statistic and test to study whether an increase in
pH of soil affects the organic carbon in that forest. (Hints: r=0.3541 and
tcal=1.3652)
Exercise: The following data contain 15 paired values of photosynthetic
rate (Y) and light interception (X) observed on leaves of a particular tree
species. The photosynthetic rate is the dependent variable and the quantity
of light is the independent variable. Study the linear relationship between
the two variables and test its significance.
Tree     X        Y
  1    0.7619    7.58
  2    0.7684    9.46
  3    0.7961   10.76
  4    0.838    11.51
  5    0.8381   11.68
  6    0.8435   12.68
  7    0.8599   12.76
  8    0.9209   13.73
  9    0.9993   13.89
 10    1.0041   13.97
 11    1.0089   14.05
 12    1.0137   14.13
 13    1.0184   14.2
 14    1.0232   14.28
 15    1.028    14.36
Spearman's Rank correlation coefficient
A rank correlation is any of several statistics that measure the
relationship between rankings of different ordinal variables or different
rankings of the same variable, where a "ranking" is the assignment of the
labels "first", "second", "third", etc. to different observations of a
particular variable. It is appropriate for both continuous and discrete
variables, including ordinal variables. A rank correlation coefficient
measures the degree of similarity between two rankings, and can be used to
assess the significance of the relation between them: the test of
significance shows whether the measured relationship is small enough to
likely be a coincidence. The most commonly used measure is Spearman's rank
correlation coefficient or Spearman's rho, denoted by the Greek letter ρ
(rho). It is a nonparametric measure of statistical dependence between two
variables. It assesses how well the relationship between two variables can
be described by a monotonic function and lies in the interval [-1, +1]. An
increasing rank correlation coefficient implies increasing agreement
between rankings. The coefficient value can be interpreted as:
i. 1 if the agreement between the two rankings is perfect; the two
rankings are the same.
ii. 0 if the rankings are completely independent.
iii. −1 if the disagreement between the two rankings is perfect; one
ranking is the reverse of the other.
For a sample of size n, the n raw scores or values Xi,Yi are converted
to ranks xi,yi and ρ is computed. Identical values (rank ties or value
duplicates) are assigned a rank equal to the average of their positions in
the ascending order of the values.
The Spearman’s correlation coefficient is:
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$
Where, di = xi – yi (i = 1, 2, 3, …, n)
Procedure:
For sample observations the Spearman rank correlation coefficient is
computed by the formula above and, when ties occur, by
$$r_s = 1 - \frac{6\left[\sum d_i^2 + \dfrac{m(m^2-1)}{12}\right]}{n(n^2-1)}$$
Here, di = xi - yi, xi = rank of 1st variable, yi = rank of 2nd variable,
m = number of tied observations in a group; the correction term is added
once for every group of ties.
The following steps are applicable for finding the rank correlation:
Step-1. Rank all observations
I. Ranking should be made from highest to lowest of the observations.
II. If any two or more of the observations are same in magnitude then
all of them must carry the same rank (average of ranks).
Step-2. When a common rank is assigned to m tied observations of a
variable, then $\frac{m(m^2-1)}{12}$ is added to $\sum d_i^2$ in the
numerator of the 2nd term of the formula for the correlation coefficient.
Step-3. The sum of the differences of the ranks, Σdi, should be equal to
zero, which is a check on the correctness of the calculation.
Problem-19. Find the rank correlation for the following data and determine
the relationship between preference share price and debenture price.
Preference share price (x)  73.2 85.8 78.9 75.8 77.2 81.2 83.8
Debenture price (y)         97.8 99.2 98.8 98.3 98.3 96.7 97.1
Solution:
Table-18. Calculation of rank correlation coefficient
Preference        Rank x   Debenture    Rank y   di=xi-yi    di²
share price (x)    (xi)    price (y)     (yi)
    73.2            7        97.8         5         2        4
    85.8            1        99.2         1         0        0
    78.9            4        98.8         2         2        4
    75.8            6        98.3         3.5       2.5      6.25
    77.2            5        98.3         3.5       1.5      2.25
    81.2            3        96.7         7        -4       16
    83.8            2        97.1         6        -4       16
                                               Σdi² = 48.50
Here, y has 2 identical values (m=2) and n=7.
Therefore, rank correlation (rs)
$$r_s = 1 - \frac{6\left[\sum d_i^2 + \dfrac{m(m^2-1)}{12}\right]}{n(n^2-1)} = 1 - \frac{6\left[48.5 + \dfrac{2(2^2-1)}{12}\right]}{7(7^2-1)} = 1 - \frac{6(48.5 + 0.5)}{336} = 0.125$$
It is concluded that the two prices are poorly related, i.e. an increase
in one price is not accompanied by a corresponding increase in the other.
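The ranking and tie-correction steps of Problem-19 can be sketched in code as follows. This is an illustrative sketch only: the helper names are hypothetical, the debenture prices are taken as listed in Table-18, and the correction m(m² − 1)/12 is added once for each group of tied values, as in Step-2 of the procedure.

```python
def average_ranks(values):
    """Rank from highest (1) to lowest; tied values share the mean rank."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1      # first position occupied by v
        count = order.count(v)          # number of observations equal to v
        ranks.append(first + (count - 1) / 2)
    return ranks


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    total = 0.0
    for v in set(values):
        m = values.count(v)
        total += m * (m * m - 1) / 12   # zero when m == 1
    return total


def spearman_rs(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


preference = [73.2, 85.8, 78.9, 75.8, 77.2, 81.2, 83.8]
debenture = [97.8, 99.2, 98.8, 98.3, 98.3, 96.7, 97.1]

rank_y = average_ranks(debenture)
rs = spearman_rs(preference, debenture)
print(rank_y, rs)
```

The two debenture prices of 98.3 occupy positions 3 and 4 and therefore both receive rank 3.5.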
Exercise: In a survey, observations on 10 persons were taken on IQ (X) and
number of hours spent watching TV per week (Y) as below. Compute the rank
correlation and study whether an increase in the IQ of a person is
associated with the hours spent watching TV per week.
Person   IQ (X)   No. of hours spent in TV per week (Y)
   1      106        7
   2       86        0
   3      100       27
   4      101       50
   5       99       28
   6      103       29
   7       97       20
   8      113       12
   9      112        6
  10      110       17
(Hints: Ans. rs = −0.1757)
Fitting of regression equations of two variables Y and X
In regression analysis, both variables are assumed to be normally
distributed; one of the variables represents the cause (independent or
explanatory variable) and the other the effect (dependent or response
variable). The relationship between two variables can be expressed as a
function known as regression. When only two variables are involved in
regression, the functional relationship is known as simple regression. If
the relationship between the two variables is linear, it is known as simple
linear regression.
For simple linear regression, two regression equations are given by:
$$Y \text{ on } X: \quad Y - \bar{Y} = b_{yx}(X - \bar{X})$$
$$X \text{ on } Y: \quad X - \bar{X} = b_{xy}(Y - \bar{Y})$$
Where, byx = regression coefficient of Y on X
bxy = regression coefficient of X on Y
$\bar{Y} = \sum Y / n$, $\bar{X} = \sum X / n$, n = No. of observations
Procedure:
Fitting of regression equations is carried out in two phases.
a). Calculation of regression coefficients (bYX and bXY)
i). Direct method:
Step-1. Construct a table to find out X², Y², XY
Step-2. Compute ΣX, ΣY, ΣX², ΣY², ΣXY, Ȳ and X̄ from the table.
Step-3. Calculate the regression coefficients by the formula:
$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}, \qquad b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}$$
ii). Step deviation method:
Step-1. Reduce the values of X & Y to U & V, where A & B are arbitrary
values and h & k are suitable scales:
$$U = \frac{X - A}{h}, \qquad V = \frac{Y - B}{k}$$
Step-2. Construct the table to compute U², V², UV
Step-3. Compute ΣU, ΣV, ΣUV, ΣU², ΣV² from the table
Step-4. Compute regression coefficients by the formula:
$$\text{Regression coefficient of } U \text{ on } V, \quad b_{UV} = \frac{n\sum UV - \sum U \sum V}{n\sum V^2 - (\sum V)^2}$$
$$\text{Regression coefficient of } V \text{ on } U, \quad b_{VU} = \frac{n\sum UV - \sum U \sum V}{n\sum U^2 - (\sum U)^2}$$
Where n = no. of pairs of observations
Step-5. Compute $b_{XY} = \frac{h}{k}\, b_{UV}$ and $b_{YX} = \frac{k}{h}\, b_{VU}$
b). Finding the regression equations
After estimating the values of X̄, Ȳ, bYX and bXY and putting these
values in the following equations the regression equations can be
obtained.
$$Y - \bar{Y} = b_{yx}(X - \bar{X}) \quad \text{and} \quad X - \bar{X} = b_{xy}(Y - \bar{Y})$$
Problem-20. The following data give the monthly income and expenditure
on food of 10 families.
Income (x)       120   90   80  150  130  140  110   95   70  105
Expenditure (y)   40   36   40   45   40   44   45   38   50   35
Find the two linear regression equations and correlation coefficient.
Solution:
Let $U = \frac{X - A}{h}$, $V = \frac{Y - B}{k}$
Here, A = 110, h = 5; B = 40, k = 1
Table-19. Calculation of sums & sum of squares
Income   Expenditure   U=(X-A)/h   V=(Y-B)/k    UV     U²     V²
  (X)        (Y)
  120        40            2           0          0      4      0
   90        36           -4          -4         16     16     16
   80        40           -6           0          0     36      0
  150        45            8           5         40     64     25
  130        40            4           0          0     16      0
  140        44            6           4         24     36     16
  110        45            0           5          0      0     25
   95        38           -3          -2          6      9      4
   70        50           -8          10        -80     64    100
  105        35           -1          -5          5      1     25
Total=1090  413           -2          13         11    246    211
Here n = 10
Regression coefficient of U on V,
$$b_{UV} = \frac{n\sum UV - \sum U \sum V}{n\sum V^2 - (\sum V)^2} = \frac{10 \times 11 - (-2 \times 13)}{10 \times 211 - (13)^2} = \frac{110 + 26}{2110 - 169} = \frac{136}{1941} = 0.07$$
Regression coefficient of V on U,
$$b_{VU} = \frac{n\sum UV - \sum U \sum V}{n\sum U^2 - (\sum U)^2} = \frac{136}{10 \times 246 - (-2)^2} = \frac{136}{2456} = 0.055$$
So, $b_{xy} = \frac{h}{k}(b_{UV}) = \frac{5}{1}(0.07) = 0.35$ and
$b_{yx} = \frac{k}{h}(b_{VU}) = \frac{1}{5}(0.055) = 0.011$
∴ Regression coefficients of y on x and x on y are 0.011 and 0.35 respectively.
Therefore, the two regression equations and correlation coefficient are:
i. Y on X : Y- 41.3 = 0.011(X-109)
ii. X on Y : X – 109 = 0.35(Y – 41.3)
iii. Correlation of X & Y = √(0.011x0.35) = 0.062
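As a cross-check of Problem-20, the regression coefficients can also be computed by the direct method from the raw income and expenditure values. The sketch below is illustrative; `regression_coefficients` and `predict_y` are hypothetical helper names, not part of the manual.

```python
import math

income = [120, 90, 80, 150, 130, 140, 110, 95, 70, 105]      # X
expenditure = [40, 36, 40, 45, 40, 44, 45, 38, 50, 35]       # Y


def regression_coefficients(x, y):
    """Return (b_yx, b_xy) using the direct-method formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    b_yx = num / (n * sxx - sx ** 2)    # regression coefficient of Y on X
    b_xy = num / (n * syy - sy ** 2)    # regression coefficient of X on Y
    return b_yx, b_xy


b_yx, b_xy = regression_coefficients(income, expenditure)
x_bar = sum(income) / len(income)
y_bar = sum(expenditure) / len(expenditure)

# r is the geometric mean of the two coefficients, carrying their common sign
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)


def predict_y(x):
    """Estimate Y from the regression line of Y on X."""
    return y_bar + b_yx * (x - x_bar)


print(round(b_yx, 3), round(b_xy, 2), round(r, 3))
```

The direct method agrees with the step deviation method because the factors h and k cancel when bUV and bVU are converted back to bXY and bYX.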
Exercise: From the Exercise in correlation data on photosynthetic rate(Y)
and light interception(X), find the regression equation of Y on X and
estimate Y when X= 0.95.
II. DESIGN AND ANALYSIS OF EXPERIMENTS
2.1. Basic concepts on design of experiments
Planning an experiment to obtain appropriate data with respect to
any problem under investigation is known as ‘design of experiment’. It is
a complete sequence of steps taken well in time to ensure that
appropriate data will be obtained in a way which permits an objective
analysis of the data leading to valid inferences with respect to the stated
problems. “Design of experiment” comprises the planning of experiments,
the analysis of the data/observations and the interpretation of the
results. The technique for making inferences is known as the “analysis of
variance”. There are three basic principles of the design of experiments:
(i) Replication, (ii) Randomization and (iii) Local control.
(i).Replication: The application of each treatment to more than one
experimental unit is known as replication. Replication is necessary in
order to get an estimate of the experimental error, i.e. the variation
caused by uncontrolled factors. It also increases the precision with which
treatment effects are estimated, because the standard error of a treatment
mean decreases as the number of replications increases.
(ii).Randomization: Assigning treatments or factors to be tested to
experimental units according to a definite law of probability is known as
randomization. Under the principle of randomization, every experimental
unit has the same chance of receiving any one of the treatments under
study. For an objective comparison it is necessary that treatments are
allotted randomly to the various experimental units. Statistical procedures
employed in making inferences about treatments hold good only when the
treatments are allotted randomly to the experimental units.
(iii).Local control: Though every experiment should provide an estimate of
error variation, it is not desirable to have a large experimental error. The
reduction of experimental error can be achieved by making use of the fact
that adjacent areas in the field are relatively more homogeneous than those
widely separated. The aim of local control is to reduce the error by
suitably modifying the allocation of treatments to the experimental units
on the basis of previous knowledge.
Analysis of variance (ANOVA)
Analysis of variance is basically a technique of partitioning the
overall variation in the responses observed in an investigation into
different assignable sources of variation, some of which are specifiable
and others unknown. Further, it helps in testing whether the variation due
to any particular component is significant as compared to the residual
variation that can occur among the observational units.
Some important definitions for experimental designs
Treatment: In experimentation, the various objects of comparison are known
as treatments. In practice, treatments may refer to a physical substance
(fertilizers/varieties of crops/animal breeds/feeds etc.) or a
procedure/condition (methods of cultivation/sowing/housing conditions,
etc.) which are applied to experimental units for getting a response.
Experimental Unit: The basic objects on which the experiment is done are
known as experimental unit.
Model: In statistics, a model is generally expressed in terms of symbols,
usually as a set of equations consisting of factors and treatments with a
random effect.
Fixed effect model: A model in which the factors are fixed effects and the
error effect is random is called a fixed effect model. A fixed effect model
with two factors is written as:
$$y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk}, \qquad e_{ijk} \text{ is i.i.d.} \sim N(0, \sigma_e^2)$$
Random effect model: A model in which the factors are random effects and
the error effect is random is called a random effect model.
Mixed effect model: A model in which some factors are fixed and some
random, with a random error effect, is called a mixed effect model.
Hypothesis: Any assumption or statement about the population
characteristic is called hypothesis. It may be parametric or nonparametric.
Null hypothesis: It is the hypothesis which is tested for possible rejection
under the assumption that it is true.
Degrees of Freedom: The degrees of freedom correspond to the number
of independent deviations or contrasts that are available from the data
and can be calculated by deducting the number of constants calculated from
the data from the number of values available.
Level of significance: This is the probability of rejecting the null
hypothesis when it is true (the size of the rejection region under H0). It
is generally denoted by the symbol α and is usually taken as 0.05 (or 5%)
or 0.01 (or 1%).
Basic assumptions for analysis of variance:
(i) All the effects of different sources of variation (e.g. treatment,
environment etc.) are additive.
(ii) Experimental errors are independent.
(iii) Experimental errors have common variance.
(iv) Experimental errors are normally distributed, i.e. i.i.d. ~ N(0, σe²)
Analysis of variance of one-way classified data
Let there be n observations yij, which are grouped into t
classes/treatments such that in the i-th group there are ni observations,
i.e. i = 1, 2, 3, …, t; j = 1, 2, 3, …, ni and $\sum_i n_i = n$,
and yij is the response of the j-th unit under the i-th treatment.
Layout:
Treatments    1       2      ..    i      ..    t
              y11     y21    ..    yi1    ..    yt1
              y12     y22    ..    yi2    ..    yt2
              ..      ..           ..           ..
              y1j     y2j    ..    yij    ..    ytj
              ..      ..           ..           ..
              y1n1    y2n2   ..    yini   ..    ytnt
Total         T1      T2     ..    Ti     ..    Tt      Grand total = G
Mean          T1/n1   T2/n2  ..    Ti/ni  ..    Tt/nt   Grand mean = G/n
Model:
$$y_{ij} = \mu + t_i + e_{ij}$$
where, μ is a constant representing the general conditions to which all
the observations are subjected; ti is the unknown effect of the i-th class
to be estimated and the eij are independent random variables with zero mean
and constant variance, $\sigma_e^2$.
Hypothesis: Under certain additional assumptions, analysis of variance
leads to testing the following hypotheses,
H0: t1 = t2 = … = tt and H1: ti ≠ tj for at least one pair i and j
Analysis:
Step-1. Compute Correction Factor, $CF = G^2 / n$
Step-2. Compute Total Sum of Square, $TSS = \sum_{i,j} y_{ij}^2 - CF$
Step-3. Compute Treatment Sum of Square, $TrSS = \sum_i \frac{T_i^2}{n_i} - CF$
Step-4. Compute Error Sum of Square, ESS = TSS - TrSS
Step-5. Prepare ANOVA Table
Sources of variation   d.f.   SS     MSS                  Fcal      F(tab)
Treatments             t-1    TrSS   TMS = TrSS/(t-1)     TMS/EMS
Error                  n-t    ESS    EMS = ESS/(n-t)
Total                  n-1    TSS
Step-6. Compare F values as:
If Fcal ≤ Ftab at α level then H0 is accepted i.e. all treatment effects are
same or not significant.
If Fcal > Ftab at α level then H1 is accepted i.e. at least two treatment
effects are different or significant.
Step-7. If the test in ANOVA is not significant, all the treatments are
equal in their effect and the analysis stops there. But, if the test is
significant, at least two treatments differ in their effect; then proceed
to compare the differences of treatment effects by the Critical Difference
(CD) or Least Significant Difference (LSD) test.
CD Test:
i). Estimate SE of i-th treatment mean,
$$SE(m) = \sqrt{EMS / n_i}$$
ii). Estimate SE of the difference between i-th and j-th treatment mean,
$$SE(d) = \sqrt{EMS\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
If ni = nj = r, then
$$SE(d) = \sqrt{\frac{2 \times EMS}{r}}$$
iii). Compute CD = SE(d) x t, t = tabulated t with error d.f. at α level
iv). Compare the difference of any two treatment means (DTM) with the
CD value to find the significant difference between treatments. If any DTM
is less than or equal to CD, then the two treatments are not significantly
different; otherwise they are significantly different. All such treatment
pairs are compared likewise.
Step-8. In order to find out the reliability of the experiment, the
coefficient of variation (CV) is computed as:
$$CV = \frac{\sqrt{EMS}}{\text{Overall mean}} \times 100$$
If the CV is 20% or less, it is an indication of better precision of the
experiment and when the CV is more than 20% the experiment may be
repeated and efforts made to reduce the experimental error.
Analysis of variance of two-way classified data
Two-way ANOVA is carried out when the data are classified according to
two factors. For example, treatment as first factor and blocking as second
factor in agricultural experiments; feed and housing condition in poultry;
learning process and education standard in social science; tree species
and agro-climatic condition, etc. Let yij be the responses due to
i = 1, 2, 3, …, t treatments and j = 1, 2, 3, …, r blocks in a trial, then
Layout: Let there be t treatments with r blocks or replications for studying
the response of a characteristic, y
             Replication
Treatment    r1     r2     ..    rr     Total    Mean
t1           Y11    Y12    ..    Y1r    T1       T1/r
t2           Y21    Y22    ..    Y2r    T2       T2/r
..           ..     ..     ..    ..     ..       ..
tt           Yt1    Yt2    ..    Ytr    Tt       Tt/r
Total        R1     R2     ..    Rr     G        M=G/rt
Model: The model for two way classified data with one observation per
cell:
$$y_{ij} = \mu + t_i + b_j + e_{ij}$$
Hypothesis: Under certain additional assumptions, analysis of variance
leads to testing the following hypotheses,
H0: t1 = t2 = … = tt and H1: ti ≠ tj for at least one pair i and j
Analysis:
Step-1. Compute Correction Factor, $CF = G^2 / rt$
Step-2. Compute Total Sum of Square, $TSS = \sum_{i,j} y_{ij}^2 - CF$
Step-3. Compute Treatment Sum of Square, $TrSS = \sum_i \frac{T_i^2}{r} - CF$
Step-4. Compute Replication Sum of Square, $RSS = \sum_j \frac{R_j^2}{t} - CF$
Step-5. Compute Error Sum of Square, ESS = TSS – TrSS - RSS
Step-6. Prepare ANOVA Table
Sources of variation   d.f.         SS     MS                         Fcal      F(tab)
Replication            r-1          RSS    RMS = RSS/(r-1)            RMS/EMS
Treatments             t-1          TrSS   TMS = TrSS/(t-1)           TMS/EMS
Error                  (r-1)(t-1)   ESS    EMS = ESS/[(r-1)(t-1)]
Total                  rt-1         TSS
Step-7. Compare F values as:
If Fcal ≤ Ftab at α level then H0 is accepted i.e. all treatment effects are
same or not significant.
If Fcal > Ftab at α level then H1 is accepted i.e. at least two treatment
effects are different or significant.
Step-8. If the test in ANOVA is not significant, all the treatments are
equal in their effect and the analysis stops there. But, if the test is
significant, at least two treatments differ in their effect; then proceed
to compare the differences of treatment effects by the Critical Difference
(CD) or Least Significant Difference (LSD) test as above.
Step-9. SE of mean, $SE(m) = \sqrt{EMS / r}$ and
SE (diff of 2 means), $SE(d) = \sqrt{2\,EMS / r}$
Step-10. $CV = \frac{\sqrt{EMS}}{M} \times 100$
2.2. Analysis of data in completely randomized design (CRD)
The simplest design using only two essential principles of field
experimentation, viz. replication and randomization, is the completely
randomised design (CRD). This is a one-way classification of data. In this
design the whole of the experimental material is divided into a number of
experimental units depending on the number of treatments and the number of
replications for each treatment. The treatments are then allotted randomly
to the units of the entire homogeneous material and observations on
different characteristics or variables of interest are recorded. This
design is useful for laboratory or green house experiments where treatment
is the only variable of interest for comparison.
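The random allotment of treatments to units in a CRD can be sketched as below. This is a minimal illustration, assuming units are simply numbered 1 to n; the function name `crd_layout` and the `seed` argument (used only to make the plan reproducible) are hypothetical.

```python
import random


def crd_layout(replications, seed=None):
    """Randomly allot treatments to experimental units numbered 1..n.

    `replications` maps each treatment label to its number of replications.
    """
    rng = random.Random(seed)
    # One label per experimental unit, repeated as often as the treatment is replicated
    labels = [trt for trt, r in replications.items() for _ in range(r)]
    rng.shuffle(labels)     # every arrangement of labels is equally likely
    return {unit: trt for unit, trt in enumerate(labels, start=1)}


# Five varieties with unequal replications (3, 4, 5, 4 and 4), i.e. 20 units
layout = crd_layout({"V1": 3, "V2": 4, "V3": 5, "V4": 4, "V5": 4}, seed=1)
for unit, trt in layout.items():
    print(unit, trt)
```

Shuffling the full list of labels keeps the required number of replications per treatment while leaving the position of each treatment to chance.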
Procedure:
The analysis is the same as that of one-way classification with model,
assumptions, hypothesis and steps of calculation.
Model, Yij = μ + ti + eij
Where, Yij is the value of the variate in the jth replicate of the ith
treatment (i = 1, 2, …, t; j = 1, 2, …, ri)
μ is the general mean effect
ti is the effect due to ith treatment
eij is random error which is iid ~ N(0, σe²)
Step-1. The observations of a variable y recorded can be arranged as
follows:
Arrangement of observations of CRD
Treatment      1          2          3          ……    t
               Y11        Y21        Y31        ……    Yt1
               Y12        Y22        Y32        ……    Yt2
               Y13        Y23        Y33        ……    Yt3
               ..         ..         ..         ……    ..
Total          T1         T2         T3         ……    Tt         GT
No. of Repl.   r1         r2         r3         ……    rt         n
Treat mean     T1/r1      T2/r2      T3/r3      ……    Tt/rt
Step-2. The hypothesis to be tested is,
H0: t1 = t2 = … = tt and H1: ti ≠ tj for at least one pair i and j
Step-3. Analysis of data
i). Correction Factor (C.F.) = $\frac{(GT)^2}{n}$
ii). Total Sum of Squares (TSS) = $\sum\sum Y_{ij}^2 - C.F. = (Y_{11}^2 + Y_{12}^2 + \dots + Y_{t r_t}^2) - C.F.$
iii). Treatment Sum of Squares (TrSS) = $\left(\frac{T_1^2}{r_1} + \frac{T_2^2}{r_2} + \dots + \frac{T_t^2}{r_t}\right) - C.F.$
iv). Error Sum of Squares (ESS) = TSS – TrSS
Step-4. Preparation of ANOVA table
Sources of variation   d.f.   SS     MSS                  Fcal      F(tab)
Treatments             t-1    TrSS   TMS = TrSS/(t-1)     TMS/EMS
Error                  n-t    ESS    EMS = ESS/(n-t)
Total                  n-1    TSS
Step-5. If the calculated value of F is greater than the table value
$F_{\alpha;\, t-1,\, n-t}$, where α denotes the level of significance, the
hypothesis H0 is rejected and it can be inferred that some or all the
treatment effects are significantly different.
Step-6. Calculation of standard errors and CD value for pair comparison:
(a). Estimated SE of ith treatment mean, $SE(m) = \sqrt{EMS / r_i}$
(b). Estimated SE of the difference between i-th and j-th treatment mean is
$$SE(d) = \sqrt{EMS\left(\frac{1}{r_i} + \frac{1}{r_j}\right)}$$
If ri = rj = r, then $SE(d) = \sqrt{\frac{2 \times EMS}{r}}$
(c). CD = SE(d) x t
(d). The treatment means are arranged according to their ranks in
descending order. Using the CD value the bar chart is completed to
interpret the treatment comparisons.
CRD with unequal replications
Problem-21. A varietal trial on green gram was conducted in a green
house under CRD having five varieties V1, V2, V3, V4, V5 replicated
3, 4, 5, 4 and 4 times, respectively. The data recorded on grain yield are
presented below.
Grain yield of green gram (kg/pot)
Varieties    V1      V2      V3      V4      V5
             1.6     2.5     1.3     2.0     1.6
             1.2     2.2     0.9     1.5     1.0
             1.5     2.4     0.8     1.6     0.8
             --      1.9     1.1     1.4     0.9
             --      --      1.0     --      --
Total        4.3     9.0     5.1     6.5     4.3
Repl         3       4       5       4       4
Mean         1.43    2.25    1.02    1.62    1.08
Variance     0.043   0.070   0.037   0.069   0.129
Analyse the data and find the variety with the highest grain yield.
Solution:
Step-1. Null hypothesis H0: T1 = T2 = … = T5, i.e. all varieties give
the same yield;
H1: at least two varieties do not give the same yield
Step-2. Calculation
i). C.F. = (29.2)²/20 = 42.6320
ii). TSS = [(1.6)² + (1.2)² + … + (0.9)²] – C.F. = 47.840 – 42.632 = 5.208
iii). SS due to treatments (varieties) = TrSS or VSS
$$= \left[\frac{(4.3)^2}{3} + \frac{(9.0)^2}{4} + \frac{(5.1)^2}{5} + \frac{(6.5)^2}{4} + \frac{(4.3)^2}{4}\right] - C.F. = 46.8003 - 42.6320 = 4.1683$$
iv). ESS = TSS - VSS = 5.2080 – 4.1683 = 1.0397
Step-3. Construction of ANOVA table
Sources of variation   d.f.   SS       MSS      Fcal       F0.01
Variety                4      4.1683   1.0421   15.037**   4.893
Error                  15     1.0397   0.0693
Total                  19     5.2080
** Significant at 1% level
Step-4. Since the observed value of F is greater than the 1% tabulated F
value, the null hypothesis is rejected. It indicates that some of the
treatment pairs are different. So, the C.D. test is required for pairwise
comparison.
Step-5. Calculation of SE for V1 and V2
$$SE(d) = \sqrt{EMS\left(\frac{1}{r_1} + \frac{1}{r_2}\right)} = \sqrt{0.0693\left(\frac{1}{3} + \frac{1}{4}\right)} = \sqrt{0.040423} = 0.2011$$
The table value of t for α = 0.05 and 15 df is 2.131
Hence, CD = (2.131) × (0.2011) = 0.4285
Similarly, the CD values of the other pairs are:
V1 and V3 = 0.4096,
V1 and V4; V1 and V5 = 0.4285
V2 and V3; V3 and V4; V3 and V5 = 0.3763
V2 and V4; V2 and V5; V4 and V5 = 0.3966.
Comparison of the difference between the mean yields of the varieties
with the corresponding CD value will result in the following bar chart.
V2 V4 V1 V5 V3
Conclusion: It is concluded that the variety V2 is the best variety in giving
highest grain yield followed by V1 & V4 and V3 & V5.
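The sums of squares and CD values of Problem-21 can be reproduced with a short script. The sketch below is illustrative only: `one_way_anova` and `critical_difference` are hypothetical helper names, and the tabulated t value (2.131 for 15 d.f. at 5%) is taken from the worked solution rather than computed.

```python
import math

yields = {
    "V1": [1.6, 1.2, 1.5],
    "V2": [2.5, 2.2, 2.4, 1.9],
    "V3": [1.3, 0.9, 0.8, 1.1, 1.0],
    "V4": [2.0, 1.5, 1.6, 1.4],
    "V5": [1.6, 1.0, 0.8, 0.9],
}


def one_way_anova(groups):
    """One-way ANOVA (CRD) allowing unequal replications."""
    all_obs = [y for obs in groups.values() for y in obs]
    n, t = len(all_obs), len(groups)
    cf = sum(all_obs) ** 2 / n                        # correction factor
    tss = sum(y * y for y in all_obs) - cf            # total SS
    trss = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - cf
    ess = tss - trss                                  # error SS
    tms = trss / (t - 1)
    ems = ess / (n - t)
    return {"CF": cf, "TSS": tss, "TrSS": trss, "ESS": ess,
            "TMS": tms, "EMS": ems, "F": tms / ems}


def critical_difference(ems, r_i, r_j, t_tab):
    """CD = t * sqrt(EMS * (1/r_i + 1/r_j))."""
    return t_tab * math.sqrt(ems * (1 / r_i + 1 / r_j))


anova = one_way_anova(yields)
cd_v1_v2 = critical_difference(anova["EMS"], 3, 4, t_tab=2.131)
cd_v2_v3 = critical_difference(anova["EMS"], 4, 5, t_tab=2.131)
print({k: round(v, 4) for k, v in anova.items()})
print(round(cd_v1_v2, 4), round(cd_v2_v3, 4))
```

Small differences in the last decimal place from the worked solution arise because the script carries unrounded intermediate values.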
Exercise: The following data are from a laboratory experiment in which
observations were made on mycelial growth of different Rhizoctonia
solani isolates on PDA medium:
R. solani isolates    Mycelial growth
                      Repl. 1   Repl. 2   Repl. 3
RS-1                  29.0      28.0      29.0
RS-2                  33.5      31.5      29.0
RS-3                  26.5      30.0      ----
RS-4                  48.5      46.5      49.0
RS-5                  34.5      31.0      ----
Analyse the data and draw conclusions on the significance of differences
among the Rhizoctonia solani isolates.
CRD with equal replications
Problem-22. In order to find out the yielding abilities of five varieties of
sesamum, an experiment was conducted in a poly house using a CRD with
four plots per variety. The observations are given in the table below.
Seed yield of sesamum (g/plot)
Varieties    1       2       3       4       5
             25      25      24      20      14
             21      28      24      17      15
             21      24      16      16      13
             18      25      21      19      11
Total        85      102     85      72      53
Mean         21.2    25.5    21.2    18.0    13.2
Analyse the data and draw conclusions on varietal performance of
different sesamum varieties.
Solution:
Step-1. Null hypothesis H0: V1 = V2 = … = V5, H1: at least 2 varieties are
different.
Step-2. Calculation
(i). C.F. = $\frac{397^2}{20} = 7880.45$
(ii). TSS = [(25)² + (21)² + … + (11)²] – C.F. = 8307 – 7880.45 = 426.55
(iii). Varieties SS = VSS = $\frac{1}{4}(85^2 + 102^2 + \dots + 53^2) - C.F.$
= 8211.75 – 7880.45 = 331.30
(iv). ESS = 426.55 – 331.30 = 95.25
Step-3. Construction of ANOVA table
Sources of variation   d.f.   SS       MSS      Fcal        Ftab
Varieties              4      331.30   82.825   13.043 **   4.893
Error                  15     95.25    6.350
Total                  19     426.55
** Significant at 1% level.
Step-4. Since the observed value of F is greater than the 1% table
value, the null hypothesis is rejected. So, we proceed for the CD test.
$$SE(d) = \sqrt{\frac{2 \times 6.350}{4}} = 1.7819$$
The table value of t for α = 0.05 and 15 df is 2.131
Hence, CD = (2.131) × (1.7819) = 3.7972 ≈ 3.80
The arrangement of treatments according to their ranks and the bar chart
will be: V2 V1 V3 V4 V5
Conclusion: From the analysis, it is concluded that the variety V2 is the
best.
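The pairwise comparison behind the bar chart of Problem-22 can be sketched as follows. It is an illustration under stated assumptions: the error mean square (6.35), the number of replications (4) and the tabulated t value (2.131) are taken from the worked solution, and `compare_pairs` is a hypothetical helper name.

```python
import math
from itertools import combinations

totals = {"V1": 85, "V2": 102, "V3": 85, "V4": 72, "V5": 53}
r = 4                 # replications per variety
ems = 6.35            # error mean square from the ANOVA table
t_tab = 2.131         # tabulated t at 5% level with 15 d.f.

means = {v: total / r for v, total in totals.items()}
cd = t_tab * math.sqrt(2 * ems / r)     # critical difference


def compare_pairs(means, cd):
    """Mark each pair of treatments as significantly different or not."""
    result = {}
    for a, b in combinations(means, 2):
        diff = abs(means[a] - means[b])
        result[(a, b)] = diff > cd      # True means significantly different
    return result


pairs = compare_pairs(means, cd)
print(round(cd, 2))
for (a, b), significant in pairs.items():
    print(a, b, "significant" if significant else "at par")
```

Varieties whose mean difference does not exceed the CD are the ones joined by a common bar in the bar chart.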
Exercise: The data represent a set of observations on wood density
obtained on a randomly collected set of 7 stems belonging to each of five
cane species.
         Species
Stem     1       2       3       4       5
 1       0.58    0.53    0.49    0.53    0.57
 2       0.54    0.63    0.55    0.61    0.64
 3       0.38    0.68    0.58    0.53    0.63
 4       0.32    0.55    0.54    0.47    0.68
 5       0.52    0.45    0.41    0.41    0.61
 6       0.41    0.59    0.63    0.58    0.74
 7       0.47    0.65    0.58    0.44    0.71
Analyse the data and draw conclusions on the differences among the cane
species.
2.3. Analysis of data in randomised complete block design (RCBD
or RBD) with one observation per cell
In order to control variability in one direction in the experimental
material it is desirable to divide the experimental material into
homogeneous groups of units called blocks, formed perpendicular to the
direction of variability. The treatments are randomly allocated within
each of these blocks. This procedure gives an arrangement of ‘t’
treatments in ‘r’ blocks such that each treatment occurs precisely once in
each block.
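The randomization within blocks described above can be sketched as below. This is a minimal illustration; the function name `rbd_layout` and the `seed` argument (used only for reproducibility) are hypothetical.

```python
import random


def rbd_layout(treatments, n_blocks, seed=None):
    """Return one independently randomized order of treatments per block."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)      # a fresh randomization inside every block
        plan.append(order)
    return plan


# Six strains A..F in four blocks, each strain once per block
plan = rbd_layout("ABCDEF", n_blocks=4, seed=7)
for number, block in enumerate(plan, start=1):
    print("Block", number, block)
```

Unlike a CRD, the shuffle is restricted to each block, so every treatment appears exactly once in every block.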
Procedure:
The analysis of a Randomised Complete Block Design is similar to the
analysis of two-way classified data. For analysis of this design
we use the linear additive model,
Yij = μ + ti + rj + eij
Where, μ = the overall mean; ti = the ith treatment effect
rj = the jth replication effect, and
eij = the error term iid ~ N(0, σe²)
Step-1. The observations from a RBD can be arranged as follows:
Arrangement of data in RBD with t treatments and r replications
             Replication
Treatment    1      2      3      ……    r      Total
1            Y11    Y12    Y13    ……    Y1r    T1
2            Y21    Y22    Y23    ……    Y2r    T2
3            Y31    Y32    Y33    ……    Y3r    T3
……           ……     ……     ……     ……    ……     ……
t            Yt1    Yt2    Yt3    ……    Ytr    Tt
Total        R1     R2     R3     ……    Rr     GT
Step-2. The data can be analysed as:
(i). C.F. = (GT)²/rt
(ii). Total SS = TSS = $\sum\sum y_{ij}^2 - C.F.$
(iii). Replication SS = RSS = $\frac{\sum R_j^2}{t} - C.F.$
(iv). Treatment SS = TrSS = $\frac{\sum T_i^2}{r} - C.F.$
(v). Error SS = ESS = TSS – RSS – TrSS
Step-3. We are interested in testing the hypothesis
H0: t1 = t2 = … = tt, against the alternative that at least 2 t’s are not
equal.
Step-4. ANOVA table
Sources of variation   d.f.          SS     MSS    Fcal        F(tab)
Replication            r-1           RSS    RMS    RMS/EMS
Treatment              t-1           TrSS   TMS    TMS/EMS
Error                  (r-1)(t-1)    ESS    EMS
Total                  rt-1          TSS
Step-5. If the F-test shows that there is no significant difference between
replications, it indicates that RBD will not contribute to precision in
detecting treatment differences. In such situations the adoption of RBD in
preference to CRD is not advantageous.
Step-6. If by F-test we find a significant difference between treatments,
then we can use CD for comparing pairs of treatments. The CD is given
by:
CD = tα x SE(d)
Where, tα = table value of t for α (0.01 or 0.05) level of significance and
error degrees of freedom, and
$$SE(d) = \sqrt{\frac{2\,EMS}{r}}$$
Based on the CD value the bar chart can be drawn and conclusions can be
written.
Problem-23. The plan and yield (kg/plot) of six paddy strains
(A, B, C, D, E, F) in a RBD experiment with four replications are shown
below.
Block-I    Block-II   Block-III   Block-IV
A (12)     B (4)      B (7)       F (8)
E (14)     C (6)      C (9)       A (18)
C (11)     E (11)     D (9)       C (10)
D (7)      A (16)     E (15)      B (6)
B (5)      D (8)      F (12)      D (8)
F (10)     F (9)      A (14)      E (12)
(Figures in parentheses are yield observations)
Analyse the data and draw conclusions on paddy strains for yield
performance.
Solution:
Step-1. Null hypothesis H0: TA = TB = … = TF (all the strains have the
same mean yield); H1: at least 2 strains are different
Step-2. The data can be arranged in the following two-way classification.
Paddy yield (in kg/plot)
              Replication or Blocks
Treatment     I      II     III    IV     Treatment Total    Mean
A             12     16     14     18     60                 15
B             5      4      7      6      22                 5.5
C             11     6      9      10     36                 9
D             7      8      9      8      32                 8
E             14     11     15     12     52                 13
F             10     9      12     8      39                 9.8
Rep. Total    59     54     66     62     GT=241
Step-3. Calculation: here, N = r x t = 4 x 6 = 24
(i). Correction factor, $CF = \frac{(GT)^2}{N} = \frac{(241)^2}{24} = 2420$
(ii). Total SS = TSS = (12² + … + 8²) – CF = 2717 - 2420 = 297
(iii). Replication or Block SS = RSS = $\frac{59^2 + \dots + 62^2}{6} - CF = 2432 - 2420 = 12$
(iv). Variety SS = VSS = $\frac{60^2 + \dots + 39^2}{4} - CF = 2657 - 2420 = 237$
(v). Error SS = ESS = TSS – RSS – VSS = 297 - 12 - 237 = 48
Step-4. Construction of ANOVA Table

Sources of variation   d.f.         SS     MSS     Fcal       Ftab 5%   Ftab 1%
Block                  (r-1)=3      12     4       1.25 ns
Variety                (t-1)=5      237    47.4    14.8 **    2.90      4.56
Error                  15           48     3.2
Total                  (rt-1)=23    297

ns - Not significant
** Significant at 1% level
Step-5. Since the calculated F value for variety is greater than the table
F value for 5 and 15 d.f. at the 1% level, the varieties differ
significantly at the 1% level, i.e. the varietal differences are highly
significant.
Step-6. Critical difference,
$CD = t_{0.05}\ (15\ d.f.) \times \sqrt{\frac{2 \times EMS}{r}} = 2.131 \times \sqrt{\frac{2 \times 3.2}{4}} = 2.69$
Step-7. The arrangement of treatments according to their ranks with
respect to their means, for preparing the bar chart, is as follows:
Varieties:   A     E     F      C     D     B
Means:       15    13    9.8    9     8     5.5
Conclusion: The bar chart shows that varieties A & E are at par with each
other and superior to C, D, F and B, while C, D and F are at par with
respect to yield performance. B is the lowest yielder, although its
difference from D (2.5) is smaller than the CD.
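The RBD calculations of Problem-23 can be cross-checked with a short script. The following is only a sketch: the function names `rbd_anova` and `critical_difference` are illustrative and not part of the manual, and the table value of t (2.131) is typed in by hand from the text. Because the script keeps full decimal precision, its sums of squares differ slightly from the hand-rounded figures above (for example, ESS comes out near 46.96 rather than 48).

```python
from math import sqrt

# Yields (kg/plot) from Problem-23; each list holds Blocks I-IV for one strain.
YIELDS = {
    "A": [12, 16, 14, 18],
    "B": [5, 4, 7, 6],
    "C": [11, 6, 9, 10],
    "D": [7, 8, 9, 8],
    "E": [14, 11, 15, 12],
    "F": [10, 9, 12, 8],
}


def rbd_anova(data):
    """Sums of squares, error mean square and F ratios for an RBD."""
    rows = list(data.values())
    t, r = len(rows), len(rows[0])            # treatments, replications
    gt = sum(sum(row) for row in rows)        # grand total
    cf = gt ** 2 / (r * t)                    # correction factor
    tss = sum(y * y for row in rows for y in row) - cf
    block_totals = [sum(row[j] for row in rows) for j in range(r)]
    rss = sum(b * b for b in block_totals) / t - cf
    trss = sum(sum(row) ** 2 for row in rows) / r - cf
    ess = tss - rss - trss
    ems = ess / ((r - 1) * (t - 1))
    return {
        "CF": cf, "TSS": tss, "RSS": rss, "TrSS": trss, "ESS": ess, "EMS": ems,
        "F_block": (rss / (r - 1)) / ems,
        "F_treatment": (trss / (t - 1)) / ems,
    }


def critical_difference(ems, r, t_table):
    """CD = t_table * sqrt(2 * EMS / r); t_table is read from a t table."""
    return t_table * sqrt(2 * ems / r)


result = rbd_anova(YIELDS)
cd = critical_difference(result["EMS"], r=4, t_table=2.131)
for name, value in result.items():
    print(f"{name:12s} {value:10.3f}")
print(f"{'CD (5%)':12s} {cd:10.3f}")
```

The same two functions can be reused for the exercise that follows by replacing `YIELDS` with the provenance data and supplying the appropriate table value of t.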
Exercise: In a field experiment laid out under RCBD, data were recorded on
seven provenances of Gmelina arborea for the girth at breast-height
(gbh) of the trees 6 years after planting.

gbh (cm) of trees in plots 6 years after planting

Treatment        Replication
(Provenance)     I        II       III
1                30.85    38.01    35.10
2                30.24    28.43    35.93
3                30.94    31.64    34.95
4                29.89    29.12    36.75
5                21.52    24.07    20.76
6                25.38    32.14    32.19
7                22.89    19.66    26.92

Analyse the data and draw conclusions on treatment differences.
2.4. Analysis of data in Latin square design (LSD)
This design controls heterogeneity in two directions in the
experimental material. Two restrictions are imposed by forming blocks in
two perpendicular directions, row-wise and column-wise. Treatments are
allotted in such a way that every treatment occurs once and only once in
each row and each column. Thus, a Latin square of ‘t’ treatments is an
arrangement of t x t or t^2 cells such that every row and every column
contains every treatment precisely once. By this arrangement the error
variation can be reduced considerably further.
Procedure:
For analysis of these designs we use the linear additive model
$y_{ijk} = \mu + r_i + c_j + t_k + e_{ijk}$
where $y_{ijk}$ is the observation on the kth treatment in the ith row and
jth column (i = 1, 2, …, s; j = 1, 2, …, s; k = 1, 2, …, s),
$\mu$ is the general mean effect, $r_i$ is the effect due to the ith row,
$c_j$ is the effect due to the jth column, $t_k$ is the effect due to the
kth treatment and $e_{ijk}$ is the random error component, which is assumed
to be independently and identically normally distributed with mean zero
and a constant variance $\sigma_e^2$.
Analysis:
Let there be s treatments arranged in s rows and s columns; then
compute,
(i). $R_i$ = Total of ith row = $\sum_j y_{ijk}$
(ii). $C_j$ = Total of jth column = $\sum_i y_{ijk}$
(iii). $T_k$ = Total of kth treatment in the design
(iv). C.F. = $(GT)^2 / s^2$, where GT is the Grand Total
(v). TSS (Total Sum of Squares) = $\sum_i \sum_j y_{ijk}^2 - C.F.$
(vi). RSS (Row Sum of Squares) = $\sum_i R_i^2 / s - C.F.$
(vii). CSS (Column Sum of Squares) = $\sum_j C_j^2 / s - C.F.$
(viii). TrSS (Treatment Sum of Squares) = $\sum_k T_k^2 / s - C.F.$
(ix). ESS (Error Sum of Squares) = TSS - RSS - CSS - TrSS
(x). Hypothesis H0: t1 = t2 = … = ts against H1 that the ti’s are not all equal
(xi). ANOVA Table

Sources      d.f.          SS      MSS                              Fcal
Row          (s-1)         RSS     $S_r^2$ = RSS/(s-1)
Column       (s-1)         CSS     $S_c^2$ = CSS/(s-1)
Treatment    (s-1)         TrSS    $S_t^2$ = TrSS/(s-1)             $S_t^2 / S_e^2$
Error        (s-1)(s-2)    ESS     $S_e^2$ = ESS/[(s-1)(s-2)]
Total        (s^2-1)       TSS
If the calculated value of F for treatments is greater than the table
value of F for (s-1) and (s-1)(s-2) d.f., the hypothesis H0 is rejected
and we infer that the treatment effects are significantly different. To
detect which treatments differ, the CD test is performed.
The estimated SE of the difference between the ith and jth treatment
means is
$SE(d) = \sqrt{2 S_e^2 / s}$
The critical difference (CD) can be calculated as
CD = SE(d) x t at error d.f.
The degrees of freedom for t are those of error. The treatment
means are computed as Tk/s (k = 1, 2, …, s). These means can be
compared with the help of the CD value. Any two treatment means are said
to differ significantly if their difference is larger than the CD value.
Problem-24. An experiment was carried out on Sorghum with 5 varieties
(A, B, C, D & E) in a (5 x 5) LSD. The plan and grain yield (kg/plot)
are given below:

                Columns
Rows            I         II        III       IV        V         Row total
I               B (6)     A (11)    E (8)     D (6)     C (5)     36
II              A (9)     D (9)     C (4)     E (14)    B (10)    46
III             C (3)     B (8)     D (7)     A (12)    E (8)     38
IV              E (10)    C (5)     A (10)    B (7)     D (10)    42
V               D (8)     E (15)    B (9)     C (3)     A (18)    53
Column total    36        48        38        42        51        215

(Figures in parentheses are yield observations of the respective treatments)
Perform the ANOVA and compare the variety mean yields.
Solution:
Step-1. Hypothesis:
$H_0: \mu_A = \mu_B = \mu_C = \mu_D = \mu_E$
$H_1$: the variety means $\mu_A, \dots, \mu_E$ are not all equal
Step-2. Yield (kg/plot) of varieties and their totals

       A     B     C     D     E
       11    6     5     6     8
       9     10    4     9     14
       12    8     3     7     8
       10    7     5     10    10
       18    9     3     8     15
Tk     60    40    20    40    55

Variety totals are: A = 60; B = 40; C = 20; D = 40; E = 55
Step-3. Calculation
(i). Grand total, GT = 215; total no. of observations, N = 25
(ii). No. of varieties, s = 5
(iii). Correction factor, C.F. = $\frac{(GT)^2}{N} = \frac{(215)^2}{25} = 1849$
(iv). Total Sum of Squares = TSS
$= \sum_i \sum_j y_{ijk}^2 - C.F. = (6^2 + 11^2 + \dots + 18^2) - 1849 = 2163 - 1849 = 314$
(v). Row Sum of Squares = RSS
$= \sum_i R_i^2 / s - C.F. = \frac{36^2 + 46^2 + 38^2 + 42^2 + 53^2}{5} - 1849 = 1885.8 - 1849 = 36.8$
(vi). Column Sum of Squares = CSS
$= \sum_j C_j^2 / s - C.F. = \frac{36^2 + 48^2 + 38^2 + 42^2 + 51^2}{5} - 1849 = 1881.8 - 1849 = 32.8$
(vii). Variety Sum of Squares = VrSS
$= \sum_k T_k^2 / s - C.F. = \frac{60^2 + 40^2 + 20^2 + 40^2 + 55^2}{5} - 1849 = 2045 - 1849 = 196$
(viii). Error SS = ESS = TSS - RSS - CSS - VrSS
= 314 - 36.8 - 32.8 - 196 = 48.4
Step-4. Construction of ANOVA Table

Source of variation   df    SS       MSS     Fcal                    Ftab 5%   Ftab 1%
Rows                  4     36.8     9.2     (9.2/4.03) = 2.28 ns
Columns               4     32.8     8.2     (8.2/4.03) = 2.03 ns
Variety               4     196.0    49.0    (49/4.03) = 12.15 **    3.26      5.41
Error                 12    48.4     4.03
Total                 24    314
Step-5. Comparing the F ratios for rows, columns and varieties with the
table value of F (for 4 and 12 d.f.), it is found that only the
differences among the varietal means are highly significant.
Step-6. CD at 5% = SE(d) x t0.05 for 12 d.f.
$= \sqrt{\frac{2 \times 4.03}{5}} \times 2.18 = 1.27 \times 2.18 = 2.77$
The variety means are arranged according to their ranks, and the bar
chart is prepared by comparing the differences with the CD value.
Variety:   A     E     B    D    C
Means:     12    11    8    8    4
and the bar coding is: (A E) (B D) (C)
Conclusion: The analysis reveals that varietal differences are present:
varieties A & E are at par, varieties B & D are also at par, and C differs
significantly from all the others in yield. Varieties A & E are the
best varieties for yield performance.
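The sums of squares of Problem-24 can be reproduced with a few lines of code. This is a minimal sketch, assuming the plan is typed in row by row as (variety, yield) pairs; the name `lsd_anova` is illustrative only and does not come from the manual.

```python
# Plan and yields (kg/plot) from Problem-24; one inner list per row of the square.
LAYOUT = [
    [("B", 6), ("A", 11), ("E", 8), ("D", 6), ("C", 5)],
    [("A", 9), ("D", 9), ("C", 4), ("E", 14), ("B", 10)],
    [("C", 3), ("B", 8), ("D", 7), ("A", 12), ("E", 8)],
    [("E", 10), ("C", 5), ("A", 10), ("B", 7), ("D", 10)],
    [("D", 8), ("E", 15), ("B", 9), ("C", 3), ("A", 18)],
]


def lsd_anova(layout):
    """Sums of squares, error mean square and treatment F ratio for an s x s LSD."""
    s = len(layout)
    values = [y for row in layout for _, y in row]
    cf = sum(values) ** 2 / s ** 2                       # correction factor
    tss = sum(y * y for y in values) - cf
    row_totals = [sum(y for _, y in row) for row in layout]
    col_totals = [sum(row[j][1] for row in layout) for j in range(s)]
    trt_totals = {}
    for row in layout:
        for trt, y in row:
            trt_totals[trt] = trt_totals.get(trt, 0) + y
    rss = sum(v * v for v in row_totals) / s - cf
    css = sum(v * v for v in col_totals) / s - cf
    trss = sum(v * v for v in trt_totals.values()) / s - cf
    ess = tss - rss - css - trss
    ems = ess / ((s - 1) * (s - 2))
    return {
        "CF": cf, "TSS": tss, "RSS": rss, "CSS": css, "TrSS": trss,
        "ESS": ess, "EMS": ems, "F_treatment": (trss / (s - 1)) / ems,
        "treatment_totals": trt_totals,
    }


anova = lsd_anova(LAYOUT)
for name, value in anova.items():
    print(name, value)
```

Keeping the variety label next to each yield lets the treatment totals be accumulated directly from the plan, so no separate re-arrangement of the data (Step-2) is needed.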
Exercise: In a varietal trial on paddy to test the yielding ability of 5
varieties (A, B, C, D, E), an experiment was laid out in a 5x5 LSD. The
results are given below.

Grain yield of paddy (kg/plot)
D 39.0    A 24.1    E 26.1    B 37.0    C 42.2
E 21.2    B 38.1    A 24.0    C 39.3    D 33.1
C 35.6    E 33.5    B 38.1    D 40.8    A 24.2
A 30.8    C 31.1    D 46.7    E 28.7    B 44.9
B 44.3    D 29.6    C 41.1    A 26.3    E 24.4

Analyse the data and draw conclusions on the yielding ability of the
paddy varieties.
2.5. Missing plot technique in design of Experiments
Statistical concept: In agricultural field experiments, the experimenter
often encounters the situation in which the observations of a
particular plot are lost, or are so much affected by some extraneous
cause that it would not be desirable to regard them as normal
experimental observations. Such data are generally analysed
through the missing plot technique. Statistical analysis of designs
in which the observation on one or more plots is missing is somewhat
complicated, because the initially symmetrical distribution of plots
among the treatments and among the blocks is disturbed. The
analysis of such experiments, however, can be carried out by one of the
following methods.
(a) Estimating the missing value(s) using the principle of least squares,
i.e. minimizing the error sum of squares.
(b) Method of interaction
(c) Method of fitting constants, and
(d) Analysis of the data with missing observation by the technique of
analysis of covariance.
In the following, we shall use the first method for the analysis of data
with one missing observation.
2.6. Analysis of data in RCBD with one missing observation
Procedure:
When any one observation of the character under study is missing, we
first estimate the missing observation, substitute the estimated value
in its place and proceed with the analysis. The method consists of
selecting a value ‘x’ for the unknown missing value such that the error
variance is kept at a minimum.
Consider a randomized block design with t treatments and r
replications in which one observation is missing.
Let x be the value of the missing observation; it is estimated as:
$x = \frac{rB' + tT' - G'}{(r-1)(t-1)}$
where,
B’ = total of available values of the replication that contains the missing
value
T’ = total of available values of the treatment that contains the missing
value
G’ = grand total of all the available values
The analysis is then carried out as usual after substituting the
estimated value for the missing value, with the following changes.
i). The d.f. for error and total are each corrected by subtracting 1 from
the actual d.f.
ii). The Treatment Sum of Squares is corrected by subtracting the bias,
$B = \frac{(B' + tT' - G')^2}{t(t-1)(r-1)^2}$
iii). Standard error for testing the significance of the difference between
treatment means:
(a). Standard error of the difference between two treatment means
not involving the missing value:
$SE(d) = \sqrt{2 S_e^2 / r}$
where $S_e^2$ is the Error Mean Square.
(b). Standard error of the difference between two treatment means
one of which involves the missing value:
$SE(d) = \sqrt{\frac{EMS}{r}\left[2 + \frac{t}{(r-1)(t-1)}\right]}$
Problem-25. To find out the best source of nitrogen at 60 kg/ha, an
experiment was conducted on paddy with 5 sources of nitrogen in 4
blocks. The yield data for the different treatments are given below.

Yield of grain (kg/plot)

                                  Blocks
Source of nitrogen                I       II      III     IV
S1  Ammonium Sulphate             25.4    17.3    22.4    30.5
S2  Ammonium Chloride             32.5    --      28.4    33.4
S3  Urea                          37.5    25.4    30.1    34.5
S4  Chilean nitrate               22.5    14.7    23.5    22.4
S5  Ammonium Sulphate Nitrate     20.5    21.5    23.5    28.5

The observation relating to application of Ammonium Chloride in the
second block is missing. Estimate the missing value and analyse the data.
Solution:
Step-1. Prepare the following two-way table between treatments and
blocks, treating the yield corresponding to S2 in the second block as missing.

Treatment x Block table

          Treatments
Blocks    1       2       3        4       5       Total
I         25.4    32.5    37.5     22.5    20.5    138.4
II        17.3    --      25.4     14.7    21.5    78.9
III       22.4    28.4    30.1     23.5    23.5    127.9
IV        30.5    33.4    34.5     22.4    28.5    149.3
Total     95.6    94.3    127.5    83.1    94.0    494.5
Step-2. Estimate the missing value, x:
$x = \frac{rB' + tT' - G'}{(r-1)(t-1)} = \frac{(4 \times 78.9) + (5 \times 94.3) - 494.5}{3 \times 4} = 24.4$
Step-3. Insert the estimated missing value and carry out the analysis of
variance according to the usual procedure of RBD, except for subtracting 1
d.f. from the d.f. for total S.S. as well as from the d.f. for error S.S.
Step-4. Calculation of sum of squares
C.F. = $\frac{(GT)^2}{rt} = \frac{(518.9)^2}{20} = 13462.86$
Total S.S. = TSS = $\sum y_{ij}^2 - C.F. = 14121.21 - 13462.86 = 658.35$
Block S.S. = BSS = $\sum_i B_i^2 / t - C.F. = 13694.87 - 13462.86 = 232.01$
Treatment S.S. = TrSS = $\sum_j T_j^2 / r - C.F. = 13806.73 - 13462.86 = 343.87$
Error S.S. = ESS = TSS - BSS - TrSS
= 658.35 - 232.01 - 343.87 = 82.47
While the error mean square is an unbiased estimate of the error
variance, the treatment S.S. is an overestimate and has to be corrected
by subtracting from it a bias, B:
$B = \frac{(B' + tT' - G')^2}{t(t-1)(r-1)^2} = \frac{(78.9 + 5 \times 94.3 - 494.5)^2}{5 \times 4 \times 9} = 17.36$
Corrected Treatment S.S. = 343.87 - 17.36 = 326.51
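The estimate of the missing value and the bias correction used above follow directly from the two formulas of Section 2.6. The sketch below evaluates them for Problem-25; the function names are illustrative and not taken from the manual.

```python
def missing_value_rbd(r, t, block_total, treatment_total, grand_total):
    """x = (r*B' + t*T' - G') / ((r - 1)(t - 1)) for one missing plot in an RBD."""
    return (r * block_total + t * treatment_total - grand_total) / ((r - 1) * (t - 1))


def treatment_ss_bias_rbd(r, t, block_total, treatment_total, grand_total):
    """Upward bias in the treatment SS: (B' + t*T' - G')^2 / (t(t - 1)(r - 1)^2)."""
    numerator = (block_total + t * treatment_total - grand_total) ** 2
    return numerator / (t * (t - 1) * (r - 1) ** 2)


# Problem-25: r = 4 blocks, t = 5 nitrogen sources, S2 missing in Block II.
x = missing_value_rbd(r=4, t=5, block_total=78.9, treatment_total=94.3,
                      grand_total=494.5)
bias = treatment_ss_bias_rbd(r=4, t=5, block_total=78.9, treatment_total=94.3,
                             grand_total=494.5)
print(f"estimated missing value x = {x:.2f}")
print(f"bias in treatment SS      = {bias:.2f}")
```

All three totals passed to the functions are totals of the available values only, i.e. they exclude the missing plot.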
Step-5. ANOVA Table

Sources                   d.f.   SS        MSS      F
Blocks                    3      232.01    77.34    10.31 **
Treatments                4      343.87    85.97    11.46 **
Error                     11     82.47     7.50     --
Total                     18     658.35    --       --
Treatments (Corrected)    4      326.51    81.63    8.99 **
Error                     11     99.83     9.08

** Significant at 1% level.
Step-6. Calculation of Standard Error
(a). Standard error of the difference between two treatment means not
involving the missing value:
$SE(d) = \sqrt{\frac{2 S_e^2}{r}} = \sqrt{\frac{2 \times 7.50}{4}} = 1.936$ kg/plot
(b). Standard error of the difference between two treatment means
one of which has a missing value:
$SE(d) = \sqrt{\frac{EMS}{r}\left[2 + \frac{t}{(r-1)(t-1)}\right]} = \sqrt{\frac{7.50}{4}\left[2 + \frac{5}{3 \times 4}\right]} = 2.13$ kg/plot
Exercise: In an experiment under RCBD for comparing the fodder yield of 7
sorghum varieties, the following data were obtained:

Fodder yield (t/ha)

           Replication
Variety    I       II      III
V1         14.5    14.0    14.0
V2         16.5    16.9    16.7
V3         x       16.7    17.4
V4         17.6    16.9    17.5
V5         18.5    17.9    17.6
V6         19.3    18.3    18.8
V7         19.5    19.0    20.0

Here the observation on V3 in R-I is missing. Analyse the data and draw
your conclusion.
2.7. Analysis of data in LSD with one missing observation
Procedure:
Step-1. Estimate the missing value, x:
$x = \frac{t(R' + C' + T') - 2G'}{(t-1)(t-2)}$
where,
t = no. of treatments
R’ = total of available values of the row containing the missing value
C’ = total of available values of the column containing the missing value
T’ = total of available values of the treatment containing the missing
value
G’ = grand total of all available values
Step-2. The estimated missing value, x, is then inserted and the analysis
is carried out according to the usual procedure for LSD, except for
subtracting 1 d.f. from the d.f. for total S.S. and error S.S. and
computing the corrected treatment S.S. by subtracting the bias, B:
$B = \frac{[G' - R' - C' - (t-1)T']^2}{[(t-1)(t-2)]^2}$
Step-3. The standard error for testing the significance of the difference
between two treatment means is obtained as follows:
a. SE of the difference between two treatment means not involving
the missing value,
$SE(d) = \sqrt{2 S_e^2 / t}$
where $S_e^2$ is the error mean square.
b. SE of the difference between two treatment means one of which
has a missing value,
$SE(d) = \sqrt{S_e^2\left[\frac{2}{t} + \frac{1}{(t-1)(t-2)}\right]}$
Problem-26. The grain yield data of paddy from a varietal trial in a 5 x
5 Latin square design are shown in the following table. The yield of
variety C is missing from the second row.

Grain yield of paddy (kg/plot)

                                                       Total
         E 26     C 42     D 39     B 37     A 24      168
         A 24     D 33     E 21     C x      B 38      116
         D 47     B 45     A 31     E 29     C 31      183
         B 38     A 24     C 36     D 41     E 34      173
         C 41     E 24     B 44     A 26     D 30      165
Total    176      168      171      133      157       805

Analyse the data and draw your conclusion.
Solution:
Step-1. We first estimate the missing value, x, as
$x = \frac{t(R' + C' + T') - 2G'}{(t-1)(t-2)} = \frac{5(116 + 133 + 150) - 2(805)}{(5-1)(5-2)} = \frac{385}{12} \approx 32$
Step-2. On substituting the estimated value in the missing place, we
get the corrected totals as follows:
Total of second row = 148; Total of 4th column = 165
Total of treatment C = 182; Grand total = 837
Step-3. Calculate the various sums of squares as in a normal LSD:
CF = (GT)^2/t^2 = 28022.76
Total SS = TSS = 29399.00 - CF = 1376.24
Row SS = RSS = 28154.20 - CF = 131.44
Column SS = CSS = 28063.00 - CF = 40.24
Treatment SS = TrSS = 28925.00 - CF = 902.24
Error SS = ESS = TSS - RSS - CSS - TrSS = 302.32
Step-4. Upward bias,
$B = \frac{[G' - R' - C' - (t-1)T']^2}{[(t-1)(t-2)]^2} = \frac{[805 - 116 - 133 - 4(150)]^2}{(4 \times 3)^2} = 13.44$
Corrected treatment SS = TrSS(Adj.) = 902.24 - 13.44 = 888.80
Step-5. Construction of ANOVA Table:

ANOVA Table
Sources of variation   d.f.   SS         MS         F
Row                    4      131.44     32.86      1.196
Column                 4      40.24      10.06      <1
Treatment (Adj.)       4      888.80     222.20     8.085
Error                  11     302.32     27.4836
Total                  23     1362.80
Step-6. Estimation of standard errors (SE):
a. SE of the difference between two treatment means not involving the
missing value
$SE(d) = \sqrt{\frac{2 S_e^2}{t}} = \sqrt{\frac{2 \times 27.4836}{5}} = 3.3156$
CD = (2.201) x (3.3156) = 7.2976
b. SE of the difference between two treatment means one of which
involves the missing value:
$SE(d) = \sqrt{S_e^2\left[\frac{2}{t} + \frac{1}{(t-1)(t-2)}\right]} = \sqrt{27.4836\left[\frac{2}{5} + \frac{1}{(5-1)(5-2)}\right]} = \sqrt{13.2839} = 3.6447$
CD = (2.201) x (3.6447) = 8.0220
Step-7. Arrange the variety means in descending order of value and
prepare the bar chart as:
B D C E A
Conclusion: For yield performance, varieties B, D & C are at par and the
best, followed by E & A.
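The missing-value estimate, the bias and the standard error involving the missing plot in Problem-26 can be verified with the short sketch below. The function names are illustrative only; the error mean square (27.4836) is taken from the ANOVA table above rather than recomputed.

```python
from math import sqrt


def missing_value_lsd(t, row_total, col_total, trt_total, grand_total):
    """x = [t(R' + C' + T') - 2G'] / [(t - 1)(t - 2)] for one missing plot in an LSD."""
    return (t * (row_total + col_total + trt_total) - 2 * grand_total) / ((t - 1) * (t - 2))


def treatment_ss_bias_lsd(t, row_total, col_total, trt_total, grand_total):
    """Upward bias: [G' - R' - C' - (t - 1)T']^2 / [(t - 1)(t - 2)]^2."""
    numerator = (grand_total - row_total - col_total - (t - 1) * trt_total) ** 2
    return numerator / ((t - 1) * (t - 2)) ** 2


def se_diff_with_missing_lsd(ems, t):
    """SE(d) when one of the two treatment means involves the missing value."""
    return sqrt(ems * (2 / t + 1 / ((t - 1) * (t - 2))))


# Problem-26: t = 5, R' = 116, C' = 133, T' = 150, G' = 805.
x = missing_value_lsd(5, 116, 133, 150, 805)
bias = treatment_ss_bias_lsd(5, 116, 133, 150, 805)
se_missing = se_diff_with_missing_lsd(27.4836, 5)
print(f"x = {x:.2f}, bias = {bias:.2f}, SE(d) = {se_missing:.4f}")
```

The value x is rounded to 32 in the hand calculation before being inserted into the table, which is why the corrected totals in Step-2 are whole numbers.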
Exercise: Estimate the missing value in the following LSD layout having 4
treatments A, B, C & D and analyse the data to draw conclusions.

A 12    C 19    B 10    D 8
C 18    B 12    D 6     A --
B 22    D 10    A 5     C 21
D 12    A 7     C 27    B 17
III. SAMPLING TECHNIQUES
Essentially, sampling consists of obtaining information from only a
part of a large group or population so as to infer about the whole
population. The object of sampling is thus to secure a sample which will
represent the population and reproduce the important characteristics of
the population under study as closely as possible. The principal
advantages of sampling as compared to complete enumeration of the
population are reduced cost, greater speed, greater scope and improved
accuracy. The smaller size of the sample makes the supervision more
effective. Moreover, it is important to note that the precision of the
estimates obtained from certain types of samples can be estimated from
the sample itself. The most ‘convenient’ method of sampling is that in
which the investigator selects a number of sampling units which he
considers ‘representative’ of the whole population.
When sampling is performed so that every unit in the population
has some chance of being selected in the sample and the probability of
selection of every unit is known, the method of sampling is called
probability sampling. An example of probability sampling is random
selection, which implies a strict process of selection equivalent to that
of drawing lots and should be clearly distinguished from haphazard
selection. In this manual, any reference to sampling, unless otherwise
stated, will relate to some form of probability sampling. The probability
that any sampling unit will be selected in the sample depends on the
sampling procedure used. The important point to note is that the precision
and reliability of the estimates obtained from a sample can be evaluated only
for a probability sample. The object of designing a sample survey is to
minimise the error in the final estimates. Even if the sample is a
probability sample, the sample being based on observations on a part of
the population cannot, in general, exactly represent the population. The
average magnitude of the sampling errors of most of the probability
samples can be estimated from the data collected. The magnitude of the
sampling errors depends on the size of the sample, the variability within
the population and the sampling method adopted. Thus, if a probability
sample is used, it is possible to predetermine the size of the sample
needed to obtain desired and specified degree of precision. A sampling
scheme is determined by the size of sampling units, number of sampling
units to be used, the distribution of the sampling units over the entire
area to be sampled, the type and method of measurement in the selected
units and the statistical procedures for analysing the survey data. A
variety of sampling methods and estimating techniques developed to
meet the varying demands of the survey statistician accord the user a
wide selection for specific situations. One can choose the method or
combination of methods that will yield a desired degree of precision at
minimum cost.
3.1. Principal steps in a sample survey
In any sample survey, we must first decide on the type of data to
be collected and determine how adequate the results should be. Secondly,
we must formulate the sampling plan for each of the characters for which
data are to be collected. We must also know how to combine the sampling
procedures for the various characters so that no duplication of field work
occurs. Thirdly, the field work must be efficiently organised with adequate
provision for supervising the work of the field staff. Lastly, the analysis of
the data collected should be carried out using appropriate statistical
techniques and the report should be drafted giving full details of the basic
assumptions made, the sampling plan and the results of the statistical
analysis.
(i) Specification of the objectives of the survey: Careful consideration
must be given at the outset to the purposes for which the survey is to be
undertaken. The characteristics on which information is to be collected
and the degree of detail to be attempted should be fixed. If it is a survey
of trees, it must be decided as to what species of trees are to be
enumerated, whether only estimation of the number of trees under
specified diameter classes or, in addition, whether the volume of trees is
also proposed to be estimated. It must also be decided at the outset what
accuracy is desired for the estimates.
(ii) Construction of a frame of units: The first requirement of probability
sample of any nature is the establishment of a frame. A frame is a list of
sampling units which may be unambiguously defined and identified in the
population. The sampling units may be compartments, topographical
sections, strips of a fixed width or plots of a definite shape and size. The
sampling frame is collected from secondary sources like revenue
department or any related offices or books, journals or records etc.
(iii) Choice of a sampling design: If it is agreed that the sampling design
should be such that it should provide a statistically meaningful measure of
the precision of the final estimates, then the sample should be a
probability sample, in that every unit in the population should have a
known probability of being selected in the sample. The choice of units to
be enumerated from the frame of units should be based on some
objective rule which leaves nothing to the opinion of the field worker. The
determination of the number of units to be included in the sample and the
method of selection is also governed by the allowable cost of the survey
and the accuracy in the final estimates.
(iv) Organisation of the field work: The entire success of a sampling
survey depends on the reliability of the field work. Proper selection of the
personnel, intensive training, clear instructions and proper supervision of
the fieldwork are essential to obtain satisfactory results. The field parties
should correctly locate the selected units and record the necessary
measurements according to the specific instruction given. The supervising
staff should check a part of their work in the field and satisfy themselves
that the survey is carried out in its entirety as planned.
(v) Analysis of the data: Depending on the sampling design used and the
information collected, proper formulae should be used in obtaining the
estimates and the precision of the estimates should be computed. Double
check of the computations is desired to safeguard accuracy in the
analysis.
(vi) Preliminary survey (pilot trials): The design of a sampling scheme for
a survey requires both knowledge of the statistical theory and experience
with data regarding the nature of the study area, the pattern of variability
and operational cost. If prior knowledge in these matters is not available,
a statistically planned small scale ‘pilot survey’ may have to be conducted
before undertaking any large scale survey in that area. Such exploratory
or pilot surveys will provide adequate knowledge regarding the variability
of the material and will afford opportunities to test and improve field
procedures, train field workers and study the operational efficiency of a
design. A pilot survey will also provide data for estimating the various
components of cost of operations in a survey like time of travel, time of
location and enumeration of sampling units, etc. The above information
will be of great help in deciding the proper type of design and intensity of
sampling that will be appropriate for achieving the objects of the survey.
Sampling terminology
Population: The word population is defined as the aggregate of units from
which a sample is chosen, e.g. all the plots, trees, plants, insects,
blocks, villages or people of the study area.
Sampling units: Sampling units are all the well defined units of the
population from which a sample is to be collected.
Sampling frame: A list of sampling units of a population of units.
Sample: One or more sampling units selected from a population according
to some specified procedure constitute a sample.
Sampling intensity or sampling fraction: It is the ratio of the number of
units in the sample to the number of units in the population (n/N).
Population total: Suppose a finite population consists of units U1, U2, …,
UN. Let the value of the characteristic for the i-th unit be denoted by yi for
every unit Ui. The population total of the values, yi ( i = 1, 2, …, N) is:
$Y = y_1 + y_2 + \dots + y_N = \sum_{i=1}^{N} y_i$
Population mean: The arithmetic mean or average of the yi values is:
$\bar{Y} = Y / N$
Population variance: A measure of the variation between units of the
population is:
which measures the variation among the population units- large values
indicate large variation between units and small values indicate that the
values of the characteristic for the units are close to the population mean.
The square root of the variance is known as standard deviation.
Coefficient of variation: The ratio of the standard deviation to the mean,
expressed as a percentage, is:
$C.V. = \frac{S_y}{\bar{Y}} \times 100$
Being unitless, it is used to compare the variability of two or more
populations or sets of observations.
Parameter: A function of the values of the units in the population, e.g.
the population mean, variance, C.V., correlation etc. are population
parameters. The problem in sampling theory is to estimate the
parameters from a sample by a procedure that makes it possible to
measure the precision of the estimates.
Estimator and estimate: Let the sample observations be y1, y2, …, yn,
where n is the sample size. Any function of the sample observations is
called a statistic. When a statistic is used to estimate a population
parameter, it is called an estimator, e.g. the sample mean is an estimator
of the population mean. Any particular value of an estimator computed from
an observed sample is called an estimate.
Bias in estimation: A statistic t is said to be an unbiased estimator of a
population parameter θ if its expected value, E(t), is equal to θ. A
sampling procedure based on a probability scheme gives rise to a number
of possible samples by repetition of the sampling procedure. If the values
of the statistic t are computed for each of the possible samples and the
average of these values is equal to the population value θ, then t is said
to be an unbiased estimator of θ. If E(t) is not equal to θ, the
statistic t is said to be a biased estimator of θ and the bias is given by
bias = E(t) - θ.
Sampling variance: It is defined as the average magnitude over all
possible samples of the squares of deviations of the estimator from its
expected value and is given by $V(t) = E[t - E(t)]^2$.
The larger the sample and the smaller the variability between units in the
population, the smaller will be the sampling error and the greater will be
the confidence in the results.
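The ideas of expected value, bias and sampling variance can be made concrete by listing every possible sample from a very small population. The sketch below uses a hypothetical population of four values and all samples of size 2 drawn without replacement (both are assumptions made purely for illustration) and takes the sample mean as the estimator t.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 5, 8, 12]   # hypothetical values y_1 ... y_N (N = 4)
n = 2                        # sample size

# The estimator t (sample mean) evaluated on every possible sample of size n.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

expected_t = sum(sample_means) / len(sample_means)           # E(t)
population_mean = Fraction(sum(population), len(population))
bias = expected_t - population_mean                          # E(t) - parameter
sampling_variance = sum((m - expected_t) ** 2 for m in sample_means) / len(sample_means)

print("E(t) =", expected_t)
print("population mean =", population_mean)
print("bias =", bias)
print("V(t) =", sampling_variance)
```

Exact fractions are used so that the comparison of E(t) with the population mean is not affected by floating-point rounding.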
Standard error of an estimator: The square root of the sampling variance
of an estimator is known as the standard error of the estimator. The
standard error of an estimate divided by the value of the estimate is
called relative standard error which is usually expressed in percentage.
Accuracy and precision: Precision of an estimate is measured by the
inverse of the standard error or the sampling variance. Accuracy usually
refers to the size of the deviations of the sample estimate from the true
population value θ, whereas precision refers to the size of the deviations
from the mean m of the estimates obtained by repeated application of the
sampling procedure; the bias is thus measured by m - θ. It is the accuracy
of the sample estimate in which we are chiefly interested, and it is the
precision which we are able to measure in most instances. We strive to
design the survey and analyse the data using appropriate statistical
methods in such a way that the precision is increased to the maximum and
the bias is reduced to the minimum.
Confidence limits: If the estimator t is normally distributed (generally
valid for large samples), a confidence interval defined by a lower and
upper limit can be expected to include the population parameter θ with a
specified probability level. The limits are given by
Lower limit = $t - Z_\alpha \sqrt{\hat{V}(t)}$
Upper limit = $t + Z_\alpha \sqrt{\hat{V}(t)}$
where $\hat{V}(t)$ is the estimate of the variance of t and Zα is the value
of the normal deviate corresponding to a desired α% confidence probability.
When Zα is taken as 1.96, we say that the chance of the true value of θ
being contained in the random interval is 95 per cent.
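As a small illustration of the confidence limits, the sketch below computes the lower and upper limits for a given estimate and its estimated variance. The function name and the numbers (an estimate of 50.0 with an estimated variance of 4.0) are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def confidence_limits(estimate, variance, z=1.96):
    """Return (lower, upper) = estimate -/+ z * sqrt(estimated variance)."""
    half_width = z * sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical estimate and estimated variance; z = 1.96 gives 95% limits.
lower, upper = confidence_limits(estimate=50.0, variance=4.0)
print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```

Passing a different normal deviate through the `z` argument gives limits for other confidence probabilities.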
Some general remarks: Capital letters will be used to denote population
values and small letters to denote sample values. The symbol ‘cap’ (^)
above a symbol for a population value denotes its estimate based on
sample observations. While describing the different sampling methods,
the formulae for estimating only population mean and its sampling
variance are given. Two related parameters are population total and ratio
of the character under study (y) to some auxiliary variable (x). These
related statistics can always be obtained from the mean by using the
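following general relations.
$\hat{Y} = N \hat{\bar{Y}}$
$\hat{R} = \hat{Y} / X$
where
$\hat{Y}$ = Estimate of the population total
$\hat{\bar{Y}}$ = Estimate of the population mean
N = Total number of units in the population
$\hat{R}$ = Estimate of the population ratio
X = Population total of the auxiliary variable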
3.2. Simple random sampling (SRS)
A sampling procedure such that each possible combination of
sampling units out of the population has the same chance of being
selected is referred to as simple random sampling. From theoretical
considerations, simple random sampling is the simplest form of sampling
and is the basis for many other sampling methods. Simple random
sampling is most applicable for the initial survey in an investigation and
for studies which involve sampling from a small area where the sample
size is relatively small. The irregular distribution of the sampling units in
an area in simple random sampling may be of great disadvantage where
accessibility is poor and the costs of travel and locating the plots are
considerably higher than the cost of enumerating the plot.
Selection of sampling units from a Population
In practice, a random sample is selected unit by unit. Two methods of
random selection for simple random sampling without replacement (WOR)
are explained in this section.
i). Lottery method: The units in the population are numbered 1 to N and
then N identical paper chits with numberings 1 to N are obtained and one
chit is chosen at random after shuffling the chits. The process is
repeated n times without replacing the chits selected. The units which
correspond to the numbers on the chosen chits form a simple random
sample of size n from the population of N units. In this way, the
probability of selecting any chit is the same for all the N chits.
ii). Selection based on random number tables: The procedure of selection
using the lottery method obviously becomes rather inconvenient
when N is large. To overcome this difficulty, we may use a table of
random numbers such as those published by Fisher and Yates, a sample of
which is given in the Appendix. The tables of random numbers have been
developed in such a way that the digits 0 to 9 appear independently of each
other and approximately equal numbers of times in the table. The simplest
way of selecting a random sample of required size consists in selecting a
set of n random numbers one by one from 1 to N in the random number
table and then taking the units bearing those numbers. This procedure
may involve a number of rejections since all the numbers more
than N appearing in the table are not considered for selection. In such
cases, the procedure is modified as follows. If N is a d digited number, we
first determine the highest d digited multiple of N, say N’. Then a random
number r is chosen from 1 to N’ and the unit having the serial number
equal to the remainder obtained on dividing r by N is considered as
selected. If remainder is zero, the last unit is selected.
Problem-27: Select a simple random sample of n=5 units from a
population of size N=40.
Solution:
i). Serially number the population units from 1 to 40 (here N = 40 is a
2-digit number).
ii). Find the highest two-digit number divisible by 40, which is 80.
iii). Read two-digit numbers down the 5th column of the random number
table (Table-5 of the Appendix).
iv). The first value, 39, lies in 1-40 and is selected as the 1st member of
the sample.
v). The next values in the column, 92 and 90, are rejected as they are
greater than 80.
vi). 27 lies in 1-40 and is selected as the 2nd sample unit.
vii). 00 gives a remainder of zero, so the 40th unit is selected as the 3rd
sample unit.
viii). The next value is 74. Dividing it by 40, the remainder is 34, so the
34th unit is the 4th sample unit.
ix). Next comes 07, so the 7th unit is selected as the 5th sample unit.
So, the 5 sample units selected from the population of 40 units
are: 39, 27, 40, 34 and 7.
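The remainder rule used in Problem-27 can be sketched in Python as below. This is only an illustrative sketch: the function name `select_by_remainder` is hypothetical, and the random numbers are typed in by hand from the worked problem rather than read from a table. Following the problem, 00 is treated as a valid draw with remainder zero; as an assumption of this sketch, N' itself is then rejected so that every unit keeps the same number of chances.

```python
def select_by_remainder(random_numbers, N, n):
    """Select n distinct units from 1..N by the remainder method (WOR)."""
    d = len(str(N))                     # N is a d-digit number
    n_prime = (10 ** d - 1) // N * N    # highest d-digit multiple of N
    chosen = []
    for r in random_numbers:
        if r >= n_prime:                # assumption: usable range is 00..N'-1
            continue                    # reject numbers outside that range
        unit = r % N or N               # remainder zero means the last unit
        if unit not in chosen:          # without replacement: skip repeats
            chosen.append(unit)
        if len(chosen) == n:
            break
    return chosen


# Two-digit numbers as read in Problem-27 (00 is written as 0, 07 as 7).
column = [39, 92, 90, 27, 0, 74, 7]
sample = select_by_remainder(column, N=40, n=5)
print(sample)
```

With the numbers of Problem-27 this reproduces the units 39, 27, 40, 34 and 7. For the exercise with 112 cows, the same function would work on three-digit numbers, with N' = 896.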
Exercise: Select a random sample of 11 cows from a list of 112 milch
cows of a herd by using the random number table.
3.3. Parameter estimation in SRS
a). SRS WOR (without replacement)
Let y1, y2, …, yn be the measurements on a particular characteristic
on n selected units in a sample from a population of N sampling units. It
can be shown in the case of simple random sampling without replacement
that the sample mean,
$$\hat{\bar{Y}} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
is an unbiased estimator of the population mean,
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i .$$
An unbiased estimate of the sampling variance of $\hat{\bar{Y}}$ is given by,
$$\hat{V}(\hat{\bar{Y}}) = \frac{N-n}{Nn}\, s_y^2$$
where,
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 .$$
Assuming that the estimate $\hat{\bar{Y}}$ is normally distributed, a confidence
interval on the population mean can be set with the lower and upper
confidence limits defined by,
Lower limit $\hat{\bar{Y}}_L = \hat{\bar{Y}} - z\sqrt{\hat{V}(\hat{\bar{Y}})}$
and Upper limit $\hat{\bar{Y}}_U = \hat{\bar{Y}} + z\sqrt{\hat{V}(\hat{\bar{Y}})}$
where z is the table value, which depends on the number of observations
in the sample. If there are 30 or more observations, the value can be read
from the table of the normal distribution. If there are fewer than
30 observations, the table value should be read from the table
of the t distribution using n - 1 degrees of freedom.
b). SRS WR (with replacement)
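Let y1, y2, …, yn be the measurements on a particular characteristic
on n selected units in a sample from a population of N sampling units with
replacement. Then,
1. Estimate of population mean, $\hat{\bar{Y}} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
2. Estimate of variance of sample mean,
$$\hat{V}(\hat{\bar{Y}}) = \frac{N-1}{Nn}\, s_y^2$$
where $s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$
3. Estimate of population total, $\hat{Y} = N\,\bar{y}$
4. Estimate of C.I. of population mean:
Lower limit, $\hat{\bar{Y}}_L = \bar{y} - Z\, s_y \sqrt{\frac{N-1}{Nn}}$
Upper limit, $\hat{\bar{Y}}_U = \bar{y} + Z\, s_y \sqrt{\frac{N-1}{Nn}}$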
Problem-28: A forest has been divided up into 1000 plots of 0.1 hectare
each and a simple random sample of 25 plots has been selected. For each
of these sample plots the wood volumes in m3 were recorded as:
Sample Obs.   Wood Volume (m3)   Sample Obs.   Wood Volume (m3)
1             7                  14            11
2             8                  15            8
3             2                  16            4
4             6                  17            7
5             7                  18            7
6             10                 19            8
7             8                  20            7
8             6                  21            7
9             7                  22            5
10            3                  23            8
11            7                  24            8
12            8                  25            7
13            9
Estimate the population mean, 95% C.I. of mean, C.V. and total volume
of wood in the forest by SRSWOR and SRSWR. Compare the efficiency of
the two methods.
Solution:
a). SRSWOR
Let yi denote the wood volume of the i-th sampling unit (i = 1, 2, 3, …, 25).
Now, an unbiased estimator of the population mean is obtained using
formula as:
$$\hat{\bar{Y}} = \bar{y} = \frac{1}{25}\sum_{i=1}^{25} y_i = \frac{175}{25} = 7 \text{ m}^3$$
which is the mean wood volume per plot of 0.1 ha in the forest area.
An estimate ($s_y^2$) of the variance of individual values of y is obtained using
formula as:
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2 = \frac{92}{24} = 3.833$$
Then unbiased estimate of sampling variance of mean is
$$\hat{V}(\hat{\bar{Y}}) = \frac{1000-25}{1000 \times 25} \times 3.833 = 0.1495 \quad\text{and}\quad SE(\hat{\bar{Y}}) = \sqrt{0.1495} = 0.3867 \text{ m}^3$$
The relative standard error,
C.V. = (0.3867 / 7) x 100 = 5.52 %
The confidence limits on the population mean (t = 2.064 for 24 d.f.) are :
Lower limit = 7 - 2.064 x 0.3867 = 6.20
Upper limit = 7 + 2.064 x 0.3867 = 7.80
The 95% confidence interval for the population mean is (6.20, 7.80) m3.
Thus, we are 95% confident that the confidence interval (6.20, 7.80)
m3 would include the population mean.
An estimate of the total wood volume in the forest area sampled can
easily be obtained by multiplying the estimate of the mean by the total
number of plots in the population. Thus,
$$\hat{Y} = N\,\hat{\bar{Y}} = 1000 \times 7 = 7000 \text{ m}^3$$
with a confidence interval of (6200, 7800) obtained by
multiplying the confidence limits on the mean by N = 1000.
b). SRSWR
An unbiased estimator of the population mean is also obtained using
formula as:
$$\hat{\bar{Y}} = \bar{y} = \frac{175}{25} = 7 \text{ m}^3$$
which is the mean wood volume per plot of 0.1 ha in the forest area.
An estimate ($s_y^2$) of the variance of individual values of y is also obtained
using formula as:
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2 = \frac{92}{24} = 3.833$$
Now, the estimate of sampling variance of mean is
$$\hat{V}(\hat{\bar{Y}}) = \frac{1000-1}{1000 \times 25} \times 3.833 = 0.153167 \quad\text{and}$$
$$SE(\text{est. of pop. mean}) = s_y\sqrt{\frac{N-1}{Nn}} = 0.391365 \text{ m}^3$$
The relative standard error, C.V. = 0.3914 x 100/7 = 5.59%
The confidence limits on the population mean are :
Lower limit, $\hat{\bar{Y}}_L = \bar{y} - Z\, s_y\sqrt{\frac{N-1}{Nn}}$ = 7 - 2.064 x 0.3914 = 6.19
Upper limit, $\hat{\bar{Y}}_U = \bar{y} + Z\, s_y\sqrt{\frac{N-1}{Nn}}$ = 7 + 2.064 x 0.3914 = 7.81
The 95% confidence interval for the population mean is (6.19, 7.81) m3.
Thus, we are 95% confident that the confidence interval (6.19, 7.81)
m3 would include the population mean.
An estimate of the total wood volume in the forest area sampled can
easily be obtained by multiplying the estimate of the mean by the total
number of plots in the population. Thus,
$$\hat{Y} = N\,\bar{y} = 1000 \times 7 = 7000 \text{ m}^3$$
with a confidence interval of (6190, 7810) obtained
by multiplying the confidence limits on the mean by N = 1000.
The efficiency of SRSWOR w.r.t. SRSWR, taken as the ratio of the two
variance estimates, = (0.1495/0.1532) x 100 = 97.6%, i.e. the sampling
variance under SRSWOR is slightly smaller than under SRSWR.
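The calculations of Problem-28 can be checked with a short Python sketch. It uses the manual's own formulas, (N-n)/(Nn) times the sample variance for SRSWOR and (N-1)/(Nn) times the sample variance for SRSWR; the function name `srs_estimates` and the dictionary keys are hypothetical names chosen for illustration, and the t value 2.064 (24 d.f., 5%) is taken from Table-1(a).

```python
from math import sqrt
from statistics import mean, variance

# Wood volumes (m3) of the 25 sample plots, observations 1 to 25.
volumes = [7, 8, 2, 6, 7, 10, 8, 6, 7, 3, 7, 8, 9,
           11, 8, 4, 7, 7, 8, 7, 7, 5, 8, 8, 7]
N = 1000       # plots in the forest
T = 2.064      # t for n - 1 = 24 d.f. at 5% (two-tailed)


def srs_estimates(y, N, t, with_replacement=False):
    """Mean, variance of the mean, SE, C.V., C.I. and total under SRS."""
    n = len(y)
    ybar = mean(y)
    s2 = variance(y)                           # divisor n - 1
    factor = (N - 1) / (N * n) if with_replacement else (N - n) / (N * n)
    var_mean = factor * s2
    se = sqrt(var_mean)
    return {
        "mean": ybar,
        "s2": s2,
        "var_mean": var_mean,
        "se": se,
        "cv_percent": 100 * se / ybar,
        "ci": (ybar - t * se, ybar + t * se),
        "total": N * ybar,
    }


wor = srs_estimates(volumes, N, T)
wr = srs_estimates(volumes, N, T, with_replacement=True)
ratio = 100 * wor["var_mean"] / wr["var_mean"]   # V(SRSWOR) / V(SRSWR)

for name, res in (("SRSWOR", wor), ("SRSWR", wr)):
    print(name, round(res["var_mean"], 4), round(res["se"], 4),
          [round(x, 2) for x in res["ci"]])
print("variance ratio (%):", round(ratio, 1))
```

The ratio of the two variances equals (N-n)/(N-1), so it depends only on the population and sample sizes and not on the data.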
Exercise: In an agriculture survey the following data has been recorded
on holding size of land (in acres) as:
Sl. No.   Holding Size   Sl. No.   Holding Size   Sl. No.   Holding Size
1         21.04          13        8.29           25        22.13
2         12.59          14        7.27           26        1.68
3         20.30          15        1.47           27        49.58
4         16.16          16        1.12           28        1.68
5         23.82          17        10.67          29        4.80
6         1.79           18        5.94           30        12.72
7         26.91          19        3.15           31        6.31
8         7.41           20        4.84           32        14.18
9         7.68           21        9.07           33        22.19
10        66.55          22        3.69           34        2.50
11        141.80         23        14.61          35        25.29
12        28.12          24        1.10           36        20.99
Q.1. Draw a random sample of size, n=10 from these 36 observations.
Q.2. Find out the estimates of the population mean, variance, total, C.V.
and C.I. of the mean at 95% confidence by SRSWOR and SRSWR.
Q.3. Compare the relative precision of SRSWOR with SRSWR.
3.4. Stratified sampling
The basic idea in stratified random sampling is to divide a
heterogeneous population into sub-populations, usually known as strata,
each of which is internally homogeneous. A precise estimate of any
stratum mean can then be obtained from a small sample drawn from that
stratum, and by combining such estimates a precise estimate for the
whole population can be obtained. Stratified sampling provides a better
cross section of the population than the procedure of simple random
sampling. It may also simplify the organisation of the field work.
Geographical proximity is sometimes taken as the basis of stratification.
The assumption here is that geographically contiguous areas are often
more alike than areas that are far apart. Administrative convenience may
also dictate the basis on which the stratification is made. A fairly effective
method of stratification is to conduct a quick reconnaissance survey of the
area or pool the information already at hand and stratify the area
according to some characteristics like land, slope, breed, age, plant types,
stand density, site quality etc. If the characteristic under study is known
to be correlated with a supplementary variable for which actual data or at
least good estimates are available for the units in the population, the
stratification may be done using the information on the supplementary
variable. For instance, the rainfall estimates obtained at a previous
inventory of an area may be used for stratification of the population.
In stratified sampling, the variance of the estimator consists of only
the ‘within strata’ variation. Thus the larger the number of strata into
which a population is divided, the higher, in general, the precision, since it
is likely that, in this case, the units within a stratum will be more
homogeneous. For estimating the variance within strata, there should be
a minimum of 2 units in each stratum. The larger the number of strata
the higher will, in general, be the cost of enumeration. So, depending on
administrative convenience, cost of the survey and variability of the
characteristic under study in the area, a decision on the number of strata
will have to be arrived at.
Allocation and selection of the sample within strata
Let the population be divided into k strata of N1, N2, …, Nk units
respectively, and suppose that a sample of n units is to be drawn from the
population. The problem of allocation concerns the choice of the sample
sizes in the respective strata, i.e., how many units should be taken from
each stratum such that the total sample is n.
Other things being equal, a larger sample may be taken from a
stratum with a larger variance so that the variance of the estimates of
strata means gets reduced. The application of the above principle requires
advance estimates of the variation within each stratum. These may be
available from a previous survey or may be based on pilot surveys of a
restricted nature. Thus if this information is available, the sampling
fraction (ni/Ni) in each stratum may be taken proportional to the standard
deviation of each stratum.
In case the cost per unit of conducting the survey in each stratum is
known and is varying from stratum to stratum an efficient method of
allocation for minimum cost will be to take large samples from the
stratum where sampling is cheaper and variability is higher. To apply this
procedure one needs information on variability and cost of observation
per unit in the different strata.
Where information regarding the relative variances within strata and
cost of operations are not available, the allocation in the different strata
may be made in proportion to the number of units in them or the total
area of each stratum. This method is usually known as ‘proportional
allocation’.
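The two allocation rules described above can be sketched in Python as follows. The function names are hypothetical, and the stratum standard deviations passed to the second function are made-up illustrative figures, not data from this manual. The stratum sizes 29, 16 and 24 with n = 23 are those of Problem-29.

```python
def proportional_allocation(stratum_sizes, n):
    """n_t proportional to N_t, rounded to whole units.

    Rounding can leave the total one unit away from n for some inputs,
    in which case one stratum has to be adjusted by hand.
    """
    N = sum(stratum_sizes)
    return [round(n * Nt / N) for Nt in stratum_sizes]


def sd_allocation(stratum_sizes, stratum_sds, n):
    """Sampling fraction n_t / N_t proportional to the stratum S.D.

    This is the same as taking n_t proportional to N_t * S_t.
    Returns unrounded sizes so the effect of the weights stays visible.
    """
    weights = [Nt * St for Nt, St in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


sizes = [29, 16, 24]
prop = proportional_allocation(sizes, 23)
print(prop)

hypothetical_sds = [1.0, 0.6, 1.4]      # illustrative values only
by_sd = sd_allocation(sizes, hypothetical_sds, 23)
print([round(x, 2) for x in by_sd])
```

For these stratum sizes, proportional allocation gives the sample sizes 10, 5 and 8 used in Problem-29.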
For the selection of units within strata, in general, any method
which is based on a probability selection of units can be adopted. But the
selection should be independent in each stratum. If independent random
samples are taken from each stratum, the sampling procedure will be
known as ‘stratified random sampling’. Other modes of selection of
sampling such as systematic sampling can also be adopted within the
different strata.
Estimation of mean and variance
We shall assume that the population of N units is first divided
into k strata of N1, N2, …, Nk units respectively. These strata are
non-overlapping and together they comprise the whole population, so that
N1 + N2 + ….. + Nk = N
When the strata have been determined, a sample is drawn from
each stratum, the selection being made independently in each stratum.
The sample sizes within the strata are denoted by n1, n2, …,
nk respectively, so that
n1 + n2 + ….. + nk = n
Let ytj (j = 1, 2, …, Nt ; t = 1, 2, …, k) be the value of the characteristic
under study for the j-th unit in the t-th stratum. Then,
i). the population mean in the t-th stratum is given by
$$\bar{Y}_t = \frac{1}{N_t}\sum_{j=1}^{N_t} y_{tj}$$
The overall population mean is given by
$$\bar{Y} = \frac{1}{N}\sum_{t=1}^{k} N_t\,\bar{Y}_t$$
The estimate of the population mean, $\hat{\bar{Y}}$, in this case will be obtained by
$$\hat{\bar{Y}} = \frac{1}{N}\sum_{t=1}^{k} N_t\,\bar{y}_t$$
Where,
$$\bar{y}_t = \frac{1}{n_t}\sum_{j=1}^{n_t} y_{tj}$$
ii). Estimate of the variance of $\hat{\bar{Y}}$ is given by
$$\hat{V}(\hat{\bar{Y}}) = \frac{1}{N^2}\sum_{t=1}^{k} N_t^2\,\frac{N_t-n_t}{N_t\,n_t}\,s_t^2$$
Where,
$$s_t^2 = \frac{1}{n_t-1}\sum_{j=1}^{n_t}(y_{tj}-\bar{y}_t)^2$$
Stratification, if properly done as explained in the previous sections,
will usually give lower variance for the estimated population total or mean
than a simple random sample of the same size. However, a stratified
sample taken without due care and planning may not be better than a
simple random sample.
Problem-29: A forest area consisting of 69 compartments was divided
into three strata containing compartments 1-29, compartments 30-45,
and compartments 46-69 and sample size of 10, 5 and 8 compartments
respectively were chosen at random from the three strata. The serial
numbers of the selected compartments in each stratum are given in
column (4) of the following Table. The corresponding observed volume of
the particular species in each selected compartment in m3/ha is shown in
column (5).
Table-20. Estimation of parameters under stratified sampling
Stratum   Total number   Number of   Selected        Volume      ytj^2
number    of units in    units       sampling unit   (m3/ha)
          the stratum    sampled     number          (ytj)
          (Nt)           (nt)
(1)       (2)            (3)         (4)             (5)         (6)
I                                    1               5.40        29.16
                                     18              4.87        23.72
                                     28              4.61        21.25
                                     12              3.26        10.63
                                     20              4.96        24.60
                                     19              4.73        22.37
                                     9               4.39        19.27
                                     6               2.34        5.48
                                     17              4.74        22.47
                                     7               2.85        8.12
Total     29             10          ..              42.15       187.07
II                                   43              4.79        22.94
                                     42              4.57        20.88
                                     36              4.89        23.91
                                     45              4.42        19.54
                                     39              3.44        11.83
Total     16             5           ..              22.11       99.10
III                                  59              7.41        54.91
                                     50              3.70        13.69
                                     49              5.45        29.70
                                     58              7.01        49.14
                                     54              3.83        14.67
                                     69              5.25        27.56
                                     52              4.50        20.25
                                     47              6.51        42.38
Total     24             8           ..              43.66       252.30
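Step-1. Compute the following quantities.
N = (29 + 16 + 24) = 69
n = (10 + 5 + 8) = 23
Stratum means: $\bar{y}_I$ = 42.15/10 = 4.215, $\bar{y}_{II}$ = 22.11/5 = 4.422, $\bar{y}_{III}$ = 43.66/8 = 5.458
Step-2. Estimation of the population mean
$$\hat{\bar{Y}} = \frac{(29)(4.215) + (16)(4.422) + (24)(5.458)}{69} \approx 4.70 \text{ m}^3/\text{ha}$$
Step-3. Estimation of the variance of $\hat{\bar{Y}}$
The within-stratum variances, from the totals in columns (5) and (6), are
$$s_I^2 = \frac{187.07 - (42.15)^2/10}{9} \approx 1.045,\quad s_{II}^2 = \frac{99.10 - (22.11)^2/5}{4} \approx 0.332,\quad s_{III}^2 = \frac{252.30 - (43.66)^2/8}{7} \approx 2.004$$
$$\hat{V}(\hat{\bar{Y}}) = \frac{1}{69^2}\left[29^2\,\frac{29-10}{29 \times 10}(1.045) + 16^2\,\frac{16-5}{16 \times 5}(0.332) + 24^2\,\frac{24-8}{24 \times 8}(2.004)\right] \approx 0.035$$
and
$$SE(\hat{\bar{Y}}) = \sqrt{\hat{V}(\hat{\bar{Y}})} \approx 0.186, \qquad C.V. = \frac{0.186}{4.70} \times 100 \approx 4.0\%$$
Step-4. If we ignore the strata and assume that the same sample of
size n = 23 formed a simple random sample (WOR) from the population
of N = 69, the estimate of the population mean would reduce to
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{107.92}{23} \approx 4.69$$
Estimate of the variance of the mean $\bar{y}$ is,
$$\hat{V}(\bar{y}) = \frac{N-n}{Nn}\,s_y^2$$
Where,
$$s_y^2 = \frac{538.47 - (107.92)^2/23}{22} \approx 1.459$$
so that
$$\hat{V}(\bar{y}) = \frac{69-23}{69 \times 23} \times 1.459 \approx 0.042, \qquad SE(\bar{y}) \approx 0.206, \qquad C.V. = \frac{0.206}{4.69} \times 100 \approx 4.4\%$$
The gain in precision due to stratification with SRSWOR is computed by
$$\frac{\hat{V}(\bar{y})_{SRS}}{\hat{V}(\hat{\bar{Y}})_{st}} \times 100 = 121.8$$
Thus the gain in precision is 21.8%.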
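The estimates of Problem-29 can be recomputed with the Python sketch below, which applies the stratified mean and variance formulas of this section to the volumes of Table-20. The variable names are hypothetical. Because the sketch works from the unrounded volumes, its gain in precision may differ slightly in the last digit from the printed figure of 121.8.

```python
from statistics import mean, variance

# Stratum label -> (N_t, sampled volumes in m3/ha), from Table-20.
strata = {
    "I":   (29, [5.40, 4.87, 4.61, 3.26, 4.96, 4.73, 4.39, 2.34, 4.74, 2.85]),
    "II":  (16, [4.79, 4.57, 4.89, 4.42, 3.44]),
    "III": (24, [7.41, 3.70, 5.45, 7.01, 3.83, 5.25, 4.50, 6.51]),
}

N = sum(Nt for Nt, _ in strata.values())

# Stratified estimate of the mean: weighted average of stratum means.
mean_st = sum(Nt * mean(y) for Nt, y in strata.values()) / N

# Variance: (1/N^2) * sum of N_t^2 * (N_t - n_t)/(N_t n_t) * s_t^2.
var_st = sum(
    Nt ** 2 * (Nt - len(y)) / (Nt * len(y)) * variance(y)
    for Nt, y in strata.values()
) / N ** 2

# Same 23 units treated as one simple random sample (WOR).
pooled = [v for _, y in strata.values() for v in y]
n = len(pooled)
mean_srs = mean(pooled)
var_srs = (N - n) / (N * n) * variance(pooled)

gain = 100 * var_srs / var_st

print("stratified mean:", round(mean_st, 3), "variance:", round(var_st, 4))
print("SRS mean:", round(mean_srs, 3), "variance:", round(var_srs, 4))
print("relative precision (%):", round(gain, 1))
```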
Exercise: 2000 wheat cultivators’ holdings in a GP were stratified
according to their sizes and the results due to stratification is given below.
Stratum No.   No. of holdings   Mean area per     S.D. of area per
              (Ni)              holding (Ȳt)      holding (St)
1             394               5.4               8.3
2             461               16.3              13.3
3             381               24.3              15.1
4             334               34.5              19.8
5             169               42.1              24.5
6             113               50.1              26.0
7             148               63.8              35.2
Estimate:
1. Mean of wheat area of the GP
2. Variance of mean of Wheat area of GP
3. C.V. of area of GP
4. Mean area, variance of mean, and C.V. of GP if considered as SRSWOR
5. Gain in precision of stratification with SRSWOR
3.5. Systematic sampling
Systematic sampling employs a simple rule of selecting every k-th
unit starting with a number chosen at random from 1 to k (k=N/n) as the
random start. Let there be N sampling units in the population, numbered 1
to N. A systematic sample of n units is then selected by taking the unit at
the random start and every k-th unit after it, where k is called the
sampling interval. This type of sampling is often convenient in exercising control
over field work. Apart from these operational considerations, the
procedure of systematic sampling is observed to provide estimators more
efficient than simple random sampling under normal conditions. The
property of the systematic sample in spreading the sampling units evenly
over the population can be taken advantage of by listing the units so that
homogeneous units are put together or such that the values of the
characteristic for the units are in ascending or descending order of
magnitude i.e. in some order. For example, knowing the fertility trend of
the forest area the units (for example strips) may be listed along the
fertility trend.
Selection of a systematic sample
Consider a population of N=48 units. A sample of n=4 units is
needed. Here, k =(48/4)=12. If the random number selected from the set
of numbers from 1 to 12 is 11, then the units associated with serial
numbers 11, 23, 35 and 47 will be selected. This technique will generate
k systematic samples with equal probability.
In situations where N is not exactly divisible by n, k is calculated as
the integer nearest to N/n. In this situation, the sample size is not
necessarily n; for some random starts it may be n - 1, so the possible
samples are of unequal size.
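A minimal Python sketch of the selection rule is given below. The function name `systematic_sample` is hypothetical, and the random start is passed in explicitly so that the examples are reproducible; in practice it would be drawn at random from 1 to k.

```python
import random


def systematic_sample(N, k, start):
    """Serial numbers start, start + k, start + 2k, ... not exceeding N."""
    if not 1 <= start <= k:
        raise ValueError("random start must lie between 1 and k")
    return list(range(start, N + 1, k))


# Example from the text: N = 48, n = 4, so k = 12; random start 11.
example = systematic_sample(48, 12, 11)
print(example)

# N = 195 is not divisible by n = 10; k is taken as 20.
full = systematic_sample(195, 20, 8)     # start 8 gives 10 units
short = systematic_sample(195, 20, 16)   # start 16 gives only 9 units
print(len(full), len(short))

# Drawing the start at random, as would be done in the field.
random_start = random.randint(1, 12)
print(systematic_sample(48, 12, random_start))
```

The second example shows how the sample size can drop to n - 1 for some random starts when N is not a multiple of k.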
Parameter estimation
The estimate for the population mean per unit is given by the sample
mean
$$\hat{\bar{Y}} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
where n is the number of units in the sample.
One-dimensional Systematic sampling
In the case of systematic strip surveys or, in general, any one
dimensional systematic sampling, an approximation to the standard
error may be obtained from the differences between pairs of successive
units. If there are n units enumerated in the systematic sample, there will
be (n-1) differences. The variance per unit is therefore given by the sum
of squares of the differences divided by twice the number of differences.
Thus if y1, y2, …, yn are the observed values (say volume) for the n units in
the systematic sample and defining the first difference d(yi) as given
below,
$$d(y_i) = y_{i+1} - y_i \;;\quad (i = 1, 2, \ldots, n-1)$$
the approximate variance per unit is estimated as
$$s^2 = \frac{1}{2(n-1)}\sum_{i=1}^{n-1}\left[d(y_i)\right]^2$$
and the corresponding variance of the sample mean as
$$\hat{V}(\bar{y}) = \frac{s^2}{n} = \frac{1}{2n(n-1)}\sum_{i=1}^{n-1}\left[d(y_i)\right]^2$$
Problem-30: The following Table gives the observed diameters of 10
trees selected by systematic selection of 1 in 20 trees from a stand
containing 195 trees in rows of 15 trees. The first tree was selected as the
8th tree from one of the outside edges of the stand starting from one
corner and the remaining trees were selected systematically by taking
every 20th tree switching to the nearest tree of the next row after the last
tree in any row is encountered.
Table-21. Tree diameter on a systematic sample of 10 trees from a plot
Tree No.   DBH (cm), yi   First difference, d(yi)
8          14.8
28         12.0           -2.8
48         13.6           +1.6
68         14.2           +0.6
88         11.8           -2.4
108        14.1           +2.3
128        11.6           -2.5
148        9.0            -2.6
168        10.1           +1.1
188        9.5            -0.6
Solution:
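Average diameter, $\bar{y}$ = 120.7/10 = 12.07 cm
The nine first differences can be obtained as shown in column (3) of the
Table. The error variance of the mean per unit is thus,
$$\hat{V}(\bar{y}) = \frac{(-2.8)^2 + (1.6)^2 + \ldots + (-0.6)^2}{2 \times 10 \times 9} = \frac{36.39}{180} = 0.202167$$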
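The successive-difference calculation of Problem-30 can be sketched in Python as follows; the variable names are hypothetical and the diameters are those of the table above.

```python
# DBH (cm) of the 10 trees in the order they were selected.
dbh = [14.8, 12.0, 13.6, 14.2, 11.8, 14.1, 11.6, 9.0, 10.1, 9.5]
n = len(dbh)

ybar = sum(dbh) / n

# First differences d(y_i) = y_(i+1) - y_i; there are n - 1 of them.
diffs = [later - earlier for earlier, later in zip(dbh, dbh[1:])]
sum_sq = sum(d * d for d in diffs)

var_per_unit = sum_sq / (2 * (n - 1))   # approximate variance per unit
var_of_mean = var_per_unit / n          # approximate variance of the mean

print(round(ybar, 2), round(sum_sq, 2), round(var_of_mean, 6))
```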
k-Independent Systematic sampling of equal sample size
A difficulty with systematic sampling is that one systematic sample
by itself will not furnish valid assessment of the precision of the
estimates. With a view to have valid estimates of the precision, one may
resort to partially systematic samples. A theoretically valid method of
using the idea of systematic samples and at the same time leading to
unbiased estimates of the sampling error is to draw a minimum of two
systematic samples with independent random starts. If $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_m$
are m estimates of the population mean based on m independent
systematic samples, the combined estimate for population mean is:
$$\bar{y} = \frac{1}{m}\sum_{i=1}^{m} \bar{y}_i$$
The estimate of the variance of $\bar{y}$ is given by
$$\hat{V}(\bar{y}) = \frac{1}{m(m-1)}\sum_{i=1}^{m}(\bar{y}_i - \bar{y})^2$$
Notice that the precision increases with the number of independent
systematic samples.
Problem-31: The data given in the following Table have one systematic
sample along with another systematic sample selected with independent
random starts. In the second sample, the first tree was selected as the
10th tree.
Table-22. Tree diameter on two independent systematic samples of 10
trees from a plot.
Sample-1                    Sample-2
Tree No.   DBH (cm), yi     Tree No.   DBH (cm), yi
8          14.8             10         13.6
28         12.0             30         10.0
48         13.6             50         14.8
68         14.2             70         14.2
88         11.8             90         13.8
108        14.1             110        14.5
128        11.6             130        12.0
148        9.0              150        10.0
168        10.1             170        10.5
188        9.5              190        8.5
Solution:
Here, n=10, k=20 and N=200
The average diameter for the first sample is $\bar{y}_1$ = 120.7/10 = 12.07 cm and
for the second sample is $\bar{y}_2$ = 121.9/10 = 12.19 cm. Combined estimate of
population mean ($\bar{y}$) is obtained as:
$$\bar{y} = \frac{12.07 + 12.19}{2} = 12.13 \text{ cm}$$
The estimate of the variance of mean is obtained as:
$$\hat{V}(\bar{y}) = \frac{1}{2(2-1)}\left[(12.07-12.13)^2 + (12.19-12.13)^2\right] = 0.0036$$
$$SE(\bar{y}) = \sqrt{0.0036} = 0.06 \text{ cm}$$
and C.V. = 0.06 x 100/12.13 = 0.49%
Total = 200 x 12.13 = 2426 cm
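The combination of independent systematic samples in Problem-31 can be sketched in Python as below. The variable names are hypothetical; the code works for any number m of samples of at least two.

```python
from math import sqrt
from statistics import mean

# DBH (cm) for the two independent systematic samples of Table-22.
sample_1 = [14.8, 12.0, 13.6, 14.2, 11.8, 14.1, 11.6, 9.0, 10.1, 9.5]
sample_2 = [13.6, 10.0, 14.8, 14.2, 13.8, 14.5, 12.0, 10.0, 10.5, 8.5]

sample_means = [mean(s) for s in (sample_1, sample_2)]
m = len(sample_means)

combined = mean(sample_means)    # combined estimate of the population mean
var_combined = sum((yb - combined) ** 2 for yb in sample_means) / (m * (m - 1))
se = sqrt(var_combined)
cv_percent = 100 * se / combined

print([round(x, 2) for x in sample_means])
print(round(combined, 2), round(var_combined, 4), round(se, 2),
      round(cv_percent, 2))
```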
Exercise: Given below are data for 10 systematic samples of size 4 from a
population of 40 units.
Systematic sample numbers
1     2     3     4     5     6     7     8     9     10
0     1     2     1     4     5     6     7     7     9
7     8     9     10    12    13    15    6     16    17
18    18    19    20    21    20    24    13    28    29
29    30    31    31    30    32    35    37    38    63
Work out the estimate of population mean, total, variance, C.V. and
relative precision of systematic sample with SRSWOR.
*****************XXX******************
APPENDIX
STATISTICAL TABLES (t, F, χ2, r, Z, random number)
Table-1(a): Critical values for t-distribution
        Probability             Probability             Probability
DF      0.01      0.05      DF      0.01     0.05      DF      0.01     0.05
1       63.657    12.706    41      2.701    2.020     81      2.637    1.990
2       9.925     4.303     42      2.698    2.018     82      2.637    1.989
3       5.841     3.182     43      2.695    2.017     83      2.636    1.989
4       4.604     2.776     44      2.692    2.016     84      2.635    1.989
5       4.032     2.571     45      2.689    2.014     85      2.634    1.988
6       3.707     2.447     46      2.687    2.013     86      2.634    1.987
7       3.499     2.365     47      2.684    2.012     87      2.633    1.987
8       3.355     2.306     48      2.682    2.011     88      2.632    1.987
9       3.250     2.262     49      2.679    2.010     89      2.632    1.987
10      3.169     2.228     50      2.677    2.008     90      2.631    1.987
11      3.106     2.201     51      2.675    2.007     91      2.630    1.986
12      3.055     2.179     52      2.673    2.006     92      2.630    1.986
13      3.012     2.160     53      2.671    2.005     93      2.629    1.986
14      2.977     2.145     54      2.670    2.004     94      2.629    1.986
15      2.947     2.131     55      2.668    2.004     95      2.628    1.986
16      2.921     2.120     56      2.666    2.003     96      2.628    1.985
17      2.898     2.110     57      2.664    2.002     97      2.627    1.985
18      2.878     2.101     58      2.663    2.002     98      2.626    1.984
19      2.861     2.093     59      2.661    2.001     99      2.626    1.984
20      2.845     2.086     60      2.660    2.000     100     2.625    1.984
21      2.831     2.080     61      2.658    1.999     105     2.623    1.983
22      2.819     2.074     62      2.657    1.998     110     2.621    1.982
23      2.807     2.069     63      2.656    1.998     115     2.619    1.981
24      2.797     2.064     64      2.654    1.997     120     2.617    1.980
25      2.787     2.060     65      2.653    1.996     125     2.616    1.979
26      2.779     2.056     66      2.652    1.996     130     2.614    1.978
27      2.771     2.052     67      2.651    1.995     135     2.613    1.978
28      2.763     2.048     68      2.650    1.995     140     2.611    1.977
29      2.756     2.045     69      2.649    1.994     145     2.610    1.976
30      2.750     2.042     70      2.647    1.994     150     2.609    1.976
31      2.744     2.040     71      2.646    1.993     160     2.607    1.975
32      2.738     2.037     72      2.645    1.993     170     2.605    1.974
33      2.733     2.035     73      2.644    1.993     180     2.603    1.973
34      2.728     2.033     74      2.643    1.993     190     2.602    1.973
35      2.723     2.030     75      2.643    1.992     200     2.601    1.972
36      2.719     2.028     76      2.642    1.992     250     2.596    1.969
37      2.715     2.026     77      2.641    1.991     300     2.592    1.968
38      2.711     2.024     78      2.640    1.991     350     2.590    1.967
39      2.707     2.022     79      2.639    1.991     400     2.588    1.966
40      2.704     2.021     80      2.638    1.990     ∞       2.576    1.960
Table-1(b): Critical values for t-distribution (One & Two-tailed)
Degree of        Percentage (P)
freedom (v)      One-tailed          Two-tailed
                 5%       1%         5%       1%
1                6.31     31.8       12.7     63.7
2                2.92     6.96       4.30     9.92
3                2.35     4.54       3.18     5.84
4                2.13     3.75       2.78     4.60
5                2.02     3.36       2.57     4.03
6                1.94     3.14       2.45     3.71
7                1.89     3.00       2.36     3.50
8                1.86     2.90       2.31     3.36
9                1.83     2.82       2.26     3.25
10               1.81     2.76       2.23     3.17
11               1.80     2.72       2.20     3.11
12               1.78     2.68       2.18     3.05
13               1.77     2.65       2.16     3.01
14               1.76     2.62       2.14     2.98
15               1.75     2.60       2.13     2.95
16               1.75     2.58       2.12     2.92
17               1.74     2.57       2.11     2.90
18               1.73     2.55       2.10     2.88
19               1.73     2.54       2.09     2.86
20               1.72     2.53       2.09     2.85
22               1.72     2.51       2.07     2.82
24               1.71     2.49       2.06     2.80
26               1.71     2.48       2.06     2.78
28               1.70     2.47       2.05     2.76
30               1.70     2.46       2.04     2.75
35               1.69     2.44       2.03     2.72
40               1.68     2.42       2.02     2.70
45               1.68     2.41       2.01     2.69
50               1.68     2.40       2.01     2.68
55               1.67     2.40       2.00     2.67
60               1.67     2.39       2.00     2.66
∞                1.64     2.33       1.96     2.58
Table-2: Critical values for F-distribution
Smaller MS            Degrees of freedom for greater mean square (n1)
(n2)    P     1         2         3         4         5         6         7         8         9         10
1       5%    161.00    200.00    216.00    225.00    230.00    234.00    237.00    239.00    241.00    242.00
        1%    4052.00   4999.00   5403.00   5625.00   5764.00   5859.00   5928.00   5981.00   6022.00   6056.00
2       5%    18.51     19.00     19.16     19.25     19.30     19.33     19.36     19.37     19.38     19.39
        1%    98.49     99.00     99.17     99.25     99.30     99.33     99.36     99.37     99.39     99.40
3       5%    10.13     9.55      9.28      9.12      9.01      8.94      8.88      8.84      8.81      8.78
        1%    34.12     30.82     29.46     28.71     28.24     27.91     27.67     27.49     27.34     27.23
4       5%    7.71      6.94      6.59      6.39      6.26      6.16      6.09      6.04      6.00      5.96
        1%    21.20     18.00     16.69     15.98     15.52     15.21     14.98     14.80     14.66     14.54
5       5%    6.61      5.79      5.41      5.19      5.05      4.95      4.88      4.82      4.78      4.74
        1%    16.26     13.27     12.06     11.39     10.97     10.67     10.45     10.29     10.15     10.05
6       5%    5.99      5.14      4.76      4.53      4.39      4.28      4.21      4.15      4.10      4.06
        1%    13.74     10.92     9.78      9.15      8.75      8.47      8.26      8.10      7.98      7.87
7       5%    5.59      4.74      4.35      4.12      3.97      3.87      3.79      3.73      3.68      3.63
        1%    12.25     9.55      8.45      7.85      7.46      7.19      7.00      6.84      6.71      6.62
8       5%    5.32      4.46      4.07      3.84      3.69      3.58      3.50      3.44      3.39      3.34
        1%    11.26     8.65      7.59      7.01      6.63      6.37      6.19      6.03      5.91      5.82
9       5%    5.12      4.26      3.86      3.63      3.48      3.37      3.29      3.23      3.18      3.13
        1%    10.56     8.02      6.99      6.42      6.06      5.80      5.62      5.47      5.35      5.26
10      5%    4.96      4.10      3.71      3.48      3.33      3.22      3.14      3.07      3.02      2.97
        1%    10.04     7.56      6.55      5.99      5.64      5.39      5.21      5.06      4.95      4.85
11      5%    4.84      3.98      3.59      3.36      3.20      3.09      3.01      2.95      2.90      2.86
        1%    9.65      7.20      6.22      5.67      5.32      5.07      4.88      4.74      4.63      4.54
12      5%    4.75      3.88      3.49      3.26      3.11      3.00      2.92      2.85      2.80      2.76
        1%    9.33      6.93      5.95      5.41      5.06      4.82      4.65      4.50      4.39      4.30
13      5%    4.67      3.80      3.41      3.18      3.02      2.92      2.84      2.77      2.72      2.67
        1%    9.07      6.70      5.74      5.20      4.86      4.62      4.44      4.30      4.19      4.10
14      5%    4.60      3.74      3.34      3.11      2.96      2.85      2.77      2.70      2.65      2.60
        1%    8.86      6.51      5.56      5.03      4.69      4.46      4.28      4.14      4.03      3.94
15      5%    4.54      3.68      3.29      3.06      2.90      2.79      2.70      2.64      2.59      2.55
        1%    8.68      6.36      5.42      4.89      4.56      4.32      4.14      4.00      3.89      3.80
16      5%    4.49      3.63      3.24      3.01      2.85      2.74      2.66      2.59      2.54      2.49
        1%    8.53      6.23      5.29      4.77      4.44      4.20      4.03      3.89      3.78      3.69
17      5%    4.45      3.59      3.20      2.96      2.81      2.70      2.62      2.55      2.50      2.45
        1%    8.40      6.11      5.18      4.67      4.34      4.10      3.93      3.79      3.68      3.59
Table-2 (Continued…)
Critical values for F-distribution
Smaller MS            Degrees of freedom for greater mean square (n1)
(n2)    P     1       2       3       4       5       6       7       8       9       10
18      5%    4.41    3.55    3.16    2.93    2.77    2.66    2.58    2.51    2.46    2.41
        1%    8.28    6.01    5.09    4.58    4.25    4.01    3.85    3.71    3.60    3.51
19      5%    4.38    3.52    3.13    2.90    2.74    2.63    2.55    2.48    2.43    2.38
        1%    8.18    5.93    5.01    4.50    4.17    3.94    3.77    3.63    3.52    3.43
20      5%    4.35    3.49    3.10    2.87    2.71    2.60    2.52    2.45    2.40    2.35
        1%    8.10    5.85    4.94    4.43    4.10    3.87    3.71    3.56    3.45    3.37
21      5%    4.32    3.47    3.07    2.84    2.68    2.57    2.49    2.42    2.37    2.32
        1%    8.02    5.78    4.87    4.37    4.04    3.81    3.65    3.51    3.40    3.31
22      5%    4.30    3.44    3.05    2.82    2.66    2.55    2.47    2.40    2.35    2.30
        1%    7.94    5.72    4.82    4.31    3.99    3.76    3.59    3.45    3.35    3.26
23      5%    4.28    3.42    3.03    2.80    2.64    2.53    2.45    2.38    2.32    2.28
        1%    7.88    5.66    4.76    4.26    3.94    3.71    3.54    3.41    3.30    3.21
24      5%    4.26    3.40    3.01    2.78    2.62    2.51    2.43    2.36    2.30    2.26
        1%    7.82    5.61    4.72    4.22    3.90    3.67    3.50    3.36    3.25    3.17
25      5%    4.24    3.38    2.99    2.76    2.60    2.49    2.41    2.34    2.28    2.24
        1%    7.77    5.57    4.68    4.18    3.86    3.63    3.46    3.32    3.21    3.13
26      5%    4.22    3.37    2.98    2.74    2.59    2.47    2.39    2.32    2.28    2.22
        1%    7.72    5.53    4.64    4.14    3.82    3.59    3.42    3.29    3.17    3.09
27      5%    4.21    3.35    2.96    2.73    2.57    2.46    2.37    2.30    2.25    2.20
        1%    7.68    5.49    4.60    4.11    3.79    3.56    3.39    3.26    3.14    3.06
28      5%    4.20    3.34    2.95    2.71    2.56    2.44    2.36    2.29    2.24    2.19
        1%    7.64    5.45    4.57    4.07    3.76    3.53    3.36    3.23    3.11    3.03
29      5%    4.18    3.33    2.93    2.70    2.54    2.43    2.35    2.28    2.22    2.18
        1%    7.60    5.42    4.54    4.04    3.73    3.50    3.33    3.20    3.08    3.00
30      5%    4.17    3.32    2.92    2.69    2.53    2.42    2.34    2.27    2.21    2.16
        1%    7.56    5.39    4.51    4.02    3.70    3.47    3.30    3.17    3.06    2.98
31      5%    4.16    3.31    2.91    2.68    2.52    2.41    2.33    2.26    2.20    2.15
        1%    7.53    5.37    4.49    4.00    3.68    3.45    3.28    3.15    3.04    2.96
32      5%    4.15    3.30    2.90    2.67    2.51    2.40    2.32    2.25    2.19    2.14
        1%    7.50    5.34    4.46    3.97    3.66    3.42    3.25    3.12    3.01    2.94
33      5%    4.14    3.29    2.89    2.66    2.50    2.39    2.31    2.24    2.18    2.13
        1%    7.47    5.32    4.44    3.95    3.64    3.40    3.23    3.10    2.99    2.92
34      5%    4.13    3.28    2.88    2.65    2.49    2.38    2.30    2.23    2.17    2.12
        1%    7.44    5.29    4.42    3.93    3.61    3.38    3.21    3.08    2.97    2.89
Table-2 (Continued…)
Critical values for F-distribution
Smaller MS            Degrees of freedom for greater mean square (n1)
(n2)    P     1       2       3       4       5       6       7       8       9       10
35      5%    4.12    3.27    2.87    2.64    2.49    2.37    2.29    2.22    2.16    2.11
        1%    7.42    5.27    4.40    3.91    3.60    3.37    3.20    3.06    2.96    2.88
36      5%    4.11    3.26    2.86    2.63    2.48    2.36    2.28    2.21    2.15    2.10
        1%    7.39    5.25    4.38    3.89    3.58    3.35    3.18    3.04    2.94    2.86
37      5%    4.11    3.26    2.86    2.63    2.47    2.36    2.27    2.20    2.15    2.10
        1%    7.37    5.23    4.36    3.88    3.56    3.34    3.17    3.03    2.93    2.84
38      5%    4.10    3.25    2.85    2.62    2.46    2.35    2.26    2.19    2.14    2.09
        1%    7.35    5.21    4.34    3.86    3.54    3.32    3.15    3.02    2.91    2.82
39      5%    4.09    3.24    2.85    2.62    2.46    2.35    2.26    2.19    2.13    2.08
        1%    7.33    5.20    4.33    3.85    3.53    3.31    3.14    3.01    2.90    2.81
40      5%    4.08    3.23    2.84    2.61    2.45    2.34    2.25    2.18    2.12    2.07
        1%    7.31    5.18    4.31    3.83    3.51    3.29    3.12    2.99    2.88    2.80
41      5%    4.08    3.23    2.84    2.61    2.45    2.33    2.25    2.18    2.12    2.07
        1%    7.29    5.17    4.30    3.82    3.50    3.28    3.11    2.98    2.87    2.79
42      5%    4.07    3.22    2.83    2.60    2.44    2.32    2.24    2.17    2.11    2.06
        1%    7.27    5.15    4.29    3.80    3.49    3.26    3.10    2.96    2.86    2.77
43      5%    4.07    3.22    2.83    2.60    2.44    2.32    2.24    2.17    2.11    2.06
        1%    7.26    5.14    4.28    3.79    3.48    3.25    3.09    2.95    2.85    2.76
44      5%    4.06    3.21    2.82    2.59    2.43    2.31    2.23    2.16    2.10    2.05
        1%    7.24    5.12    4.26    3.78    3.46    3.24    3.07    2.94    2.84    2.75
45      5%    4.06    3.21    2.82    2.59    2.43    2.31    2.23    2.15    2.10    2.05
        1%    7.23    5.11    4.25    3.77    3.45    3.23    3.06    2.93    2.83    2.74
46      5%    4.05    3.20    2.81    2.58    2.42    2.30    2.22    2.14    2.09    2.04
        1%    7.21    5.10    4.24    3.76    3.44    3.22    3.05    2.92    2.82    2.73
47      5%    4.05    3.20    2.81    2.58    2.42    2.30    2.22    2.14    2.09    2.04
        1%    7.20    5.09    4.23    3.75    3.43    3.21    3.05    2.91    2.81    2.72
48      5%    4.04    3.19    2.80    2.57    2.41    2.30    2.21    2.14    2.08    2.03
        1%    7.19    5.08    4.22    3.74    3.42    3.20    3.04    2.90    2.80    2.71
49      5%    4.04    3.19    2.80    2.57    2.41    2.30    2.21    2.14    2.08    2.03
        1%    7.18    5.07    4.21    3.73    3.42    3.19    3.03    2.89    2.79    2.71
50      5%    4.03    3.18    2.79    2.56    2.40    2.29    2.20    2.13    2.07    2.02
        1%    7.17    5.06    4.20    3.72    3.41    3.18    3.02    2.88    2.78    2.70
Table-2 (Continued…)
Critical values for F-distribution
Smaller MS            Degrees of freedom for greater mean square (n1)
(n2)    P     1       2       3       4       5       6       7       8       9       10
55      5%    4.02    3.17    2.78    2.54    2.38    2.27    2.18    2.11    2.05    2.00
        1%    7.12    5.01    4.16    3.68    3.37    3.15    2.98    2.85    2.75    2.66
60      5%    4.00    3.15    2.76    2.52    2.37    2.25    2.17    2.10    2.04    1.99
        1%    7.08    4.98    4.13    3.65    3.34    3.12    2.95    2.82    2.72    2.63
65      5%    3.99    3.14    2.75    2.51    2.36    2.24    2.15    2.08    2.02    1.98
        1%    7.04    4.95    4.10    3.62    3.31    3.09    2.93    2.79    2.70    2.61
70      5%    3.98    3.13    2.74    2.50    2.35    2.23    2.14    2.07    2.01    1.97
        1%    7.01    4.92    4.08    3.60    3.29    3.07    2.91    2.77    2.67    2.59
80      5%    3.96    3.11    2.72    2.48    2.33    2.21    2.12    2.05    1.99    1.95
        1%    6.96    4.88    4.04    3.56    3.25    3.04    2.87    2.74    2.64    2.55
100     5%    3.94    3.09    2.70    2.46    2.30    2.19    2.10    2.03    1.97    1.92
        1%    6.90    4.82    3.98    3.51    3.20    2.99    2.82    2.69    2.59    2.51
125     5%    3.92    3.07    2.68    2.44    2.29    2.17    2.08    2.01    1.95    1.90
        1%    6.84    4.78    3.94    3.47    3.17    2.95    2.79    2.65    2.56    2.47
150     5%    3.91    3.06    2.67    2.43    2.27    2.16    2.07    2.00    1.94    1.89
        1%    6.81    4.75    3.91    3.44    3.14    2.92    2.76    2.62    2.53    2.44
200     5%    3.89    3.04    2.65    2.41    2.26    2.14    2.05    1.98    1.92    1.87
        1%    6.76    4.71    3.88    3.41    3.11    2.90    2.73    2.60    2.50    2.41
400     5%    3.86    3.02    2.62    2.39    2.23    2.12    2.03    1.96    1.90    1.85
        1%    6.70    4.66    3.83    3.36    3.06    2.85    2.69    2.55    2.46    2.37
1000    5%    3.85    3.00    2.61    2.38    2.22    2.10    2.02    1.95    1.89    1.84
        1%    6.66    4.62    3.80    3.34    3.04    2.82    2.66    2.53    2.43    2.34
∞       5%    3.84    2.99    2.60    2.37    2.21    2.09    2.01    1.94    1.88    1.83
        1%    6.64    4.60    3.78    3.32    3.02    2.80    2.64    2.51    2.41    2.32
Table-2 (Continued…)
Critical values for F-distribution
n2 = degrees of freedom for smaller mean square; columns = degrees of freedom for greater mean square (n1).
n2    Level       11       12       13       14       15       16       17       18       19       20
1     5%      243.00   244.00   244.50   245.00   245.50   246.00   246.50   247.00   247.50   248.00
      1%     6082.00  6106.00  6124.00  6142.00  6156.00  6169.00  6177.00  6186.00  6194.00  6208.00
2     5%       19.40    19.41    19.42    19.42    19.43    19.43    19.43    19.44    19.44    19.44
      1%       99.41    99.42    99.42    99.43    99.43    99.44    99.44    99.45    99.45    99.45
3     5%        8.76     8.74     8.73     8.71     8.70     8.69     8.68     8.68     8.67     8.66
      1%       27.13    27.05    26.99    26.92    26.88    26.83    26.80    26.76    26.73    26.69
4     5%        5.93     5.91     5.89     5.87     5.86     5.84     5.83     5.82     5.81     5.80
      1%       14.45    14.37    14.31    14.24    14.20    14.15    14.11    14.07    14.04    14.02
5     5%        4.70     4.68     4.66     4.64     4.62     4.60     4.59     4.58     4.57     4.56
      1%        9.96     9.89     9.81     9.77     9.73     9.68     9.65     9.62     9.58     9.55
6     5%        4.03     4.00     3.98     3.96     3.94     3.92     3.91     3.90     3.88     3.87
      1%        7.79     7.72     7.66     7.60     7.56     7.52     7.79     7.46     7.42     7.39
7     5%        3.60     3.57     3.55     3.52     3.51     3.49     3.48     3.47     3.45     3.44
      1%        6.54     6.47     6.41     6.35     6.31     6.27     6.24     6.21     6.18     6.15
8     5%        3.31     3.28     3.26     3.23     3.22     3.20     3.19     3.18     3.16     3.15
      1%        5.74     5.67     5.62     5.56     5.52     5.48     5.46     5.42     5.39     5.36
9     5%        3.10     3.07     3.05     3.02     3.00     2.98     2.97     2.96     2.94     2.93
      1%        5.18     5.11     5.06     5.00     4.96     4.92     4.89     4.86     4.83     4.80
10    5%        2.94     2.91     2.89     2.86     2.84     2.82     2.81     2.80     2.78     2.77
      1%        4.78     4.71     5.66     4.60     4.56     4.52     4.49     4.47     4.44     4.41
11    5%        2.82     2.79     2.77     2.74     2.72     2.70     2.69     2.68     2.66     2.65
      1%        4.46     4.40     4.35     4.29     4.25     4.21     4.18     4.16     4.13     4.10
12    5%        2.72     2.69     2.67     2.64     2.62     2.60     2.59     2.57     2.56     2.54
      1%        4.22     4.16     4.11     4.05     4.02     3.98     3.95     3.92     3.89     3.86
13    5%        2.63     2.60     2.58     2.55     2.53     2.51     2.50     2.49     2.73     2.46
      1%        4.02     3.96     3.92     3.85     3.82     3.78     3.75     3.73     3.70     3.67
14    5%        2.56     2.53     2.51     2.48     2.46     2.44     2.43     2.42     2.40     2.39
      1%        3.86     3.80     3.75     3.70     3.66     3.62     3.59     3.57     3.54     3.51
15    5%        2.51     2.48     2.46     2.43     2.41     2.39     2.38     2.36     2.35     2.33
      1%        3.73     3.67     3.66     3.56     3.52     3.48     3.45     3.42     3.39     3.36
16    5%        2.45     2.42     2.40     2.37     2.35     2.33     2.32     2.31     2.29     2.28
      1%        3.61     3.55     3.50     3.45     3.41     3.37     3.34     3.31     3.28     3.25
17    5%        2.41     2.38     2.36     2.33     2.31     2.29     2.28     2.26     2.25     2.23
      1%        3.52     3.45     3.40     3.35     3.31     3.27     3.24     3.22     3.19     3.16
Table-2 (Continued…)
Critical values for F-distribution
n2 = degrees of freedom for smaller mean square; columns = degrees of freedom for greater mean square (n1).
n2    Level     11    12    13    14    15    16    17    18    19    20
18    5%      2.37  2.34  2.32  2.29  2.27  2.25  2.24  2.22  2.21  2.19
      1%      3.44  3.37  3.32  3.27  3.23  3.19  3.16  3.13  3.10  3.07
19    5%      2.34  2.31  2.29  2.26  2.24  2.21  2.20  2.18  2.17  2.15
      1%      3.36  3.30  3.25  3.19  3.16  3.12  3.09  3.06  3.03  3.00
20    5%      2.31  2.28  2.26  2.23  2.21  2.18  2.17  2.15  2.14  2.12
      1%      3.30  3.23  3.18  3.13  3.09  3.05  3.02  3.00  2.97  2.94
21    5%      2.28  2.25  2.23  2.20  2.18  2.15  2.14  2.12  2.12  2.09
      1%      3.24  3.17  3.12  3.07  3.03  2.99  2.96  2.94  2.91  2.88
22    5%      2.26  2.23  2.21  2.18  2.16  2.13  2.12  2.10  2.09  2.07
      1%      3.18  3.12  3.07  3.02  2.98  2.94  2.91  2.89  2.86  2.83
23    5%      2.24  2.20  2.17  2.14  2.12  2.10  2.09  2.07  2.06  2.04
      1%      3.14  3.07  3.02  2.97  2.93  2.89  2.86  2.84  2.81  2.78
24    5%      2.22  2.18  2.16  2.13  2.11  2.09  2.07  2.06  2.04  2.02
      1%      3.09  3.03  2.98  2.93  2.89  2.85  2.82  2.80  2.87  2.74
25    5%      2.20  2.16  2.14  2.11  2.09  2.07  2.05  2.04  2.02  2.00
      1%      3.05  2.99  2.94  2.89  2.85  2.81  2.78  2.76  2.73  2.70
26    5%      2.18  2.15  2.13  2.10  2.08  2.05  2.04  2.02  2.01  1.99
      1%      3.02  2.96  2.91  2.86  2.82  2.77  2.74  2.72  2.69  2.66
27    5%      2.16  2.13  2.11  2.08  2.06  2.03  2.02  2.00  1.99  1.97
      1%      2.98  2.93  2.88  2.83  2.79  2.74  2.71  2.69  2.66  2.63
28    5%      2.15  2.12  2.09  2.06  2.04  2.02  2.01  1.99  1.98  1.96
      1%      2.95  2.90  2.85  2.80  2.76  2.71  2.68  2.66  2.63  2.60
29    5%      2.14  2.10  2.08  2.05  2.03  2.00  1.99  1.97  1.96  1.94
      1%      2.92  2.87  2.82  2.77  2.73  2.68  2.65  2.63  2.60  2.57
30    5%      2.12  2.09  2.05  2.04  2.02  1.99  1.98  1.96  1.95  1.93
      1%      2.90  2.84  2.79  2.74  2.70  2.66  2.63  2.61  2.58  2.55
31    5%      2.11  2.08  2.05  2.03  2.01  1.98  1.97  1.95  1.94  1.92
      1%      2.88  2.82  2.77  2.72  2.68  2.64  2.61  2.59  2.56  2.53
32    5%      2.10  2.07  2.05  2.02  2.00  1.97  1.96  1.94  1.93  1.91
      1%      2.86  2.80  2.75  2.70  2.66  2.62  2.59  2.57  2.54  2.51
33    5%      2.09  2.06  2.04  2.01  1.99  1.96  1.95  1.93  1.92  1.90
      1%      2.84  2.78  2.73  2.68  2.64  2.60  2.57  2.55  2.52  2.49
34    5%      2.08  2.05  2.03  2.00  1.98  1.95  1.94  1.92  1.91  1.89
      1%      2.82  2.76  2.71  2.66  2.62  2.58  2.55  2.53  2.50  2.47
Table-2 (Continued…)
Critical values for F-distribution
n2 = degrees of freedom for smaller mean square; columns = degrees of freedom for greater mean square (n1).
n2    Level     11    12    13    14    15    16    17    18    19    20
35    5%      2.07  2.04  2.02  1.99  1.97  1.94  1.93  1.91  1.90  1.88
      1%      2.80  2.74  2.69  2.64  2.60  2.56  2.53  2.51  2.48  2.45
36    5%      2.06  2.03  2.01  1.98  1.96  1.93  1.92  1.90  1.89  1.87
      1%      2.78  2.72  2.67  2.62  2.58  2.54  2.51  2.49  2.46  2.43
37    5%      2.06  2.03  2.00  1.97  1.95  1.93  1.91  1.89  1.88  1.86
      1%      2.77  2.71  2.66  2.61  2.57  2.53  2.50  2.47  2.44  2.41
38    5%      2.05  2.02  1.99  1.96  1.94  1.92  1.90  1.89  1.87  1.85
      1%      2.75  2.69  2.64  2.59  2.55  2.51  2.48  2.46  2.43  2.40
39    5%      2.05  2.01  1.99  1.96  1.93  1.91  1.89  1.88  1.86  1.85
      1%      2.74  2.68  2.63  2.58  2.54  2.50  2.48  2.45  2.42  2.38
40    5%      2.04  2.00  1.98  1.95  1.93  1.90  1.89  1.87  1.86  1.84
      1%      2.73  2.66  2.61  2.56  2.53  2.49  2.46  2.43  2.40  2.37
41    5%      2.01  2.00  1.98  1.95  1.92  1.90  1.88  1.86  1.85  1.83
      1%      2.72  2.65  2.60  2.55  2.51  2.48  2.45  2.42  2.39  2.36
42    5%      2.02  1.99  1.97  1.94  1.92  1.89  1.87  1.86  1.84  1.82
      1%      2.70  2.64  2.59  2.54  2.50  2.46  2.43  2.41  2.38  2.35
43    5%      2.02  1.99  1.96  1.93  1.91  1.89  1.87  1.85  1.83  1.82
      1%      2.69  2.63  2.58  2.53  2.49  2.45  2.42  2.39  2.36  2.33
44    5%      2.01  1.98  1.95  1.92  1.90  1.88  1.86  1.85  1.83  1.81
      1%      2.68  2.62  2.57  2.52  2.48  2.44  2.41  2.38  2.35  2.32
45    5%      2.01  1.98  1.95  1.92  1.90  1.88  1.86  1.84  1.82  1.81
      1%      2.67  2.61  2.56  2.51  2.47  2.43  2.40  2.37  2.34  2.31
46    5%      2.00  1.97  1.94  1.91  1.89  1.87  1.84  1.84  1.82  1.80
      1%      2.66  2.60  2.55  2.50  2.46  2.42  2.39  2.36  2.33  2.30
47    5%      2.00  1.97  1.94  1.91  1.89  1.87  1.85  1.83  1.81  1.80
      1%      2.65  2.59  2.54  2.51  2.45  2.41  2.38  2.35  2.32  2.29
48    5%      1.99  1.96  1.93  1.90  1.88  1.86  1.85  1.83  1.81  1.79
      1%      2.64  2.58  2.53  2.48  2.44  2.40  2.37  2.34  2.31  2.28
49    5%      1.99  1.96  1.93  1.90  1.88  1.86  1.84  1.82  1.80  1.79
      1%      2.63  2.57  2.52  2.47  2.43  2.40  2.36  2.33  2.30  2.27
50    5%      1.98  1.95  1.92  1.89  1.87  1.85  1.83  1.82  1.80  1.78
      1%      2.62  2.56  2.51  2.46  2.43  2.39  2.36  2.33  2.29  2.26
Table-2 (Continued…)
Critical values for F-distribution
n2 = degrees of freedom for smaller mean square; columns = degrees of freedom for greater mean square (n1).
n2    Level     11    12    14    16    20    24    30    40    50    75
55    5%      1.97  1.93  1.88  1.83  1.76  1.72  1.67  1.61  1.58  1.52
      1%      2.59  2.53  2.43  2.35  2.23  2.15  2.06  1.96  1.90  1.82
60    5%      1.95  1.92  1.86  1.81  1.75  1.70  1.65  1.59  1.56  1.50
      1%      2.56  2.50  2.40  2.32  2.20  2.12  2.03  1.93  1.87  1.79
65    5%      1.94  1.90  1.85  1.80  1.73  1.68  1.63  1.57  1.54  1.49
      1%      2.54  2.47  2.37  2.30  2.18  2.09  2.00  1.90  1.84  1.76
70    5%      1.93  1.89  1.84  1.79  1.72  1.67  1.62  1.56  1.53  1.47
      1%      2.51  2.45  2.35  2.28  2.15  2.07  1.98  1.88  1.82  1.74
80    5%      1.91  1.88  1.82  1.77  1.70  1.65  1.60  1.54  1.51  1.45
      1%      2.48  2.41  2.32  2.24  2.11  2.03  1.94  1.84  1.78  1.70
100   5%      1.88  1.85  1.79  1.75  1.68  1.63  1.57  1.51  1.48  1.42
      1%      2.43  2.36  2.26  2.19  2.06  1.98  1.89  1.79  1.73  1.64
125   5%      1.86  1.83  1.77  1.72  1.65  1.60  1.55  1.49  1.45  1.39
      1%      2.40  2.33  2.23  2.15  2.03  1.94  1.85  1.75  1.68  1.59
150   5%      1.85  1.82  1.76  1.71  1.64  1.59  1.54  1.47  1.44  1.37
      1%      2.37  2.30  2.20  2.12  2.00  1.91  1.83  1.72  1.66  1.56
200   5%      1.83  1.80  1.74  1.69  1.62  1.57  1.52  1.45  1.42  1.35
      1%      2.34  2.28  2.17  2.09  1.97  1.88  1.79  1.69  1.62  1.53
400   5%      1.81  1.78  1.72  1.67  1.60  1.54  1.49  1.42  1.38  1.32
      1%      2.29  2.23  2.12  2.04  1.92  1.84  1.74  1.64  1.57  1.47
1000  5%      1.80  1.76  1.70  1.65  1.58  1.53  1.47  1.41  1.36  1.30
      1%      2.26  2.20  2.09  2.01  1.89  1.81  1.71  1.61  1.54  1.44
∞     5%      1.79  1.75  1.69  1.64  1.57  1.52  1.46  1.40  1.35  1.28
      1%      2.24  2.18  2.07  1.99  1.87  1.79  1.69  1.59  1.52  1.41
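As a hedged illustration of how Table-2 is read, the sketch below compares an observed variance ratio, F = greater mean square / smaller mean square, with the tabulated 5% and 1% values. The names `F_CRITICAL` and `f_test` and the example mean squares are hypothetical and introduced only for this illustration; the three pairs of critical values for n2 = 60 are copied from Table-2.

```python
# Hypothetical helper for illustration; it is not part of the manual.
# Keys are (n1, n2); values are the (5%, 1%) critical values copied from Table-2.
F_CRITICAL = {
    (1, 60): (4.00, 7.08),
    (2, 60): (3.15, 4.98),
    (3, 60): (2.76, 4.13),
}


def f_test(greater_ms, smaller_ms, n1, n2):
    """Return the variance ratio and a verdict based on the tabulated values."""
    if smaller_ms <= 0 or greater_ms < smaller_ms:
        raise ValueError("the greater mean square must be the numerator")
    f_ratio = greater_ms / smaller_ms
    f_5, f_1 = F_CRITICAL[(n1, n2)]
    if f_ratio >= f_1:
        verdict = "significant at 1%"
    elif f_ratio >= f_5:
        verdict = "significant at 5%"
    else:
        verdict = "not significant"
    return f_ratio, verdict


# Made-up mean squares: 45.0 with 3 d.f. and 12.0 with 60 d.f.
ratio, verdict = f_test(45.0, 12.0, n1=3, n2=60)
print(ratio, verdict)  # 3.75 significant at 5%
```

Here 3.75 lies between the 5% value (2.76) and the 1% value (4.13) for n1 = 3 and n2 = 60, so the ratio is significant at the 5% level but not at the 1% level.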
Table-3: χ2 (Chi-Squared) Distribution: Critical Values of χ2
Table-4: Critical values for correlation coefficients (simple or partial)
          Probability
DF       0.01     0.05
1       1.000    0.997
2       0.990    0.950
3       0.959    0.878
4       0.917    0.811
5       0.874    0.754
6       0.834    0.707
7       0.798    0.666
8       0.765    0.632
9       0.735    0.602
10      0.708    0.576
11      0.684    0.553
12      0.661    0.532
13      0.641    0.514
14      0.623    0.497
15      0.606    0.482
16      0.590    0.468
17      0.575    0.456
18      0.561    0.444
19      0.549    0.433
20      0.537    0.423
21      0.526    0.413
22      0.515    0.404
23      0.505    0.396
24      0.496    0.388
25      0.487    0.381
26      0.478    0.374
27      0.470    0.367
28      0.463    0.361
29      0.456    0.355
30      0.449    0.349
31      0.442    0.344
32      0.436    0.339
33      0.430    0.334
34      0.424    0.329
35      0.418    0.325
36      0.413    0.320
37      0.408    0.316
38      0.403    0.312
39      0.398    0.308
40      0.393    0.304
41      0.389    0.301
42      0.384    0.297
43      0.380    0.294
44      0.376    0.291
45      0.372    0.288
46      0.368    0.285
47      0.365    0.282
48      0.361    0.279
49      0.358    0.276
50      0.354    0.273
52      0.348    0.268
54      0.341    0.263
56      0.336    0.259
58      0.330    0.254
60      0.325    0.250
62      0.320    0.246
64      0.315    0.242
66      0.310    0.239
68      0.306    0.235
70      0.302    0.232
72      0.298    0.229
74      0.294    0.226
76      0.290    0.223
78      0.286    0.220
80      0.283    0.217
82      0.280    0.215
84      0.276    0.212
86      0.273    0.210
88      0.270    0.207
90      0.267    0.205
92      0.264    0.203
94      0.262    0.201
96      0.259    0.199
98      0.256    0.197
100     0.254    0.195
105     0.248    0.190
110     0.242    0.186
115     0.237    0.182
120     0.232    0.178
125     0.228    0.174
130     0.223    0.171
135     0.219    0.168
140     0.215    0.165
145     0.212    0.162
150     0.208    0.159
160     0.202    0.154
170     0.196    0.150
180     0.190    0.145
190     0.185    0.142
200     0.181    0.138
250     0.162    0.124
300     0.148    0.113
350     0.137    0.105
400     0.128    0.098
450     0.121    0.092
500     0.115    0.088
600     0.105    0.080
700     0.097    0.074
800     0.091    0.069
900     0.086    0.065
1000    0.081    0.062
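For a simple correlation, the entries of Table-4 can be cross-checked against Table-2. The t statistic for testing r satisfies t² = r²·DF / (1 − r²), and t² with DF degrees of freedom is an F ratio with n1 = 1 and n2 = DF, so the critical r equals sqrt(F / (F + DF)). The sketch below applies this relation; the function name `critical_r` is hypothetical, and the F values are the n1 = 1 entries of Table-2 for n2 = 60 and n2 = 100.

```python
import math


def critical_r(f_critical, df):
    """Critical correlation coefficient from the critical F for (1, df) d.f."""
    if f_critical <= 0 or df <= 0:
        raise ValueError("f_critical and df must be positive")
    return math.sqrt(f_critical / (f_critical + df))


# F values for n1 = 1 taken from Table-2 (5% and 1% levels).
print(f"{critical_r(4.00, 60):.3f}")   # 0.250
print(f"{critical_r(7.08, 60):.3f}")   # 0.325
print(f"{critical_r(3.94, 100):.3f}")  # 0.195
print(f"{critical_r(6.90, 100):.3f}")  # 0.254
```

These agree with the Table-4 rows for DF = 60 and DF = 100. The check is stated here only for simple correlation, where DF is the number of pairs minus two.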
Table-5: Percentage points of the normal distribution, Z
This table gives percentage points of the standard normal distribution. These are the values of z for which
a given percentage, P, of the standard normal distribution lies outside the range from -z to +z.
P (%)        Z
90        0.1257
80        0.2533
70        0.3853
60        0.5244
50        0.6745
40        0.8416
30        1.0364
20        1.2816
15        1.4395
10        1.6449
5         1.9600
2         2.3263
1         2.5758
0.50      2.8070
0.25      3.0233
0.10      3.2905
0.01      3.8906
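The definition above can be checked numerically. If P per cent of the distribution lies outside the range from -z to +z, then P/2 per cent lies in each tail, so z is the standard normal quantile at 1 − P/200. The sketch below uses `NormalDist` from Python's standard `statistics` module; the function name `z_for_outside_percent` is a hypothetical name chosen for this illustration.

```python
from statistics import NormalDist


def z_for_outside_percent(p_percent):
    """Return z such that p_percent of N(0, 1) lies outside (-z, +z)."""
    if not 0 < p_percent < 100:
        raise ValueError("p_percent must lie strictly between 0 and 100")
    # Half of the outside percentage sits in the upper tail.
    return NormalDist().inv_cdf(1 - p_percent / 200)


for p in (50, 5, 1):
    print(p, f"{z_for_outside_percent(p):.4f}")
# 50 0.6745
# 5 1.9600
# 1 2.5758
```

The printed values match the corresponding rows of Table-5.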
Table-6: Random numbers
Each digit in the following table is independent and has a probability of (1/10). The table was computed
from a population in which the digits 0 to 9 were equally likely.
77
21
24
33
39
07
83
00
02
77
28
11
37
33
77
10
41
31
90
76
35
00
25
78
80
18
77
32
78
85
75
57
59
76
96
63
65
37
58
79
87
96
72
67
25
72
59
21
96
16
02
21
05
19
59
96
90
61
02
16
29
68
92
86
20
61
09
14
93
48
32
85
65
57
14
77
47
73
76
36
65
64
55
43
56
39
60
97
03
78
34
85
49
53
38
89
19
98
98
88
82
80
25
00
59
00
91
03
14
37
43
75
37
56
79
65
92
27
00
74
07
44
74
48
45
80
57
06
74
67
48
73
83
39
34
91
42
11
90
08
64
82
41
25
19
50
97
06
73
63
30
35
08
55
82
54
27
43
71
36
07
70
53
07
38
72
81
26
17
62
78
26
83
64
36
47
60
75
07
50
79
08
13
32
01
22
12
27
28
71
84
11
43
10
39
09
92
97
26
77
66
71
69
14
11
14
50
42
06
21
61
16
12
62
28
26
85
62
58
25
81
55
15
58
52
86
95
58
80
89
09
90
91
08
19
88
99
83
99
36
99
65
96
59
63
96
39
60
58
81
01
12
19
22
95
25
59
59
91
94
11
46
15
67
51
71
14
14
45
40
88
83
88
37
80
76
02
65
27
54
77
48
73
86
30
67
05
73
50
31
04
18
64
41
74
16
44
69
47
91
79
12
93
25
34
54
47
41
77
15
74
55
49
51
55
55
21
56
13
67
31
75
18
53
89
31
98
13
87
35
72
56
29
61
91
15
64
28
96
90
23
12
98
92
28
94
57
41
99
11
42
86
68
06
36
25
82
26
85
49
76
15
90
13
60
00
26
02
65
28
59
87
94
79
48
98
85
87
54
49
64
95
47
55
75
54
53
43
38
30
80
03
36
62
87
21
77
15
78
57
87
75
71
59
16
96
51
15
61
53
14
36
49
69
97
93
77
32
77
27
15
53
67
34
75
46
51
63
15
39
53
90
35
05
63
32
53
23
30
33
02
31
23
10
37
05
74
59
83
31
23
10
32
06
20
61
08
18
80
86
09
64
42
28
68
82
81
22
17
25
71
51
13
12
32
25
63
38
51
82
10
29
02
92
26
28
60
83
06
33
08
88
98
82
83
23
30
31
06
17
63
70
30
07
01
14
60
48
03
81
32
16
25
65
59
50
91
03
89
97
59
71
97
14
78
44
87
43
75
30
55
08
18
80
02
02
24
20
44
02
48
22
89
25
92
55
53
33
33
39
37
91
79
10
97
06
73
65
33
58
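A common use of such a table is to draw a simple random sample without replacement: number the population units 1 to N, read the two-digit numbers in order, keep those between 1 and N and skip repeats. The sketch below is a minimal illustration under that assumption. The function names `select_units` and `random_digit_pairs` are hypothetical, the population size of 50 and sample size of 5 are made-up, and reading the entries in the order listed is also an assumption. The second function shows one way a comparable table could be generated with Python's standard `random` module.

```python
import random


def select_units(table_numbers, population_size, sample_size):
    """Pick distinct unit numbers in 1..population_size from table entries."""
    chosen = []
    for number in table_numbers:
        # Skip 00, numbers beyond the population, and repeats.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("table exhausted before the sample was complete")


def random_digit_pairs(count, seed=None):
    """Generate two-digit strings in which each digit 0-9 is equally likely."""
    rng = random.Random(seed)
    return [f"{rng.randrange(10)}{rng.randrange(10)}" for _ in range(count)]


# First ten entries of Table-6, read in the order listed.
first_entries = [77, 21, 24, 33, 39, 7, 83, 0, 2, 77]
sample = select_units(first_entries, population_size=50, sample_size=5)
print(sample)  # [21, 24, 33, 39, 7]
```

The entry 77 is rejected because it exceeds 50, and the next five acceptable entries form the sample.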