Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
2: An Overview of Various Sampling Schemes….. AN OVERVIEW OF VARIOUS SAMPLING SCHEMES AND DETERMINATION OF SAMPLE SIZES K. K. Tyagi Indian Agricultural Statistics Research Institute, New Delhi-110012 1. INTRODUCTION The prime objective of a sample survey is to obtain inferences about the characteristic of a population. Population is defined as a group of units defined according to the objectives of the survey. The population may consist of all the households in a village / locality, all the fields under a particular crop etc. We may also consider a population of persons, families, fields, animals in a region, or a population of trees, birds in a forest depending upon the nature of data required. The information that we seek about the population is normally, the total number of units, aggregate values of various characteristics, averages of these characteristics per unit, proportions of units possessing specified attributes etc. The data can be collected in two different ways. The first one is complete enumeration which means collection of data on the survey characteristics from each unit of the population. This type of method is used in censuses of population, agriculture, retail stores, industrial establishments etc. The other approach is based on the use of sampling methods and consists of collection of data on survey characteristics from selected units of the population. The first approach can be considered as its special case. A sampling method is a scientific and objective procedure of selecting units from the population and provides a sample that is expected to be representative of the population. A sampling method makes it possible to estimate the population parameters while reducing at the same time the size of survey operations. Some of the advantages of sample surveys as compared to complete enumeration are reduction in cost, greater speed, wider scope and higher accuracy. A function of the unit values of the sample is called an estimator. Various measures, like bias, mean square errors, variance etc. are used to assess the performance of the estimator. The main problem in sample surveys is the choice of a proper sampling strategy, which essentially comprise of a sampling method and the estimation procedure. In the choice of a sampling method there are some methods of selection while some others are control measures which help in grouping the population before the selection process. In the methods of selection, schemes such as simple random sampling, systematic sampling and varying probability sampling are generally used. Among the control measures are procedures such as stratified sampling, cluster sampling and multi-stage sampling etc. A combination of control measures along with the method of selection is called the sampling scheme. We shall describe in brief the different sampling methods in the following sections: 2. SIMPLE RANDOM SAMPLING Simple random sampling (SRS) is a method of selecting 'n' units out of 'N' units such that each one of the possible non-distinct samples has an equal chance of its being chosen. In 15 2: An Overview of Various Sampling Schemes….. practice, a simple way of obtaining a simple random sample is to draw the units one by one with a known probability of selection assigned to each unit of the population at the first and each subsequent draw. The successive draws may be made with or without replacing the units selected in the preceding draws. The former is called the procedure of sampling with replacement while the latter sampling without replacement. The units in the population are numbered from 1 to N. A series of random numbers between 1 and N are then drawn either by means of a Table of random numbers or by means of a computer program that produces such a Table. It can be seen that in simple random sampling without replacement (SRSWOR), the probability of selecting the units in the sample is equal for all the units. Let Y be the characteristic of interest. The N units that comprise the population are denoted by y y2 y N 1 y N 1 N y , denote y1 , y2 ,..., y N . Let the population mean, YN 1 i N N i 1 the parameter of interest. Let us draw a simple random sample, without replacement, of size n. We denote by y y 2 yi y n1 y n 1 n yn 1 yi , the sample mean, an unbiased estimator of n n i 1 the population mean YN . In other words, the average value of y n over all possible samples ( N C n in this case) is equal to YN . Also, the sampling variance of y n is given by 1 1 V ( yn ) ( ) S 2 n N where S 2 (1.1) 1 N ( yi YN ) 2 is the population mean square. N 1 i 1 An unbiased estimator of this sampling variance is given by 1 1 Vˆ ( yn ) ( ) s 2 n N where s 2 (1.2) 1 n ( yi yn ) 2 is the sample mean square. n1 i 1 16 2: An Overview of Various Sampling Schemes….. We now take an example to illustrate the concepts of unbiasedness of sample mean ( y n ) and sample mean square (s2). Suppose we have a population of N = 5 units, say U1, U2, U3, U4, U5. The characteristic under study, Y is the yield in kg per plot. We want to estimate YN , the population mean of Y on the basis of a sample of size n = 2 drawn by SRSWOR. The data and the analysis are as given in the Table below: Units ( U i ) U1 U2 U3 U4 U5 Total Values of y i (kg / plot) 32 28 25 30 35 150 Population Mean (YN ) y y1 y 2 y3 y 4 y5 5 1024 2 i 784 32 28 25 30 35 150 30 kg 5 5 625 900 1225 4558 N 2 Yi NYN2 = 4558 5 30 30 4558 4500 58 14.5 Population Mean Square ( S 2 ) i 1 4 4 4 N 1 Units in the sample Sample Observations U1, U2 U1, U3 U1, U4 U1, U5 U2, U3 U2, U4 U2, U5 U3, U4 U3, U5 U4, U5 32, 28 32, 25 32, 30 32, 35 28, 25 28, 30 28, 35 25, 30 25, 35 30, 35 Sample Mean Sample Mean Square (s2) 30.0 28.5 31.0 33.5 26.5 29.0 31.5 27.5 30.0 32.5 8.0 24.5 2.0 4.5 4.5 2.0 24.5 12.5 50.0 12.5 yn 300/10 = 30 Average 145.0/10 = 14.5 A similar approach applies when sampling is with replacement (SRSWR). In this case, there are Nn possible samples. The estimator of population mean, sampling variance of the estimator, and estimator of the sampling variance are given as yn 1 n yi n i 1 V ( yn ) (1.3) 2 (1.4) n 17 2: An Overview of Various Sampling Schemes….. and 2 s Vˆ ( yn ) n where 2 (1.5) 1 N ( yi YN ) 2 is the population variance and s 2 is the sample mean square. N i 1 Consider all possible samples of size N which can be drawn from a given population. In SRSWOR scheme, there will be in all N C n possible samples. For each sample, one can compute a statistic, such as the mean, standard deviation etc., which will vary from sample to sample. In this manner, one can obtain a distribution of the statistic which is called its sampling distribution. N For estimating the population total Y yi , we have an estimator i 1 n Yˆ N yi / n N . y n (1.6) i 1 i.e., the population size N multiplied by the sample mean y n . n n i 1 i 1 This estimator can be expressed as Yˆ wi yi N / n yi , where wi N / n . The constant N / n is the sampling weight and is the inverse of the sampling fraction n / N . The estimator of sampling variance of Yˆ is given by Vˆ Yˆ Vˆ N . yn N 2 .Vˆ yn (1.7) and the estimator of standard error of Yˆ is given by SEˆ Yˆ SEˆ N . yn N . SEˆ yn (1.8) 18 2: An Overview of Various Sampling Schemes….. From the above, it is evident that under Simple Random Sampling With Replacement (SRSWR), yn is unbiased for the population mean Y , i) the sample mean ii) sample mean square (s2) is unbiased for the population variance 2 , and iii) V yn = N 2 n . Like-wise, under Simple Random Sampling Without Replacement (SRSWOR), yn is unbiased for the population mean Y , i) the sample mean ii) sample mean square (s2) is unbiased for the population mean square ( S 2 ) , and iii) V yn = 1 n N 1 2 S N 3. ESTIMATION OF POPULATION PROPORTION Sometimes, the units in the population are classified into two groups (i) having a particular characteristic and (ii) not having that characteristic. For example, a crop field may be irrigated or not irrigated. If it is irrigated, we say that it possesses the characteristic, ‘irrigation’. If it is not irrigated, we say that it does not possess the particular characteristic of irrigation. If we are interested in estimating the proportion of irrigated fields, the population of N fields can be defined with variate yi as having value 1 if the field is irrigated, a value 0, otherwise. If the total number of irrigated fields be N1 out of N, then N y i 1 i N1 (2.1) Thus, Y N 1 N yi 1 P proportion of irrigated fields in the population N i 1 N 19 (2.2) 2: An Overview of Various Sampling Schemes….. and N y i 1 2 i N1 N .P (2.3) Thus the problem of estimating a population proportion becomes that of estimating a population mean by defining the variate as above. If n1 units out of a random sample of size n n possess that characteristic, the proportion of irrigated fields in the sample is given by p 1 . n Thus n n i 1 i 1 yi n1 yi2 n. p (2.4) Hence an unbiased estimator of P is given by n 1 n Pˆ ( 1 ) p yi sample mean n n i 1 (2.5) In sampling without replacement, the variance of p is given by V ( p) ( N n) P.Q . ( N 1) n (2.6) where Q = 1 – P. In sampling with replacement, the variance of p is given by V ( p) P.Q n (2.7) In sampling without replacement, an unbiased estimator of V(p) is given by 20 2: An Overview of Various Sampling Schemes….. ( N n) p.q Vˆ ( p) . N (n 1) (2.8) where q =1 – p. In sampling with replacement, an unbiased estimator of V(p) is given by p.q Vˆ ( p) (n 1) (2.9) 4. EMPIRICAL EXERCISE The data given below pertains to the average yield of wheat crop (in quintals) pertaining to 108 villages in a Block of a District: Village Sl. Nos. 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100 101-108 20 25 16 29 16 24 25 15 28 24 16 21 25 28 19 29 32 31 19 25 34 28 32 24 30 27 32 23 38 32 31 18 30 Yield (in quintals) 41 55 22 32 75 28 29 29 19 42 39 11 40 63 30 8 35 27 31 43 21 19 50 10 6 4 22 28 10 70 29 29 19 64 29 37 26 21 35 36 27 24 20 37 42 38 34 21 35 25 30 36 39 32 34 28 19 31 45 28 29 37 28 71 42 35 19 35 61 18 29 47 43 44 47 i. Select a random sample of size 10 by simple random sampling without replacement (SRSWOR) and estimate the average yield along with its standard error on the basis of selected sample units. ii. Set up 95% confidence interval for the population mean. SOLUTION: As the population size N = 108 is a three digit number, so for selecting a simple random sample of size n = 10, we shall select three-digit random numbers from the Random Number Table (from 000 to 972, which is the highest multiple of 108 up to 999) as follows: 21 2: An Overview of Various Sampling Schemes….. Random Number 120 572 649 211 327 673 153 317 586 943 Sampling Unit Sl. No. (Remainder of Random Number/108) 12 32 01 103 03 25 45 101 46 79 Yield (q) 25 19 20 30 32 29 63 16 30 28 n Estimate of Population Average yield = YˆN y n y i 1 n i 292 29.2 q 10 Estimate of Population Total Yˆ N yn 108 29.2 3153.6 q The estimate of standard error of Yˆ is given by SEˆ Yˆ SEˆ N . y n N . SEˆ y n 1 1 where SEˆ y n . s n N where s 2 1 yi y n 2 1 1533.6 170.4 q 2 n 1 10 1 So, s 170.4 13.0537 q 1 1 Hence, SEˆ yn 13.0537 0.3012 13.0537 3.9322 10 108 ii) The 95% confidence interval for population mean is given by y n t 0.05 /(1019) df SEˆ y n 29.2 2.262 3.9322 29.2 8.89 22 2: An Overview of Various Sampling Schemes….. So, the 95% confidence interval for population mean is (29.2 - 8.89 to 29.2 + 8.89) i.e. (20.31, 38.09). It can be clearly seen that the population mean 3320 YN 30.74 q is contained in this confidence interval. It may be mentioned here that 108 out of total number of possible samples i.e. 108C10 , the population mean will be contained in such like confidence intervals corresponding to 95% of the total number of samples. 5. STRATIFIED RANDOM SAMPLING In random sampling techniques, the selection is based on random mechanism. Most of the times, random numbers are used for selection purposes. The simplest method of selection is simple random sampling (SRS) in which every sample gets an equal chance of selection. In SRS, units are selected with equal probability at every draw. We have seen that the precision of a sample estimate of the population mean depends not only upon the size of the sample and the sampling fraction but also on the population variability. Selection of a simple random sample from the entire population may be desirable when we do not have any knowledge about the nature of population, such as, population variability etc. However, if it is known that the population has got differential behaviour regarding variability, in different pockets, this information can be made use of in providing a control in the selection. The approach through which such a controlled selection can be exercised is called stratified sampling. In case of SRSWOR, the sampling variance of the sample mean is 1 1 V ( yn ) ( ) S 2 n N Clearly, the variance decreases as the sample size ‘n’ increases while the population variability S2 decreases. Now one of the objectives of a good sampling technique is to reduce the sampling variance. So we have to either increase ‘n’ or decrease S2. Apart from the sample size, therefore, the only way of increasing the precision of an estimate is to devise sampling procedure which will effectively reduce S2 i.e. the heterogeneity in the population. In fact, S2 is a population parameter and is inherent with the population, therefore, it cannot be decreased. Instead, the population may be divided into number of groups (called strata), thereby, controlling variability within each group. This procedure is known as stratification. In stratified sampling, the population consisting of N units is first divided into K sub-populations of sizes N1, N2,…,NK units respectively. These sub-populations are K non-overlapping and together they comprise the whole of the population i.e. Ni = N. These i 1 sub-populations are called strata. To obtain full benefit from stratification, the values of Ni’s must be known. When the strata have been determined, a sample is drawn from each stratum, the drawings being made independently in different strata. If a simple random sample is taken in each stratum then the procedure is termed as stratified random sampling. As the sampling variance of the estimate of population mean or total depends on within strata variation, the 23 2: An Overview of Various Sampling Schemes….. stratification of population is done in such a way that strata are homogeneous within themselves with respect to the variable under study. However, in many practical situations, it is usually difficult to stratify with respect to the variable under consideration especially because of physical and cost consideration. Generally the stratification is done according to administrative groupings, geographical regions and on the basis of auxiliary characters correlated with the character under study. Let N1, N2,…,NK denote the sizes of K strata, such that K N i 1 i N (total number of units in the population). Let y ij denotes the jth observation in the ith stratum. Denote YN 1 N K Ni y i 1 j 1 ij as population mean. Again, S2 1 K Ni ( yij YN ) 2 N 1 i 1 j 1 Si 1 Ni ( yij YN i )2 Ni 1 j 1 2 where YN i 1 Ni yij Ni j 1 = population mean square, = population mean square in the ith stratum, = population mean for the ith stratum. We select a simple random sample of size n1 from the first stratum, of size n2 from the second stratum,..., of size ni from the ith stratum and so on such that n1 + n2 + ....+ nK = n (sample size). The population mean YN can be written as YN K 1 K N Y Pi YN i i Ni N i 1 i 1 (4.1) 24 2: An Overview of Various Sampling Schemes….. where Pi Ni . N 1 ni Since within each stratum, the samples have been drawn by SRSWOR, yni ( yij ), sample ni j 1 mean for the ith stratum is an unbiased estimator of YNi and obviously yst K 1 K N y Pi yni i ni N i 1 i 1 (4.2) The weighted mean of the strata sample means with strata sizes as the weights, will be an appropriate estimator of the population mean. Clearly yst is an unbiased estimator of YN , since K K K i 1 i 1 i 1 E ( yst ) E ( Pi yni ) Pi E ( yni ) Pi YN i YN (4.3) Since the sample in the ith stratum has been drawn by SRSWOR, so 1 1 2 V yni Si Ni ni The sampling variance of y st is given by K K K i 1 i 1 i 1 V ( yst ) Pi 2V ( yni ) Pi Pj Cov( yni , yn j ) (4.4) Since the samples have been drawn independently in each stratum, so Cov( yni , yn j ) 0 and so K K 1 1 V ( yst ) Pi 2V ( yni ) Pi 2 ( ) Si2 ni Ni i 1 i 1 (4.5) 25 2: An Overview of Various Sampling Schemes….. Since the sample mean square for the ith stratum, si2 1 ni ( yij yni )2 unbiasedly estimates ni 1 j 1 the population mean square for the ith stratum, S i2 , it follows that an unbiased estimator of V ( yst ) is given by 1 1 Vˆ ( y st ) V ( y st ) Pi 2 ( ) si2 ni N i (4.6) From the above, we see that sampling variance of stratified sample mean depends on Si's, variability within the strata which suggests that the smaller the Si's, i.e. the more homogeneous the strata, greater will be the precision of the stratified sample. 6. EMPIRICAL ILLUSTRATION The data given below pertains to the total geographical area in 20 villages of a Block. Treating this as population of 20 units, stratify this population in three strata taking stratum sizes to be villages with geographical area, 50 ha or less, villages with geographical area in between 50 and 100 ha and villages having geographical area more than 100 ha. A sample of 6 villages is to be selected by taking 2 villages in each stratum. Compare the efficiency of stratified sampling with corresponding unstratified simple random sampling. Village Sl.No.: 01 02 03 04 05 06 07 08 09 10 Geographical : 020 080 050 100 150 070 020 250 220 010 Area (in ha.) Village Sl.No.: 11 12 13 14 15 16 17 18 19 20 Geographical : 050 140 080 020 050 030 070 090 100 220 Area (in ha) SOLUTION: Clearly N = 20, n = 6 . 26 2: An Overview of Various Sampling Schemes….. Population Mean YN 1 20 1820 yi 91ha 20 i 1 20 Population Mean Square S 2 1 N 1 N 2 96180 2 ( y Y ) ( yi N YN2 ) 5062 ha 2 i N N 1 i 1 N 1 i 1 19 1 1 Sampling Variance of Simple Random Sample Mean V ( yn ) ( ) S 2 590 ha 2 n N Now stratify the population according to given strata sizes into following three strata: STRATA Sl. No. I (less than 50 ha) UNITS 020 050 020 010 050 020 050 030 II (between 50 and 100 ha) 080 100 070 080 070 090 100 III (more than 100 ha) Clearly, N1 = 8, N2 = 7, 150 250 220 140 220 YN1 = 031.3 ha, YN 2 = 084.3 ha, N3 = 5, YN 3 = 176.0 ha, 2 S12 = 0270 ha S 22 = 0161 ha2 S 32 = 2330 ha2 From each stratum, a sample of 2 villages has been taken so n1 = n 2 = n 3 = 2. Now, K 1 1 V ( yst ) Pi 2 ( ) Si2 67 ha 2 ni Ni i 1 Obviously the stratification has reduced the sampling variance of the sample mean from 590 ha2 (in case of SRSWOR) to 67 ha2 (in case of stratified sampling) i.e. a reduction of about 89 per cent. In stratified sampling, having decided the strata and the sample size, the next question which a 27 2: An Overview of Various Sampling Schemes….. survey sampling expert has to face is regarding the method of selection within each stratum and the allocation of sample to different strata. The expression for the variance of stratified sample mean shows that the precision of a stratified sample for given strata depends upon the ni's which can be fixed at will. The guiding principle in the determination of the ni's is to choose them in such a manner so as to provide an estimate of the population mean with the desired degree of precision for a minimum cost or to provide an estimate with maximum precision for a given cost, thus making the most effective use of the available resources. The allocation of the sample to different strata made according to this principle is called the principle of optimum allocation. The cost of a survey is a function of strata sample sizes just as the variance. The manner in which cost will vary with total sample size and with its allocation among the different strata will depend upon the type of survey. In yield estimation surveys, the major item in the survey cost consists of labour charges for harvesting of produce and as such survey cost is found to be approximately proportional to the number of crop cutting experiments (CCEs). Cost per CCE may, however, vary in different strata depending upon labour availability. Under such K situations, the total cost may be represented by C ci ni where ci is the cost per CCE in the i 1 th i stratum. When ci is same from stratum to stratum, say c, the total cost of a survey is given by C c n . The cost function will change in form, if travel cost, field staff salary, statistical analysis etc. are to be paid for. K To find optimum values of ni (cost function being C ci ni ), consider the function i 1 V ( yst ) C (4.7) where is some constant. Now, K V ( yst ) C Pi 2 ( i 1 K = i 1 K K 1 1 2 ) Si ci ni ni Ni i 1 1 2 2 Pi Si + . c i . ni + terms independent of ni ni = ( Pi Si / ni ci ni ) 2 + terms independent of ni i 1 28 2: An Overview of Various Sampling Schemes….. Clearly, V ( y st ) is minimum for fixed cost C, or cost of a survey is minimum for a fixed value of V ( y st ), when each of the square terms on right-hand side of the above equation is zero i.e. Pi Si = ni or ni = ci ni Pi Si . ci (i= 1,2,...,K) (i= 1,2,...,K) (4.8) From the above, one can easily infer that: the larger the stratum size, the larger should be the size of the sample to be selected from that stratum, the larger the stratum variability, the larger should be the size of the sample from that stratum, and the cheaper the cost per sampling unit in a stratum, the larger should be the sample from that stratum. The exact value of ni for maximizing precision for a fixed cost C0 can be obtained by evaluating 1 / , the constant of proportionality as ni Pi Si ci C0 (4.9) K P S i 1 i ci i and the total sample size as K K n ni i 1 C0 ( Pi Si / ci ) i 1 K P S i 1 i (4.10) i ci The allocation of the sample size 'n' to different strata according to the above equation is known as optimum allocation. When ci is the same from stratum to stratum i.e. ci = c (say), the cost function takes the form C = c.n, or in other words, the cost of survey is proportional to the size of the sample, the optimum values of ni 's are given by 29 2: An Overview of Various Sampling Schemes….. ni n Pi S i (i= 1,2,...,K) K P S i 1 i (4.11) i The allocation of the sample according to the above equation is known as Neyman Allocation. On substituting for ni in V ( y st ) expression (4.5), we obtain 1 K 1 VN ( yst ) ( Pi Si ) 2 n i 1 N K P S i 1 i 2 i (4.12) where the subscript ‘N’ stands for stratification with Neyman Allocation. Another logical approach of allocation appears to be to allocate larger sample sizes for larger strata i.e. ni Ni or ni = n. Ni N (i=1,2,...,K). The allocation of sample size ‘n’ according to this equation is known as proportional allocation and V( y st ) in this case becomes 1 1 K VP ( y st ) ( ) Pi Si2 n N i 1 (4.13) where the subscript ‘P’ indicates the stratification with proportional allocation. 7. CLUSTER SAMPLING A cluster may be defined as a group of units. When the sampling units are clusters, the method of sampling is known as cluster sampling. Cluster sampling is used when the frame of units is not available or it is expensive to construct such a frame. Thus, a list of all the farms in the districts may not be available but information on the list of villages is easily available. For carrying out any district level survey aimed at estimating the yield of a crop, it is practically feasible to select villages first and then enumerating the elements (in this case farms) in the selected village. The method is operationally convenient, less time consuming and more importantly such a method is cost-wise efficient. The main disadvantage of cluster sampling is that it is less efficient than a method of sampling in which the units are selected individually. The efficiency of cluster sampling procedure increases as the heterogeneity between units belonging to same cluster increases. Cluster sampling becomes more efficient than element sampling if the units pertaining to same cluster are negatively correlated. Let there be N clusters in the population. Further, let M be the size of each cluster. We denote by Y the character under study. We define 30 2: An Overview of Various Sampling Schemes….. yij = value of the characteristic under study for jth element, (j=1,2,…,M) in the ith cluster, (i=1,2,…,N) M yi. 1 M YN . 1 N yi. the mean of the cluster means in the population , N i 1 Y.. y 1 NM j 1 N ij the mean per element of the i th cluster , M yij i 1 j 1 1 N N y i 1 i. the mean per element in the population . It is clear that YN . Y.. . This is so since the clusters are of equal size. If, however, the clusters vary in size, the two means will generally not be equal. We denote by ycl , the estimator of Y.. as y cl 1 n 1 yi. n i 1 nM n M y i 1 j 1 ij the mean of cluster means in a sample of n clusters (5.1) Clearl y ycl is an unbiased estimate of Y.. and its variance is given by 1 1 V ( ycl ) ( ) Sb2 n N (5.2) where Sb2 1 ( yi. YN . ) 2 is the mean square between cluster means in the population. N 1 The variance of the mean of n cluster means in terms of intra-class correlation c (intracluster correlation between elements belonging to the same cluster), if N is sufficiently large, is given by S2 1 (M 1) c V ( ycl ) nM (5.3) 31 2: An Overview of Various Sampling Schemes….. where S2 N M 1 ( yij Y.. ) 2 is the mean square between elements in the population. NM 1 i 1 j 1 If an equivalent sample of nM elements were selected from the population of NM elements by SRSWOR, the variance of the mean per element ynM would be V ( ynM ) ( 1 1 ) S2 nM NM (5.4) Accordingly, the relative efficiency of a cluster as the unit of sampling compared with that of an element is given by R.E. V ( ynM ) S2 V ( ycl ) M Sb2 (5.5) Thus, the relative efficiency of cluster sampling increases with decrease in both S b2 and M. Since relative efficiency of cluster sampling depends upon M, it is natural to determine the optimum size of the cluster. As in the case of stratified sampling, the optimum size of the cluster can be determined by minimizing variance for a fixed cost or vice-versa. A simple cost function can be C = c1nM +c2d (5.6) where C = total cost of survey operation, c1 = cost of enumerating an element including the time spent and cost of transportation within clusters, c2 = per unit cost of travelling a unit distance between clusters, and d = the distance between the clusters. 32 2: An Overview of Various Sampling Schemes….. By minimizing the variance subject to the above cost function, we obtain M c22 c1 C (5.7) Thus, M will be smaller if c1 and C are large and c2 is small. 8. MULTI-STAGE SAMPLING Generally, elements belonging to the same cluster are more homogeneous as compared to those elements which belong to different clusters. Therefore, a comparatively representative sample can be obtained by enumerating each cluster partially and distributing the entire sample over more clusters. This will increase the cost of the survey but the proportionate increase in cost vis-à-vis cluster sampling will be less as compared to increase in the precision. This process of first selecting clusters and then further sampling units within a cluster is called as two-stage sampling. The clusters in a two-stage sample are called as primary stage units (psus) and elements within a cluster are called as secondary stage units (ssus). A two-stage sample has the advantage that after psus are selected, the frame of the ssus is required for the sampled psus only. The procedure allows the flexibility of using different sampling design at the different stages of selection of sampling units. A two-stage sampling procedure can be easily generalized to multi-stage sampling designs. Such a sampling design is commonly used in large scale surveys. It is operationally convenient, provides reasonable degree of precision and is cost-wise efficient. 9. SYSTEMATIC SAMPLING In the method of systematic sampling, only the first unit is selected at random while rest of the units are selected according to a pre-determined pattern. Let N and n be the population size and sample size respectively. Further, N = n.k, where k is an integer. A random number ‘i’ is selected such that 1< i < k. Then the sample contains n units with serial number i, i+k, i+2k,…, i+(n -1)k. Systematic sampling can be used in situations such as selection of kth strip in forest sampling, selection of corn fields every kth kilometre apart for observation on incidence of borers, or the selection of every kth time interval for observing the number of fishing crafts landing at a centre. The method of systematic sampling is used on account of its low cost and simplicity in the selection of the sample. It makes control of field work easier. Since every kth unit will be in the sample, the method is expected to provide an evenly balanced sample. 33 2: An Overview of Various Sampling Schemes….. When N is not an exact multiple of n i.e. N = nk + r, a method of sampling called circular systematic sampling (given by Lahiri) is used to select the sample. This method consists of selecting a random number from 1 to N and thereafter selecting cyclically every k th unit until n units have been chosen in the sample. Cyclic selection means assigning number (N+1) to 1st unit, (N+2) to 2nd unit and so on, in order to continue the selection procedure when the Nth unit has been reached. A drawback of systematic sampling is that it is not possible to get an unbiased estimate of the variance of the estimator. However, it is possible to get an unbiased estimator of the variance based on number of systematic samples. 10. VARYING PROBABILITY SAMPLING In SRSWOR, the selection probabilities are equal for all the units in the population. However, if the sampling units vary in size considerably, SRSWOR may not be appropriate as it does not take into account the possible importance of the larger units in the population. To give possible importance to larger units, there are various sampling methods in which this can be achieved. A simple method is of assigning unequal probabilities of selection to the different units in the population. Thus, when units vary in size and the variable under study is correlated with size, probabilities of selection may be assigned in proportion to the size of the unit e.g. villages having larger geographical areas are likely to have larger populations and larger areas under food crops. For estimating the crop production, it may be desirable to adopt a selection scheme in which villages are selected with probabilities proportional to their population sizes or to their geographical areas. A sampling procedure in which the units are selected with probabilities proportional to some measure of their size is known as sampling with probability proportional to size (pps). The units may be selected with or without replacement. In sampling with replacement, the probability of drawing a specified unit at a given draw is the same. Procedure of selecting a sample with varying probabilities i) Cumulative Total Method To draw a sample of size n from a population of size N with ppswr, the procedure is as follows: Let x i be an integer proportional to size of the ith unit (i=1,2....,N), we make successive cumulative totals x1 , x1 x2 , x1 x2 x3 ,..., x1 x2 x3 ... x N , N and draw a random number R in between 1 and x i 1 34 i from a random number Table. 2: An Overview of Various Sampling Schemes….. Select the ith unit in the population for which x1 x2 x3 ,..., xi 1 R x1 x2 x3 ... xi 1 xi It is clear that this procedure of selection gives to the ith unit in the population, a probability of selection proportional to x i . This procedure is to be repeated n times, if a sample of size n is required. ii) Lahiri’s Method The main drawback of cumulative total method is that it involves writing down the successive cumulative totals which is time consuming and tedious especially if the number of units in the population is large. Lahiri (1951) had suggested an alternative procedure which avoids the necessity of writing down cumulative totals. It consists in selecting a pair of random numbers, say (i , j) such that 1 i N and 1 j M where M is the maximum of the sizes of the N units in the population. If j xi , the ith unit is selected; otherwise it is rejected and another pair of random numbers is chosen. For selecting a sample of n units with ppswr, the procedure is to be repeated till n units are selected. It can be seen that the procedure leads to the required probabilities of selection. 11. DETERMINATION OF SAMPLE SIZE In the planning of a sample survey, determination of sample size is an important decision which a survey statistician has to take while deciding the sampling plan. One has to be careful while deciding the sample size, because too large a sample implies waste of resources, and too small a sample diminishes the utility of the results. An efficient sampling plan should enable an optimum utilization of budgetary resources to provide the best estimators of the population parameters. As is well known, efficiency of an estimator is normally measured by inverse of mean square error (or variance in case of unbiased estimators). A desirable proposition would be to minimize the cost as well as variance simultaneously. But, unfortunately, it is not possible. With an increase in the sample size, normally the cost of the survey increases while the variance decreases, thereby increasing the efficiency. Thus for determination of sample size, a balance is required to be struck which is reasonable with respect to cost as well as efficiency. Sampling theory provides a framework within which the problem of determining sample size may be tackled reasonably. We first consider the estimation of sample size in case of simple random sampling. The problem has been analyzed in a very elegant way by considering hypothetical example by Cochran (1977). We quote the example, “An Anthropologist is preparing to study the inhabitants of some island. Among other things, he wishes to estimate the percentage of inhabitants belonging to blood group O. Co-operation has been secured so that it is feasible to take a simple random sample. How large should the sample be?” This is just a typical example. In fact, in almost all the sampling investigations, one has to face such problems. An answer to the question is not straight forward. First of all, one must be very clear about the 35 2: An Overview of Various Sampling Schemes….. objective of the study. Or atleast, the user must know to what use their results are going to be put, so that he should be able to answer as to what is the margin of error he is going to tolerate in the results. In the above example, the Anthropologist should be able to answer as to how accurately does he wish to know the percentage of people with blood group O? In this case he is reported to be content with a 5% margin in the sense that if the sample shows 43% to have blood group O, the percentage for the whole island is sure to be between 38 and 48. Since a random sampling procedure has been used, every sample has got some chance of selection and the possibility of getting the estimates lying outside the above specified range cannot be ruled out. Aware of this fact, the Anthropologist is prepared to take a 1 in 20 chance of getting an unlucky sample with the estimate lying outside the above margin. With the above information, ignoring fpc and assuming that the sample proportion p is assumed to be normally distributed, a rough estimate of n may be obtained. In technical terms, p is to lie in the range (P 5), except for a 1 in 20 chance. Since p is assumed to be normally distributed about the population proportion P, it will lie in the range (P 2 . p ) apart from a 1 in 20 chance (in 95% cases). Further, since the standard error of p is approximately given by p 2 . p or 2 or P.Q / n = where Q = (1 - P), we get 5 P.Q / n = 5 n = 4 P . Q / 25 At this point a difficulty appears that is common to all problems in the estimation of sample size. A formula for n has been obtained, but n depends on some property of the population that is to be sampled. Here, it is the quantity P that we would like to measure. We therefore ask the anthropologist if he can give us some idea of the likely value of P. He replies that from previous data on other ethnic groups, and from his speculations about the racial history of this island, he will be surprised if P lies outside the range 30 to 60%. This information is sufficient to provide a usable answer. For any value of P between 30 and 60, the product P.Q lies between 2100 and a maximum of 2500 at P = 50. The corresponding n lies between 336 and 400. To be on the safe side, 400 is taken as the initial estimate of n. 36 2: An Overview of Various Sampling Schemes….. PRINCIPAL STEPS INVOLVED IN THE CHOICE OF A SAMPLE SIZE A statement about the margin of error to be tolerated in the results. Choice of desired confidence level. Some equation that connects n with the desired precision of the sample should be found. This equation will contain, as parameters, certain unknown properties of the population. This must be estimated in order to give specific results. Usually in a sample survey, more than one characteristic is measured. Sometimes, the number of characteristics is large. If a desired degree of precision is prescribed for each characteristic, the calculation leads to a conflicting values of n, one for each characteristic. Some method must be found for reconciling these values. Finally, the chosen value of n must be appraised to see whether it is consistent with the resources available to take the sample. This demands an estimation of the cost, labour, time and material required to obtain the proposed size of sample. It sometimes becomes apparent that n will have to be drastically reduced. One has to choose whether to proceed with a much smaller sample size, thus reducing precision, or to abandon efforts until more resources can be found. Regarding the choice of a level for tolerable margin of error and the confidence level, the user normally has only a vague idea and it is only through the discussions and clarifications that a quantitative specific measures are obtained. It may be remarked that these measures are mainly subjective and depend largely on the judgment of the user regarding the importance, applicability and vulnerability of the results. Regarding the sample sizes in case of simple random sampling, the cases for qualitative and quantitative data are presented below: 12. QUALITATIVE DATA: ESTIMATION OF PROPORTIONS The units are classified into two classes, C and C . Some margin of error d in the estimated proportion p of units in class C has been agreed on, and there is a small risk that we are willing to incur that the actual error is larger than d; i.e., we want Pr ( p - P d ) = Simple random sampling is assumed, and p is taken as normally distributed. Now we know that, 37 2: An Overview of Various Sampling Schemes….. N n P.Q n p N 1 Hence the formula that connects n with the desired degree of precision is N n d t N 1 P.Q n at the tails. where t is the abscissa of the normal curve that cuts off an area of Solving for n, we find t2 P Q d2 n 1 t2 P Q 1 1 2 N d (12.1) For practical use, an advance estimate p of P is substituted in this formula. If N is large, a first approximation is n0 t2 p q p q V d2 (12.2) where V pq = desired variance of the sample proportion. n0 In practice, we first calculate n0. If n0/N is negligible, n0 is a satisfactory approximation to the n of (1). If not, it is apparent on comparison of (1) and (2) that n is obtained as n n0 n0 1 (n0 1) / N 1 (n0 / N ) (12.3) 38 2: An Overview of Various Sampling Schemes….. EXAMPLE In the hypothetical blood groups example, we had d = 0.05, p = 0.5, = 0.05, t = 2 . Thus, n0 (4)(0.5)(0.5) 400 (0.0025) Let us assume that there are only 3200 people on the island. The fpc is needed, and we find n n0 1 (n 0 1) / N 400 356 399 1 3200 The formula for n0 holds also if d, p and q are all expressed as percentages instead of proportions. Since the product pq Increases as p moves towards 1/2, or 50 %, a conservative estimate of n is obtained by choosing for p the value nearest to 1/2 in the range in which p is thought likely to lie. If p seems likely to lie between 5 and 9 %, for instance, we assume 9 % for the estimation of n. 13. QUANTITATIVE DATA: ESTIMATION OF POPULATION MEAN Consider a population of size N from which a simple random sample is to be selected for estimating the population mean Y . Suppose, we wish to control the relative error r in the estimated population total or mean. As the sample mean y estimates the population mean Y , the margin of error and confidence level are specified as Pr yY Ny NY r Pr r y NY Pr y Y r Y where is a small probability. We assume that y is normally distributed. Its standard error is y = Nn N S n 39 2: An Overview of Various Sampling Schemes….. Hence rY t . y = t. Nn S N n (13.1) Solving for n gives n( t S 2 1 t S 2 ) / 1 ( ) r Y N r Y Note that the population characteristic on which n depends is its coefficient of variation S / Y . This is often a more stable quantity and easier to guess in advance than S itself. As a first approximation, we take n0 ( t S 2 1 S 2 ) ( ) C Y rY (13.2) substituting an advance estimate of (S / Y ). The quantity C is the desired (cv)2 of the sample estimate. If n0 / N is appreciable, we compute n as in (12.3) n n0 n 1 0 (13.3) N If instead of the relative error r, we wish to control the absolute error d in y , we take n0 = t2S2/ d2 = S2 / V, where V is the desired variance of Y . Sometimes the specification error to be tolerated is only given in terms of desired per cent S.E. of the estimator e.g. the estimate is desired with a maximum of say 5 % S.E. In such cases, n is obtained from the corresponding formulae. In simple random sampling, if the desired % S.E. is d, then n is given by 1 n 2 ( d C2 1 ) N 40 2: An Overview of Various Sampling Schemes….. where C is the % cv of the population. METHODOLOGICAL SAMPLE SIZE ISSUES RELATING TO DETERMINATION OF The determination of sample size is generally based on (a) the available financial and manpower resources and (b) the required level of reliability in the estimates expected from the sample. Generally, it would be preferable to start with the second consideration, and if the budget is a constraint to assess the precision that can be achieved under that constraint in order to decide whether the achievable precision would be acceptable, and if not, whether the budget should be increased. In a sample survey, the sampling error associated with a given sample size varies from item to time. For major items of frequent occurrence, such as area under a crop, generally the sampling error is less that than the minor items of infrequent occurrence such as use of pesticides/ insecticides. Similarly, for items which have greater variability, the sampling error would be larger than that for fewer variables. To decide on the sample size for the survey, it would be necessary to calculate the sample size required for estimating with the requisite precision a few major items of interest and take the largest of the indicated size requirements as the sample size for the survey. Estimates from a survey are often required not only at the national level but also for the main region of a country and for certain domains of study such as rural and urban sectors and specified groups such as “irrigated crop” and “un-irrigated crop”. Generally, the sample size required for a given precision at the national level would not give estimates for the regions and groups with the same precision because of effective sample size for the region or group. Compromise solution involving this has to be accepted. OVERALL SAMPLE SIZE If a pilot survey is undertaken for testing questions and survey procedure before the main survey is launched, it may be possible to estimate roughly the parameters (population mean and standard deviation) required for the determination of sample size for the various items of interest. However, that may not be always possible, unless the pilot survey is taken well in advance, because estimates of financial resource required are to be made available to the government quite in advance to make the requisite budget. Thus the determination of the sample size in most cases may have to be done in advance without any pilot survey. In such cases the survey Statistician has to make use of the information which is readily available. He may often have to depend on the results of similar survey conducted in the past, preferably in the same country or elsewhere in the neighboring countries. If no such results are available, the Statistician has to make reasonable guesses of the different parameters which enter the formula for determination of the sample size. 41 2: An Overview of Various Sampling Schemes….. If a stratified multi-stage random sampling design is used, which is usually the case, the problems are further compounded because information is required not merely on the population mean and standard deviation (SD), but also its components of variance between primary stage sampling units (PSUs) and within PSUs. What one can do in such circumstances is to proceed in stages by working out the sample size required for a simple random sample (SRS) and to make adjustments to the sample size to take into account the effects of multi-stage sampling and possible stratification as illustrated in the paragraphs that follow. Generally the level of precision desired of an estimate is expressed as a percentage of itself or, strictly speaking of the population parameter. But the subtle difference is usually ignored. Let Y be the characteristic under study, Y be the population mean and y be the sample mean. Clearly, y is an estimate of Y . The required sampling precision is prescribed as a percentage of y . For example, sampling precision of y should be E per cent of y i.e., the population average should lie between y Exy Exy . and y 100 l00 Taking 95% confidence interval, which is the usual case, this implies that E x y /100 = 2 x SE ( y ) SE( y) E x 100 y 2 Percentage relative SE = E 2 If the sampling precision is set at 5% relative SE, we know that in simple random sampling Relative SE ( y ) = SE( y) x 100 Y = n x 100 = Y Y 42 x 100 x 1 n 2: An Overview of Various Sampling Schemes….. = Coefficient of var iation (CV) x 100 n E CV x 100 2 n n = 4 x 10000 x If E = 5, n = 40000 CV 2 E2 = 40000 CV 2 E2 CV 2 1600 (CV) 2 25 n = 40000 If E = 10, (CV) 2 400 (CV) 2 100 Thus to determine the sample sizes we require the value of CV which can be estimated on the basis of a previous survey on the subject or a closely related subject or a survey of a neighboring or similarly placed country. Coefficient of variation, generally being a stable quantity can also be approximated on the basis of some related information. e.g. average household income can be taken to be equal 1 to per capita income x average household size and of the known range of household 6 income can be taken as an approximation of the SD on the assumption of a normal distribution. However, a better assumption is that of log-normal distribution, i.e. instead of assuming y to be distributed normally, assume that loge y is distributed normally. (e = 2 3 = 2.7) Let us assume that loge y is distributed normally with mean a and standard deviation b, then 2 Mean of y = e Variance of (CV)2 of a b2 y = e2a b y = 2 2 eb 1 43 2 eb 1 2: An Overview of Various Sampling Schemes….. Median of y = e a 2 Mode of y = e a b 2 Mean of y eb / 2 Median of y Mean of y 3b2 / 2 e Mode of y Thus if mean is approximated (as indicated for household income by per capita income x average household size) and the median or mode is roughly guessed, it is possible to calculate the value of ‘b’ and therefore, approximate C.V. It may also be noted that an error in the estimate of mode effects the estimate of ‘b’ less than the same relative error in the estimate of the median. Also it may be perhaps less difficult to make a good guess of the mode than median. Further, log normal distribution has the property that the proportion of population with values less than or equal to the mean is given by P(b/2), where P(t) is the area to the left of ‘t’ of a standard normal probability density function. Thus if we guess the proportion of households whose income is less than or equal to the average, it is possible to obtain the value of ‘b’ by referring to the corresponding proportion in the standard normal distribution tables and thus arrive at an estimate of CV. We shall now discuss a simple method that does not depend upon the estimation of either the population mean or the standard deviation, but assumes log-normal distribution. Take the case of estimation of household income. Based on empirical data collected from a large number of countries, it is reported in a technical study on Household Income and Expenditure Surveys (Published in 1989 by Statistics Office of the United Nations under National Household Survey Capability Programme) that nearly two-thirds of the population in the case of distribution of income or similar economic variables lie below the average value. Thus with the property mentioned in above para gives CV = 1.0492 1 It is not claimed that the observation mentioned above is universal. If the proportion between the average is different, CV will be different as given below: 44 2: An Overview of Various Sampling Schemes….. Percentage of population below average 55 60 65 70 75 80 C.V. 0.2554 0.5409 0.9005 1.4157 2.2739 4.0000 It would be seen that as the percentage of population below the average changes, the value of CV changes very fast. Thus, rather than working with the assumed proportion of two-thirds, one may prefer to err on the safe side and take the proportion as 70% and use CV = 1.4157 1.42 or (CV)2 = 2. Table 5 (pp180-187) of the UN publication mentioned above gives the value of CV and the proportion of households below the average for some 54 countries. It can be seen from that Table that in a few cases CV exceeded 1.42. Thus the assumption of (CV) 2 = 2 is not unrealistic. If (CV)2 = 2, we get n = 3200 and if (CV)2 = 1, as it happens in most cases, we get n = 1600. In the above calculations, we assumed that finite population correction (fpc) factor can be ignored, i.e. the population size is very large as compared to sample size. As a working rule, when n / N < 1%¸ fpc can be ignored. However, if fpc can not be ignored, then the sample n size n for a simple random sample will be n = , where n is the sample size for the n 1 N case when fpc can be ignored. In large scale sample surveys, generally one uses a two-stage random sampling design. A two-stage design is generally less efficient than SRS of the same ultimate size and to achieve the same level of precision as in a SRS, a larger number of ultimate stage sampling units has to be surveyed. This is called the design effect and the extent of the upward adjustment to the sample size depends on the degree of similarity of second stage units within a PSU, which is measured by intra-class correlation coefficient. As a good working rule, one can take the value of 2 for the design effect as indicated in the Handbook of Household Surveys (Revised Edition), Studies in Methods, Series F.No.31, 1984 of the United Nations. Using the value 2 for the design effect, we get n = 6400, if (CV)2 = 2 and n =3200, if (CV)2 = 1. Here again the sample size required would be less if appropriate stratification is used at various stages. SAMPLE SIZE FOR DOMAINS Estimates are generally required not only at the national level but also for certain domains such as geographical regions, rural and urban areas. One has then to work out the sample size for each domain and add them to arrive at the national sample size. We shall assume that domain-wise estimates are required with the same precision of 5%. Further, for sake of simplicity, we will deal with the case of two domains of study. For sake of illustration, let us 45 2: An Overview of Various Sampling Schemes….. consider a Household Income and Expenditure Survey (HIES) and rural and urban areas may be taken as the two domains of study. Further, let us assume that 80% of households (hh) are rural and 20% of the hh are urban. Suppose further that the average household income for the urban area is twice the national average. With the 80 : 20 ratio between rural and urban hh, it means that the average household income in the rural areas, is 75% of the national average. Let (CV)r and (CV)u be the CV for rural and urban areas respectively. If (CV)2 = 2, it can be shown that 0.45 (CV) r2 + 0.80 (CV)u2 = 1.75 (Please see Annexure) If we assume that (CV)r = (CV)u , we get (CV) 2r = (CV) 2u = 1.4 Thus to obtain the same relative precision of 5% (i.e. relative SE of 2.5%), the sample size for each of two sectors will be 1.4 x 1600 x 2 (design effect) = 4480 hh Hence, National sample size = 8960 hh. Thus the national sample size is increased by 40% from 6400 to 8960. If we do not have that many resources, we can divide the national sample size of 6400 equally to rural and urban areas. The effect of allocation of 3200 instead of 4480 hh would be that, instead of 5% we shall have 5.9% precision (2.95% relative SE). In the above we have assumed that (CV)r = (CV)u . It is likely that (CV)u > (CV)r. The table below gives the sample size required with different (CV)u and (CV)r but satisfying the equation (1). Sample size ( number of households ) (CV) 2 (CV) 2 r 1.0 1.1 1.2 1.3 1.4 1.5 u 1.625000 1.56875 1.515250 1.45625 1.40000 1.34375 Rural 3200 3520 3840 4150 4480 4800 Urban 5200 5020 4840 4660 4480 4300 Total 8400 8540 8680 8820 8960 9100 For any other value of (CV) 2r , all other values can be determined by linear interpolation. 46 2: An Overview of Various Sampling Schemes….. Annexure Let us use the following notations Population size Mean S.D. Urban Rural Total N1 1 1 N2 2 2 N N1 N We have assumed that N2 N 20 100 We note that 1 x 100 = 20 and x 100 = 80 80 100 2 Further we have assumed 1 2 , So 20 100 0.8 2 x 2 80 100 2 2 0.6 Let us now get the relationship amongst , 1 and 2 . Let x1i be the observation on the i-th unit in the urban area, x2j be the observation on the jth unit in the rural area. We know that N1 12 N1 i 1 x12i N1 12 47 0.75 2: An Overview of Various Sampling Schemes….. N2 22 N2 j 1 x22 j N2 22 N2 N1 N.2 x12i x 22 j N 2 j1 i1 2 2 N112 N112 N2 2 2 N2 2 N 2 So 2 2 N1 2 N2 2 N1 2 N2 2 1 2 1 2 2 N N N N 2 2 N1 1 N2 2 N1 1 N2 2 2 2 1 N 2 N 2 N 2 N 2 = We note that 2 N1 12 12 N2 2 N1 12 N2 2 2 2 2 x x 1 2 2 2 2 2 N 1 N 2 N N 2 2 N1 1 N2 4 1 2 3 9 , , 2 4, 2 ( )2 , and 2 N 5 N 5 4 16 on substitution, (1) reduces to (CV) 2 4 5 (CV)u2 4 5 x 9 4 4 9 (CV)r2 x 1 16 5 5 16 4 9 = 0.8 (CV)u2 + 0.45 (CV)r2 + 1 5 20 As (CV)2 = 2, so 2 = 0.8 (CV)u2 + 0.45 (CV)r2 + 1.75 1 4 = 0.8 (CV)u2 + 0.45 (CV)r2 48 (1) 2: An Overview of Various Sampling Schemes….. REFERENCES Cochran, W.G. (1977): Sampling Techniques. Third Edition. John Wiley and Sons. Des Raj (1968): Sampling Theory. TATA McGRAW-HILL Publishing Co. Ltd. Des Raj and Chandok, P. (1998): Sample Survey Theory. Narosa Publishing House. Murthy, M.N. (1977): Sampling Theory and Methods. Statistical Publishing Society, Calcutta. Singh, D. and Chaudhary, F.S. (1986): Theory and Analysis of Sample Survey Designs. Wiley Eastern Limited. Singh, D., Singh, P. and Kumar, P. (1978): Handbook of Sampling Methods. I.A.S.R.I., New Delhi. Singh, R. and Mangat, N.S. (1996): Elements of Survey Sampling, Kluwer Academic Publishers. Sukhatme, P.V. and Sukhatme, B.V. (1970): Sampling Theory of Surveys with Application. Second Edition. Iowa State University Press, USA Sukhatme, P. V., Sukhatme, B.V., Sukhatme, S. and Asok, C. (1984): Sampling Theory of Surveys with Applications. Third Revised Edition, Iowa State University Press, USA. 49