Survey
PPS Sampling

Situations can arise in which the population contains a number of elements with an extremely large value of the study variable. This is often the case in business surveys. A suitable sampling technique in such a case, especially for the estimation of a total, is one in which the inclusion probability depends on the size of the population element. A reduction in variance can then be expected if the size measure and the study variable are closely related. Because this technique is based on inclusion probabilities proportional to the relative sizes of the population elements, it is called sampling with probability proportional to size (PPS). In PPS sampling, the inclusion probabilities vary according to the relative sizes of the elements.

The size of a population element is measured by an auxiliary positive-valued variable z. It is assumed that the value $Z_k$ of the auxiliary variable is known for each population element k, because the relative size is the quotient $p_k = Z_k/T_z$, where $T_z = \sum_{k=1}^{N} Z_k$ is the population total of the auxiliary variable. Commonly used size measures are variables that physically measure the size of a population element. In business surveys, for example, the number of employees in a firm is a convenient measure of size, and in a school survey the total number of pupils in a school is also a good size measure.

The auxiliary variable z is chosen so that its variability resembles that of the study variable y. More precisely, a size measure z is sought whose ratio to the study variable is as nearly constant as possible, because the efficiency of PPS sampling depends on the extent to which the ratio $Y_k/Z_k$ remains a constant C for all population elements. If the ratio is nearly constant, the design variance of an estimator will be small.
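The role of the constant ratio can be made concrete with a short worked equation. The sketch below assumes with-replacement sampling in which each of the n draws selects element k with probability $p_k$, and it uses the Hansen-Hurwitz estimator of the total; neither the estimator nor the symbols $\hat{t}$, $T_y$ and $k_i$ (the element chosen at draw i) appear in the text above, so they are illustrative assumptions.

```latex
\begin{align*}
\hat{t} &= \frac{1}{n}\sum_{i=1}^{n}\frac{Y_{k_i}}{p_{k_i}},
  \qquad p_k = \frac{Z_k}{T_z},\\
\operatorname{Var}(\hat{t}) &= \frac{1}{n}\sum_{k=1}^{N} p_k\left(\frac{Y_k}{p_k}-T_y\right)^2,
  \qquad T_y=\sum_{k=1}^{N}Y_k,\\
Y_k = C Z_k \ \text{for all } k
  &\;\Longrightarrow\; \frac{Y_k}{p_k} = C\,T_z = T_y
  \;\Longrightarrow\; \operatorname{Var}(\hat{t}) = 0 .
\end{align*}
```

When the ratio is only approximately constant, each term $Y_k/p_k$ stays close to $T_y$, so the variance is small rather than exactly zero.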
In PPS sampling, the inclusion probabilities $\pi_k$ are proportional to the relative sizes $p_k = Z_k/T_z$ of the elements, and the individual weighting of the sampled elements is based on the inverse values of these relative sizes. A PPS sample can be drawn either with or without replacement. The inclusion probabilities are easier to calculate under with-replacement sampling. Obtaining them can be complicated in without-replacement PPS sampling because, once the first element has been sampled, the relative sizes of the remaining N − 1 elements change and new inclusion probabilities must be calculated. Various techniques have been developed to overcome this difficulty, and PPS sampling can be very efficient, especially for the estimation of a total, if a good size measure is available.

Sample Selection

A number of sampling schemes have been proposed for selecting a sample with probability proportional to size. The starting point is knowledge of the values of the auxiliary variable z for each population element, so that the probabilities of selection can be calculated. The inclusion probability $\pi_k$ for a population element k is proportional to the relative size $Z_k/T_z$. For example, in the trivial case of simple random sampling with replacement, the relative sizes are $p_k = 1/N$ for each k. The quantity 1/N is also called the single-draw selection probability of a population element k. The inclusion probability of an element for a sample of size n would be $\pi_k = n \times p_k = n/N$. In PPS sampling, however, the inclusion probabilities $\pi_k$ vary, so it is not an equal-probability sampling design, in contrast to simple random sampling and systematic sampling.

In practice, the selection of a PPS sample can be based on the relative sizes of the population elements or, alternatively, on the cumulative sum of size measures.
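The quantities above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names and the size values are made up, the sizes are assumed to be positive integers, and the selection step shows only the with-replacement case, mapping a random integer to an element through cumulative sums of the size measure.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def relative_sizes(z):
    """Relative sizes p_k = Z_k / T_z of the population elements."""
    t_z = sum(z)  # population total T_z of the size measure
    return [z_k / t_z for z_k in z]


def inclusion_probabilities(z, n):
    """pi_k = n * p_k; the values sum to n.

    Note: n * p_k can exceed 1 for a very large element, in which case
    it cannot be read as a probability without further adjustment.
    """
    return [n * p_k for p_k in relative_sizes(z)]


def draw_with_replacement(z, n, rng):
    """Draw n element indices (0-based) with replacement so that each
    draw picks element k with probability p_k.

    Assumes positive integer sizes: a random integer r in [1, T_z] is
    mapped to the first element whose cumulative size reaches r.
    """
    cumulative = list(accumulate(z))
    return [bisect_left(cumulative, rng.randint(1, cumulative[-1]))
            for _ in range(n)]


sizes = [2, 3, 5]                      # made-up size measures
print(relative_sizes(sizes))           # [0.2, 0.3, 0.5]
print(inclusion_probabilities(sizes, 2))
print(draw_with_replacement(sizes, 4, random.Random(1)))
# Without replacement, removing the first element changes the
# relative sizes of the remaining ones:
print(relative_sizes(sizes[1:]))       # [0.375, 0.625]
```

The last line shows why without-replacement selection is harder to handle: after one element leaves the population, the remaining relative sizes are no longer the original $p_k$.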
The cumulative total for the kth element is $G_k = \sum_{j=1}^{k} Z_j$. The natural numbers $[1, G_1]$ are associated with the first population element and the numbers $[G_1 + 1, G_2]$ with the second element; in general, the kth element receives the numbers in the interval $[G_{k-1} + 1, G_k]$. The sample selection process is based on these figures.

We consider five specific selection schemes for PPS sampling: Poisson sampling (which resembles Bernoulli sampling), the cumulative total method with replacement and without replacement, systematic sampling with unequal probabilities, and the Rao–Hartley–Cochran method (RHC method; Rao et al. 1962). Of these, the cumulative total method with replacement and systematic sampling with unequal probabilities are considered in more detail.

In the examples, the variable HOU85 measures the size of a population element. It is register-based and gives the number of households in each municipality of the population.

Poisson sampling

This sampling scheme uses a list-sequential selection procedure. First, the inclusion probabilities $\pi_k = n \times Z_k/T_z$ are calculated. Then let $\varepsilon_1, \ldots, \varepsilon_k, \ldots, \varepsilon_N$ be independent random numbers drawn from the uniform (0, 1) distribution. If $\varepsilon_k < \pi_k$, element k is selected. This procedure is applied to all population elements k = 1, ..., N in turn.

Cluster Sampling

Cluster sampling in social and business surveys is motivated by the need for practical, economic and sometimes also administrative efficiency. An important advantage of cluster sampling is that a sampling frame at the element level is not needed. The only requirements are cluster-level sampling frames and frames for subsampling elements from the sampled clusters. Cluster-level frames are often easily accessible, for example for establishments, schools, blocks or block-like units. Moreover, these existing structures provide the opportunity to include important structural information as part of the analysis.
For instance, in an educational survey it is practical to use the information that pupils are clustered within schools and further clustered into classes or teaching groups within schools. Schools can be taken as the population of clusters from which a sample of schools is first drawn, and a further sample of teaching groups can then be drawn from the sampled schools. If all the pupils in the sampled teaching groups are measured, the design belongs to the class of two-stage cluster-sampling designs. In addition to sample selection and data collection, the multi-level structure can be used in the analysis, for example for examining differences between schools.

Thompson (2012), p. 157

Cluster and Systematic Sampling

Although systematic sampling and cluster sampling seem on the surface to be opposites (the one spacing out the units of a sample and the other bunching them together), the two designs share the same structure. The population is partitioned into primary units, each primary unit being composed of secondary units. Whenever a primary unit is included in the sample, the y-values of every secondary unit within it are observed.

In systematic sampling, a single primary unit consists of secondary units spaced in some systematic fashion throughout the population. In cluster sampling, a primary unit consists of a cluster of secondary units, usually in close proximity to each other. In the spatial setting, a systematic sample primary unit may be composed of a collection of plots in a grid pattern over the study area. Cluster primary units include such spatial arrangements as square collections of adjacent plots or long, narrow strips of adjacent units. The key point in any of the systematic or clustered arrangements is that whenever any secondary unit of a primary unit is included in the sample, all the secondary units of that primary unit are included.
Even though the actual measurements may be made on secondary units, it is the primary units that are selected. In principle, one could dispense with the concept of the secondary units, regarding the primary units as the sampling units and using, as the variable of interest for any primary unit, the total of the y-values of the secondary units within it. Then all properties of estimators may be obtained based on the design by which the sample of primary units is selected. However, several common features of systematic and cluster sampling make these designs worth considering as special cases:

1. In systematic sampling, it is not uncommon to have a sample size of 1, that is, a single primary unit.
2. In cluster sampling, the size of the cluster may serve as auxiliary information that may be used either in selecting clusters with unequal probabilities or in forming ratio estimators.
3. The size and shape of clusters may affect efficiency.

Notation (Thompson / Steczkowski):
Number of clusters: $N$ / $K$
Number of clusters in the sample: $n$ / $k$
Number of objects in a cluster: $M_i$ / $N_j$
Number of objects in the population: $M$ / $N$
Value of variable Y in a cluster: $y_{ij}$ / $Y_{ij}$
The comparison also covers the cluster total of Y, the cluster mean of Y, the population mean of Y and the population total of Y.

THE BASIC PRINCIPLE

Since every secondary unit is observed within a selected primary unit, the within-primary-unit variance does not enter into the variances of the estimators. Thus, the basic systematic and cluster sampling principle is that, to obtain estimators of low variance or mean square error, the population should be partitioned into clusters in such a way that one cluster is similar to another. Equivalently, the within-primary-unit variance should be as great as possible in order to obtain the most precise estimators of the population mean or total.
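The word "equivalently" in the principle above rests on the usual decomposition of the total sum of squares, which can be written out as a brief sketch. The symbols are assumptions chosen to match Thompson's notation: $y_{ij}$ is the value of secondary unit j in primary unit i, $M_i$ the number of secondary units in primary unit i, $\bar{y}_i$ the mean within primary unit i, and $\mu$ the population mean.

```latex
\begin{align*}
\sum_{i=1}^{N}\sum_{j=1}^{M_i}(y_{ij}-\mu)^2
  &= \underbrace{\sum_{i=1}^{N}\sum_{j=1}^{M_i}(y_{ij}-\bar{y}_i)^2}_{\text{within primary units}}
   + \underbrace{\sum_{i=1}^{N} M_i(\bar{y}_i-\mu)^2}_{\text{between primary units}}
\end{align*}
```

The left-hand side is fixed for a given population, so a partition that makes the within term large necessarily makes the between term small. With equal sizes $M_i = M$, the between term equals $\frac{1}{M}\sum_{i}(y_i - M\mu)^2$, where $y_i = M\bar{y}_i$ is the primary-unit total, so it is proportional to the variability of the primary-unit totals on which the estimators are built.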
The ideal primary unit contains the full diversity of the population and hence is "representative." With natural populations of spatially distributed plants, animals, or minerals, and with many human populations, the condition above is typically satisfied by systematic primary units, in which the secondary units are spaced apart, but not by clusters of geographically adjacent units. Cluster sampling is more often than not carried out for reasons of convenience or practicality rather than to obtain the lowest variance for a given number of secondary units observed.

With many natural populations, units near each other tend to be similar, so with compact clusters the within-primary-unit correlation coefficient ρ is greater than zero. For such populations, the value of ρ, and hence the variance of the estimator $\hat{\tau}$ of the population total, will tend to be larger with square clusters, in which the secondary units are close together, than with long, thin clusters, in which at least some of the secondary units are far apart. With systematic sampling, the secondary units of each primary unit are spaced relatively far apart, so that ρ may well be less than zero. For this reason, systematic sampling is inherently efficient with many real populations. The advantage of cluster sampling is that it is often less costly to sample a collection of units in a cluster than to sample an equal number of secondary units selected at random from the population.
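A small numerical sketch can illustrate the contrast between compact clusters and systematic primary units. Everything in it is an assumption made for illustration: the population is a made-up sequence of 12 values with a smooth trend (so neighbouring units are similar), it is split into 3 primary units of 4 secondary units, one primary unit is chosen with equal probability, and the total is estimated by the number of primary units times the observed primary-unit total.

```python
def systematic_units(n_elements, n_units):
    """Primary unit i holds every n_units-th element, starting at i."""
    return [list(range(i, n_elements, n_units)) for i in range(n_units)]


def cluster_units(n_elements, n_units):
    """Primary unit i holds a block of adjacent elements.

    Assumes n_elements is divisible by n_units.
    """
    size = n_elements // n_units
    return [list(range(i * size, (i + 1) * size)) for i in range(n_units)]


def design_variance_one_unit(y, units):
    """Design variance of tau_hat = N * (primary-unit total) when one
    of the N primary units is selected with equal probability."""
    n_units = len(units)
    tau = sum(y)  # true population total
    estimates = [n_units * sum(y[j] for j in unit) for unit in units]
    return sum((e - tau) ** 2 for e in estimates) / n_units


y = list(range(1, 13))  # hypothetical population with a smooth trend
var_sys = design_variance_one_unit(y, systematic_units(len(y), 3))
var_clu = design_variance_one_unit(y, cluster_units(len(y), 3))
print(var_sys, var_clu)  # 96.0 1536.0
```

In this toy population the spaced-out systematic units each contain low, middle and high values, so their totals are close to one another and the variance is much smaller than with blocks of adjacent units. A population without such a trend could give a different ordering.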