Cost-Effective, Practical Sampling
Strategies for Accuracy Assessment
of Large-Area Thematic Maps
Stephen V. Stehman¹
Abstract. - Accuracy assessment is an expensive but necessary process in the
development and eventual use of a large-scale thematic map. The sampling strategy
used for collecting and analyzing the reference samples required for the accuracy
assessment must be cost-effective, yet still achieve satisfactory precision for the
estimated accuracy parameters. Strata and clusters may be used to improve the
efficiency of the sampling strategy, and more specialized designs such as double
sampling or adaptive cluster sampling may provide better precision for certain
objectives. Poststratified and regression estimators may be combined with a simple
design to yield enhanced precision without substantially increasing costs. Each
strategy has strengths and weaknesses and no single strategy is ideal for all
applications.
Large-area, satellite-based land cover mapping projects are becoming increasingly
important for use in environmental monitoring and modeling, resource inventories, and
global change research. Assessing the thematic accuracy of these maps requires
balancing the needs of statistical validity and rigor with the practical constraints and
limited resources available for the task. Finding this balance is a characteristic of
practically any applied sampling problem. The primary expense is obtaining ground
reference samples, the critical data for an accuracy assessment. A cost-effective,
practical accuracy assessment strategy must efficiently use these ground reference
samples. This efficiency can be achieved by selecting an efficient sampling design, or by
using efficient analysis procedures following collection of the data.
A site-specific, thematic accuracy assessment is assumed of primary interest, with
the evaluation unit being a single pixel as suggested by Janssen and van der Wel (1994).
Analogous strategies could be developed for a polygon-based accuracy assessment, but
that topic is not pursued here. Each pixel on the target map corresponds to a particular
location on the ground, and it is assumed that a ground visit leads to a correct
land-cover classification of that pixel. Location error will be confounded with
classification error if field site locations do not correspond exactly with map
locations. This view of
accuracy assessment permits illustration of the basic principles and techniques to be
presented in this paper.
The statistical approach taken is the classical finite population sampling model
(Cochran 1977). In this perspective as applied to accuracy assessment (Stehman 1995),
the population consists of the N pixels on the map. If a complete census of ground
reference sites were obtained, each pixel on the map could be labeled as correctly or
¹SUNY College of Environmental Science and Forestry, 920 Bray Hall, Syracuse, NY
incorrectly classified, and a population error matrix constructed from these results in
which the rows of the error matrix represent the map classifications, and the columns
represent the reference classifications ("truth").
The objective of the accuracy assessment is to use a sample of reference locations to
construct a sample error matrix and then to compute estimates of various accuracy
parameters, typically the overall proportion of pixels correctly classified (P_c),
user's accuracy, producer's accuracy, and the kappa coefficient of agreement (κ).
A basic recommendation is that a probability sampling design should be used to
collect the reference data. For a probability sample, each pixel in the population has a
positive, known probability of being included in the sample.
This "inclusion probability" is a characteristic of the sampling design. By knowing the inclusion
probability of each sampled pixel, the pixels can be correctly weighted when estimating
accuracy parameters. All pixels need not have the same inclusion probability, a
stratified sample being a good example, as long as the inclusion probability is known.
Probability sampling provides an objective, scientifically defensible protocol for accuracy assessment.
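The weighting idea behind probability sampling can be sketched as follows. This is a
minimal illustration rather than any published code; the sample data and inclusion
probabilities are invented.

```python
# Horvitz-Thompson-style estimate of overall accuracy P_c from a
# probability sample with known (possibly unequal) inclusion probabilities.
# 'correct' is 1 if the map and reference labels of a pixel agree, else 0.

def estimate_pc(sample, N):
    """Estimate the proportion of the N map pixels correctly classified.

    sample: list of (correct, inclusion_probability) pairs.
    Each sampled pixel is weighted by the inverse of its inclusion
    probability, so unequal-probability designs are handled correctly.
    """
    total = sum(correct / pi for correct, pi in sample)
    return total / N

# Equal-probability sample of 4 pixels from a 100-pixel map (pi = 4/100):
srs_sample = [(1, 0.04), (1, 0.04), (0, 0.04), (1, 0.04)]
print(estimate_pc(srs_sample, N=100))  # 3 of 4 correct -> 0.75
```

With equal inclusion probabilities this reduces to the ordinary sample proportion; the
same function applies unchanged when, say, a stratified design gives some pixels higher
inclusion probabilities than others.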
In the finite sampling perspective, a sampling strategy consists of two parts, the
sampling design, which is a protocol by which the sample data are collected, and an
estimator, which is a formula for estimating a population parameter. An "efficient"
sampling strategy is one in which the variance of an estimator is low, or, equivalently,
precision is high. Efficiency translates into cost savings in that if a sampling strategy
can provide the same variance using fewer sampling locations than another strategy, the
former strategy is more cost-effective. Efficiency can be gained by judicious choice of
design or estimator, or both. Stratification, cluster sampling, double sampling, and
adaptive cluster sampling are design options for improving efficiency. Regression and
poststratified estimators are estimation techniques commonly employed in finite
population sampling to improve precision.
These estimation techniques require
auxiliary information in addition to the sample reference data. Aerial photography or
videography are potential sources of auxiliary information, and the number of pixels
classified into each cover type by the land-cover map can be used as auxiliary
information in poststratification.
Several stratification options exist. Stratifying by geographic region ensures that
the sampling design for accuracy assessment results in a regionally well-distributed
sample, spreads the workload out geographically, and allows for convenient reporting of
accuracy statistics by region. For parameters summarizing the full error matrix, such as
P_c and κ, geographic stratification normally does not produce large gains in precision
over simple random sampling (SRS) (Cochran 1977, p. 102) and should not be selected
with the hope of achieving a substantial reduction in variance for these estimators.
Although geographic stratification is convenient for reporting accuracy statistics by
region, it is not necessary to stratify to obtain regional estimates. Any geographic
region can be identified as a subpopulation and accuracy statistics calculated for that
subpopulation. The advantage of stratification is that the sample size can be controlled
in each region or stratum, and the stratum-specific estimates will have smaller variance
compared to the estimates obtained via SRS or another equal probability design such as
systematic or cluster sampling. For any equal probability design, the sample size in a
region will be proportional to the area of that region, and for small regions, this sample
size may be too small to achieve adequate precision.
Stratifying by the land-cover classes identified on the target map has the advantage
of guaranteeing a specified sample size in each land-cover class, and thus is cost-effective
if the primary objective is estimating user's accuracy. Efficiency for estimating
producer's accuracy, P_c, and κ may be reduced when the design is stratified by the
classes identified on the map, depending on the allocation of samples to the strata.
Stratifying by the map classification also requires that the map be completed prior to
selecting the sample, and this may result in a delay between the time of imagery and
the time the ground sampling takes place. Changes in land cover may occur during the
intervening period.
Still another possible stratification option was employed by Edwards et al. (1996) in
a large-area accuracy assessment in Utah, USA. Within broad geographical strata, two
additional spatial strata were created, a stratum identified by a 1-km wide corridor
centered on a road, and an off-road stratum. The advantage of this stratification
scheme is that access to reference sites is much easier for areas in close proximity to a
road, and the number of samples per unit cost can be increased dramatically by
increasing the sampling intensity in the road stratum. Selecting some samples from the
non-road stratum maintains the probability sampling characteristic of the design.
Cluster Sampling
Cluster sampling is another potentially cost-effective design for selecting reference
samples. In this design, the primary sampling unit (psu) consists of a cluster of pixels,
such as a 3x3 block or a linear row of pixels. In one-stage cluster sampling, all pixels
within the psu are sampled, whereas in two-stage cluster sampling, a subsample of pixels
within each psu is selected. The advantage of cluster sampling is that the number of
pixels sampled per unit cost is increased because pixels are sampled in closer proximity.
Moisen et al. (1994) demonstrated the efficiency gains achievable by cluster sampling
based on an analysis taking into account sampling costs and the spatial autocorrelation
of classification errors. The pixels within a given sampled psu cannot be regarded as
independent observations, so the standard error formulas must reflect the cluster
structure of the sampling design (Czaplewski 1994, Moisen et al. 1994, Stehman 1996b).
The usual SRS variance estimators do not take into account the within cluster
correlation and will likely underestimate the cluster sampling variance.
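The effect of within-cluster correlation on standard errors can be illustrated with a
small sketch. The cluster data are invented; the cluster-based standard error treats
the psu means as the independent observations (equal-size psus assumed).

```python
import statistics

# Hypothetical one-stage cluster sample: each inner list holds the
# correct/incorrect (1/0) codes for the nine pixels of one sampled psu.
# Errors cluster spatially, so codes within a psu are correlated.
clusters = [
    [1, 1, 1, 1, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 1, 1, 1, 1],
]

pixels = [x for c in clusters for x in c]
n = len(pixels)
p_hat = sum(pixels) / n  # estimated P_c

# Naive SRS standard error: wrongly treats all pixels as independent.
se_srs = (p_hat * (1 - p_hat) / n) ** 0.5

# Cluster-based standard error: the psu means are the independent
# observations, so their between-psu variability drives the SE.
means = [sum(c) / len(c) for c in clusters]
m = len(clusters)
se_cluster = (statistics.variance(means) / m) ** 0.5

print(round(p_hat, 3), round(se_srs, 3), round(se_cluster, 3))
```

For these invented data the cluster-based standard error is nearly double the naive SRS
value, illustrating how ignoring the cluster structure understates uncertainty.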
Sampling Rare Land-Cover Classes
The emphasis placed on sampling rare land-cover categories strongly influences the
design selected for accuracy assessment. If rare classes are considered extremely
important, the sampling design should reflect this priority and practically forces using a
stratified design to ensure adequate sample sizes for these rare classes. At the other
extreme, rare classes may be assigned low priority, and an equal probability design is a
viable option. Rare classes will not be sampled in large numbers in such designs. An
intermediate emphasis on rare land-cover types requires a compromise design. Two
general strategies are offered.
The first strategy is to treat rare classes as a separate sampling problem. For
example, a general purpose sampling design for accuracy assessment could be a simple
random, systematic, or cluster sample. Because such designs will result in only a few
reference samples in the rare classes, the general design is supplemented by a specialized
design tailored to sample rare classes with high probability. This two-step design must
still be conducted according to a probability sampling protocol, and the analysis must
take into account that those pixels selected in the supplemental design may have
different inclusion probabilities than those pixels selected by the original design.
Adaptive cluster sampling (Thompson 1990) is tailored to sample efficiently rare,
but spatially clustered items. An example illustrating this strategy in an accuracy
assessment setting is as follows. Suppose the rare land-cover class is marsh, and assume
that a 5x5 block of pixels is adopted as the psu. A simple random or systematic sample
of psus is selected. For those psus in which at least one marsh pixel (according to the
reference classification) is found, the sampling procedure is "adapted" to then sample
adjacent 5x5 blocks surrounding the initial sampled psu. If one or more of these
adjacent psus is found to have at least one marsh pixel, the adaptive strategy is
continued. The process stops when no marsh pixels are found in adjacent psus. If the
rare cover-type is spatially clustered, this strategy will greatly improve the probability
of sampling reference pixels in this class because the method intensifies sampling effort
in those areas in which the rare class is found. The adaptive design satisfies the
necessary criteria of a probability sample. Special estimators of the accuracy parameters
need to be employed when this design is used (see Thompson (1990) for the basic
theory). Adaptive cluster sampling may be an effective design for change detection
accuracy assessment because while change pixels may be rare, they are likely to be
spatially clustered, and the adaptive design may more cost-effectively sample such pixels.
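The adaptive rule can be sketched on a toy grid of psus. Everything here is invented
for illustration: the marsh locations, the grid size, and the initial sample. A psu
satisfies the condition when it contains at least one marsh reference pixel.

```python
# Toy sketch of adaptive cluster sampling on a grid of psus.
MARSH = {(2, 2), (2, 3), (3, 3)}  # spatially clustered marsh psus (invented)

def neighbors(psu):
    r, c = psu
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]

def adaptive_sample(initial, grid_size=6):
    """Expand the initial psu sample into adjacent psus whenever a sampled
    psu contains marsh, stopping when no new marsh psus are found."""
    sampled = set()
    frontier = list(initial)
    while frontier:
        psu = frontier.pop()
        if psu in sampled or not all(0 <= x < grid_size for x in psu):
            continue
        sampled.add(psu)
        if psu in MARSH:  # condition met: adapt by sampling the neighbors
            frontier.extend(neighbors(psu))
    return sampled

# Initial sample of four psus; only (2, 2) hits marsh, and the adaptive
# rule then picks up the rest of the marsh network automatically.
sample = adaptive_sample([(0, 0), (0, 4), (2, 2), (4, 4)])
print(sorted(sample & MARSH))  # -> [(2, 2), (2, 3), (3, 3)]
```

Because the expansion follows the spatial clustering, hitting any one marsh psu in the
initial sample is enough to bring the whole connected patch into the sample, which is
exactly why the design is efficient for rare but clustered classes.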
Double sampling is another design alternative that may be used to enhance
estimation for rare cover types. In double sampling, a large first-phase sample is
selected and the stratum to which each sampled pixel belongs is identified. From these
first-phase sample units, a stratified random sample, called the second-phase sample, is
then selected. The primary advantage of this method is that only the first-phase sample
units, not all N pixels, need to be assigned to strata. The stratification could be based
on the land-cover classes as identified by the reference data instead of stratifying on the
land-cover classes identified by the map. This stratification is then used to increase the
probability of sampling rare ground classes. To implement this design, a large
first-phase sample is selected and each location is assigned to a land-cover class. These
assignments would not necessarily require a ground visit if reasonably accurate stratum
identifications can be made using available maps, aerial photographs or videography.
The second-phase sample is then a stratified random subsample of the first-phase
sample, and the "true" classification of these second-phase sample sites is identified.
To illustrate how this approach focuses sampling effort on rare categories, suppose a
particular class, say marsh, represents only 0.1% of the land area, and that commission
error for marsh is high. If the stratification is based on land-cover as identified by the
map, it is likely that only a few true marsh sites will appear in the reference sample. By
using double sampling and stratifying on the ground classification, the probability of
sampling marsh reference pixels could be increased dramatically by intensifying
sampling effort in the marsh stratum. Increasing the sample size in this manner will
improve precision of producer's accuracy estimates for the marsh class.
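A minimal numerical sketch of the double-sampling estimator of overall accuracy under
this kind of design follows. All counts are invented; the strata are the land-cover
classes assigned to the phase-1 sample from photos or videography, and "correct" counts
phase-2 ground visits at which the map label was confirmed.

```python
# Double-sampling (two-phase) estimate of P_c with photo-based strata.
phase1 = {"marsh": 40, "other": 1960}   # phase-1 stratum sizes (n'_h)
visited = {"marsh": 30, "other": 70}    # phase-2 ground visits per stratum
correct = {"marsh": 21, "other": 63}    # map label confirmed correct

n1 = sum(phase1.values())

# Each stratum's accuracy is estimated from its ground visits, then
# weighted by that stratum's share of the large phase-1 sample.
pc_hat = sum(phase1[h] / n1 * (correct[h] / visited[h]) for h in phase1)
print(round(pc_hat, 4))  # -> 0.896
```

Note how the rare marsh stratum receives 30 of the 100 ground visits even though it is
only 2% of the phase-1 sample; the phase-1 weights correct for this oversampling.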
The second strategy for sampling rare classes is to adapt the general purpose
sampling design itself to emphasize more strongly rare classes. A stratified design is a
typical approach for such a one-step strategy. However, if the general design is chosen
to emphasize sampling rare classes, some precision will be lost for other estimates of
map accuracy such as P_c and κ because of this allocation of sampling resources. This
precision trade-off is characteristic of any sampling design choice, and the properties of
the strategy selected should reflect the objectives specified for the accuracy assessment.
The second component of a cost-effective sampling strategy is the analysis of the
data, provided by the estimators of map accuracy parameters. Three techniques for
improving the precision of map accuracy estimates at the analysis stage are poststratification, regression estimation, and incorporating "found" data into an accuracy
assessment. All three approaches use auxiliary information in the analysis to achieve
this precision gain. Poststratification and regression estimation can be applied to
practically any sampling design. Here it is assumed that SRS is employed because an
advantage of these analysis techniques is that they can improve precision for simple,
easily implemented sampling designs.
Poststratification is an estimation technique, not a sampling design, requiring
auxiliary information similar to that required for stratified sampling. Suppose n sample
pixels are obtained via SRS. Let N_k+ and n_k+ denote the number of pixels classified as
cover type k in the population and sample, respectively. Both N_k+ and n_k+ are known
once the map is completed. Poststratification incorporates the known totals N_k+ into
the estimator by analyzing the SRS data as a stratified random sample in which n_k+
pixels have been selected from the N_k+ pixels available in that stratum (cover type k as
identified on the map). Consider the estimator of P_c. If n_kk is the number of sample
pixels correctly classified in cover type k, the usual SRS estimator for P_c is
Σ_k n_kk / n, where summation is over the q cover types. For SRS, each pixel is weighted
equally, the weights being N/n. For the poststratified estimator, the weights depend on
the identified strata, the weight for stratum k being N_k+/n_k+. Then the poststratified
estimator of P_c is (1/N) Σ_{k=1}^{q} (N_k+/n_k+) n_kk. Poststratified estimators for
other parameters are constructed in essentially the same manner, replacing the weight
N/n used in the usual SRS formula by N_k+/n_k+, and then summing over the q strata. Card
(1982) presents poststratified estimators and variance estimators for some accuracy
parameters, and Stehman (1995) shows the poststratified estimators of P_c and κ. The
precision achieved by poststratification is approximately that of a stratified sample with
proportional allocation (Cochran 1977, p. 134), so poststratification will usually result in
some gain in precision over the usual SRS estimators. Based on a small simulation
study (Stehman 1996c), the gain in precision from poststratification can be expected to
be around 5% (in terms of standard error) for estimating P_c and κ, and as much as
15-30% for estimating producer's accuracy.
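As a concrete sketch of the poststratified weighting, consider a hypothetical map with
q = 3 classes; all counts below are invented.

```python
# Poststratified estimate of P_c from an SRS. The map totals N_k+ are
# known once the map is complete; the sample counts are invented.

N_k = {"forest": 6000, "water": 3000, "urban": 1000}  # N_k+ per map class
n_k = {"forest": 55, "water": 35, "urban": 10}        # n_k+ sampled per class
n_kk = {"forest": 50, "water": 28, "urban": 6}        # correct per class

N = sum(N_k.values())
n = sum(n_k.values())

# Usual SRS estimator: every pixel weighted equally by N/n.
pc_srs = sum(n_kk.values()) / n

# Poststratified estimator: stratum k weighted by N_k+ / n_k+.
pc_post = sum(N_k[k] / n_k[k] * n_kk[k] for k in N_k) / N

print(round(pc_srs, 4), round(pc_post, 4))  # -> 0.84 0.8455
```

The two estimates differ because the SRS happened to over-sample forest (55/100 of the
sample versus 60% of the map); the poststratified weights correct this imbalance.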
A disadvantage of poststratification relative to a stratified design is that the sample
size within each stratum, n_k+, is under control with a stratified design, but not with
SRS followed by poststratification. However, poststratification can be applied to any
classification scheme. For example, if a user wishes to collapse land-cover classes or use
an entirely different classification scheme, poststratification can still be applied.
Poststratification is flexible because it can be adapted to any identified subpopulation,
whereas the usual stratified sampling design is advantageous only for those
subpopulations identified as strata prior to sampling. Poststratification and stratified
sampling both require N_k+, but these are available once the land-cover map is
completed. Because poststratification does not require this information to obtain the
sample, no time delay between the imagery and the ground sampling need occur.
In the regression estimator approach, any auxiliary information available that can
provide a land-cover classification of a pixel may be used. Primary sources of auxiliary
data include aerial photography or videography, or even another land-cover map
obtained via remote sensing and using a coarser scale of resolution (e.g., AVHRR). This
auxiliary information is separate from the imagery data used to construct the target
land-cover map being assessed. A sample of ground reference locations is still required,
and these reference data are combined with the auxiliary data via a regression estimator
to estimate P_c. For SRS of n pixels, let p_ym denote the sample proportion of pixels
in which the reference and map classifications match. If no auxiliary data are
available, p_ym is the usual estimator of P_c. Let p_am denote the sample proportion of
pixels in which the map classification matches the classification obtained from the
auxiliary data, and let p_ya denote the sample proportion of pixels in which both the
reference and auxiliary data classifications match the map classification. The
regression estimator of P_c is

    P_reg = p_ym + b(P_a - p_am),

where P_a is the proportion of pixels in the entire map in which the auxiliary data and
map classifications match, and b is the estimated slope from regressing the
reference-map agreement indicator on the auxiliary-map agreement indicator (p_ya enters
through the sample covariance in b). The estimated variance of P_reg is approximately
the usual SRS variance multiplied by (1 - r²), where r is the sample correlation
between the two agreement indicators.
The regression estimator results in some gain in precision over the usual SRS
estimator, no matter how poor the auxiliary data classifications, but the approach is not
worth implementing unless a meaningful gain in precision is achieved. The more
accurate the classifications from the auxiliary data, the greater the gain in precision
achieved by the regression estimator relative to the usual SRS estimator.
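A minimal numerical sketch of a regression estimator of this type follows. The
indicator data and the map-wide proportion P_a are invented; the slope is the ordinary
least-squares slope of the reference-map agreement indicator on the auxiliary-map
agreement indicator.

```python
# Regression estimate of P_c using full-coverage auxiliary classifications.
# y[i] = 1 if the reference and map labels of sample pixel i agree;
# x[i] = 1 if the auxiliary and map labels agree; P_a is the map-wide
# proportion of pixels whose auxiliary and map labels agree.

def regression_estimate(y, x, P_a):
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    # Slope of the regression of y on x: sample covariance / variance.
    s_xy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (n - 1)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    return y_bar + (s_xy / s_xx) * (P_a - x_bar)

y = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]  # invented reference-map agreements
x = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]  # invented auxiliary-map agreements
print(regression_estimate(y, x, P_a=0.80))
```

The adjustment pulls the plain sample proportion (0.70 here) toward the known map-wide
value P_a in proportion to how strongly the two agreement indicators are correlated.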
The usual regression estimator approach requires auxiliary information for the entire
map region. In practice, it is more feasible to use a double sampling approach in which
the auxiliary data are collected for a first-phase sample, and ground visits are made to a
subsample of the first-phase sample. The double sampling regression estimator incorporates both the first- and second-phase sample data. Stehman (1996a) demonstrated
that the precision of the regression estimator combined with double sampling was nearly
the same as the strategy employing the regression estimator with SRS. The double
sampling strategy is more cost-effective because it requires obtaining the auxiliary data
only on the first-phase sample, not the entire target region.
Found Data
Additional potential ground reference data may be available via purposeful,
haphazard, convenience, or other non-probability sampling methods. For example,
suppose an organization has land-cover classifications for various special interest sites
which were visited "to see what was there." Assuming the land-cover classifications
made at these sites are correct and consistent with the classification scheme employed
for the map being evaluated, is it possible to use these data in the accuracy assessment?
Such data in a sense represent "free" reference samples because they already exist. But
incorporating them into the accuracy assessment in a statistically valid manner is not
simple because it is difficult to generalize from these data to the population at large.
That is, what population do these "found" sites represent? To illustrate the nature of
the difficulty, if reference samples were obtained only from areas within, say, 250 meters
of a road, the sample data can be generalized only to a population of the area in the
study region within 250 meters of a road. Generalizing beyond that area is not
supported by statistical inferences, and must rest on non-statistical arguments regarding
what these sites represent. The same can be said of "found" sites. Because these sites
were not selected by probability sampling methods, the population represented by these
sites is unknown. The probability sampling protocol must be replaced by assumptions
concerning the population represented by the found data. Overton (1990) and Overton
et al. (1993) describe some statistical procedures for using found data that apply to this
problem. However, the amount of work (cost) needed to use these data in a statistically
valid manner appears high, and unless these data are abundant or exceedingly valuable,
such as for a very rare cover-type, it is doubtful that such data can contribute
significantly to reducing the costs of a statistically rigorous accuracy assessment.
Incorporating existing data is much more feasible if a valid sampling design was used to
obtain the data.
Some general recommendations for sampling in large-scale accuracy assessment
projects are proposed. As with any general recommendation, it is easy to think of
situations in which exceptions would obviously be needed. Recommendations should
also evolve over time as better methods and new insights are gained. With those
caveats in mind, the following suggestions are proposed. For accuracy assessment of
large-area land-cover maps, stratification by a few large geographic regions is helpful to
ensure a regionally representative sample, to provide adequate sample sizes in these
regions for reporting region-specific accuracy statistics, and to spread the workload more
evenly. As an example of the scale of the recommended stratification, a state such as
North Carolina could be stratified into three physiographic regions, the coastal plain,
Piedmont, and western upland areas. Bauer et al. (1994) provide another example, as
they partitioned their study region (Minnesota) into eight physiographic regions that
could be used as strata in an accuracy assessment. States themselves are
administratively convenient strata for a land-cover map spanning several states.
Within each geographic stratum, a simple design such as simple random or
systematic sampling is recommended as the general design. Implementing a simple
design creates the opportunity to employ precision enhancing analysis techniques such as
poststratification and regression estimation. Cluster sampling has been demonstrated to
be cost-effective (Moisen et al. 1994), and the combination of cluster sampling with
stratification into road and non-road areas (Edwards et al. 1996) has considerable
practical appeal. Simple designs are easier to implement in the field thus increasing the
likelihood that the design will be implemented correctly. This general design strategy is
adaptable to a variety of analyses and classification systems, and will accommodate the
multiple general uses and objectives present in a large-area mapping project.
To accomplish more specialized objectives, a design tailored to these objectives can
supplement the general purpose sampling design. For example, if sampling certain rare
classes is an important objective, the data from the general design can be supplemented
by an additional simple random sample from each rare class. Users must be careful to
incorporate proper weights in the analysis when combining data collected by these two
different designs. The double sampling and adaptive cluster sampling designs described
earlier may also be used to supplement a simpler, general purpose design.
Accuracy assessment of large-area thematic maps creates several challenging
statistical problems. These problems, however, are not insurmountable. The classical
finite sampling approach has been in use for over 50 years and has been applied to
sampling large, spatially dispersed, and hard-to-measure populations. National surveys
collecting labor, economic, and health statistics cover populations spread over large
geographic areas, and these programs face a demanding set of objectives requiring
estimates for a variety of parameters at various spatial scales. These problems are
characteristic of large-area accuracy assessment efforts. Classical finite sampling theory
and methods provide a rich toolbox from which to choose cost-effective sampling
strategies, and adapting these strategies to accuracy assessment of large-area thematic
maps is a viable approach to pursue.
I thank Ray Czaplewski for his review and helpful suggestions. This work has been
supported by cooperative agreement CR821782 between the U.S. EPA and SUNY-ESF.
Bauer, M.E. et al. 1994. Satellite inventory of Minnesota forest resources. Photogram.
Eng. & Remote Sensing 60: 287-298.
Card, D.H. 1982. Using known map category marginal frequencies to improve estimates
of thematic map accuracy. Photogram. Eng. & Remote Sensing 48: 431-439.
Cochran, W.G. 1977. Sampling Techniques (3rd ed). Wiley: New York.
Czaplewski, R.L. 1994. Variance approximations for assessments of classification
accuracy. Res. Pap. RM-316. Fort Collins, CO: U.S. Department of Agriculture,
Forest Service, Rocky Mountain Forest and Range Experiment Station. 29p.
Edwards, T.C., Jr., Moisen, G.G., and Cutler, D.R. 1996. Assessing map accuracy in an
ecoregion-scale cover-map (in review).
Janssen, L.L.F., and van der Wel, F.J.M. 1994. Accuracy assessment of satellite derived
land-cover data: A review. Photogram. Eng. & Remote Sensing 60: 419-426.
Moisen, G.G., Edwards, T.C., Jr., and Cutler, D.R. 1994. Spatial sampling to assess
classification accuracy of remotely sensed data. In Environmental Information
Management and Analysis: Ecosystem to Global Scales, W.K. Michener, J.W.
Brunt, and S.G. Stafford (eds). New York: Taylor and Francis.
Overton, W.S. 1990. A strategy for use of found samples in a rigorous monitoring
design. Tech. Rep. 139, Dept. of Statistics, Oregon State University, Corvallis, OR.
Overton, J.McC., Young, T.C., and Overton, W.S. 1993. Using 'found' data to augment
a probability sample: procedure and case study. Envir. Monit. and Assmt. 26: 65-83.
Stehman, S.V. 1995. Thematic map accuracy assessment from the perspective of finite
population sampling. Inter. J. of Remote Sensing 16: 589-593.
Stehman, S.V. 1996a. Use of auxiliary data to improve the precision of estimators of
thematic map accuracy (Remote Sensing of Environment, in review).
Stehman, S.V. 1996b. Estimating standard errors of accuracy assessment statistics under
cluster sampling (Remote Sensing of Environment, in review).
Stehman, S.V. 1996c. Sampling design and analysis issues for thematic map accuracy
assessment. ASPRS and ACSM Annual Proceedings (to appear).
Thompson, S.K. 1990. Adaptive cluster sampling. J. Amer. Stat. Assoc. 85: 1050-1059.
Stephen Stehman is an Associate Professor at SUNY-ESF. He has a B.S. in Biology
from Penn State, an M.S. in Statistics from Oregon State, and a Ph.D. in Statistics
from Cornell. He provides statistical consulting for faculty and graduate students at
ESF and teaches courses in sampling, experimental design, and multivariate statistics.