Clinical Chemistry 43:11, 2039–2046 (1997)
Opinion
Graphical interpretation of analytical data from comparison of a field method with a Reference Method by use of difference plots
Per Hyltoft Petersen,1* Dietmar Stöckl,2 Ole Blaabjerg,1 Bent Pedersen,1
Erling Birkemose,1 Linda Thienpont,2 Jens Flensted Lassen,3 and Jens Kjeldsen4
Various viewpoints have been offered regarding the
appropriate use of scatter plots or difference plots (bias
plots or residual plots) in comparing analytical methods.
In many of these discussions it seems the basic concepts
of identity (within inherent imprecision) and acceptability based on analytical goals (analytical quality
specifications) have been forgotten. With the increasing
number of Reference Methods in laboratory medicine,
these basic concepts are becoming more important in
validation of field methods. Here we describe a simple
and effective graphical test of these hypotheses (identity
and acceptability) by use of difference plots. These plots
display the underlying hypothesis before the measured
differences are plotted and allow interpretation of the
results according to specific criteria. We further describe
simple but effective interpretations of the data, when
the hypothesis is not fulfilled, by using two data sets
drawn from comparisons of field methods for S-creatinine with a Reference Method for this analyte. The
difference plot is a graphical tool with related simple
statistics for comparison of a field method with a
Reference Method, focusing on (a) identity within the
inherent analytical imprecision or (b) acceptability
within analytical quality specifications. Calculation of
the standard deviation of the differences is an indispensable tool for evaluation of aberrant-sample bias
(matrix effects).
Departments of 1 Clinical Chemistry and 4 Medical Gastroenterology, Odense University Hospital, DK-5000 Odense C, Denmark.
2 Laboratorium voor Analytische Chemie, Faculteit Farmaceutische Wetenschappen, Universiteit Gent, Harelbekestraat 72, B-9000 Gent, Belgium.
3 Department of Clinical Chemistry, Vejle Sygehus, DK-7100 Vejle, Denmark.
*Author for correspondence. Fax +45 65 41 19 11; e-mail phy@imbmed.ou.dk.
Received December 10, 1996; revision accepted June 17, 1997.
Westgard et al. [1–3] outlined the basic principles for method comparison in a clear, easy-to-follow manual.
They also introduced the concept of allowable analytical
error and gave an overview of published performance
criteria. They recommended that the estimated analytical
imprecision and bias be compared with these performance criteria in method evaluation as well as in method
comparison. Their approach made use of a scatter-plot
and calculations based on regression lines, but with
confidence limits and judgment of acceptability based on
the criteria for allowable analytical error.
These principles of comparing analytical performance
with performance criteria, however, have not been universally accepted, and recent publications have criticized
the misuse of correlation coefficients [4] and overinterpretation of regression lines in method comparison [5–7].
Bland and Altman [4] recommended the difference plot
(or bias plot or residual plot) as an alternative approach
for method comparison. On the abscissa they used the
mean value of the methods to be compared, to avoid
regression towards the mean, and on the ordinate they
plotted the calculated difference between measurements
by the two methods. They further estimated the mean and
standard deviation of differences and displayed horizontal lines for the mean and for 62 3 the standard deviation. However, they missed the concept of a more objective criterion for acceptability. Recently, Hollis [5] has
recommended difference plots as the only acceptable
method for method comparison studies for publication in
Annals of Clinical Biochemistry, but without specifying
criteria for acceptability.
However, a few difference plots with evaluation of
acceptability according to defined criteria have been published, e.g., in evaluation of estimated biological variation
compared with analytical imprecision [8], and in external
quality assessment of plasma proteins for the possibilities
of sharing common reference intervals [9].
Maybe the scarcity of such publications is more a
question of interpretation of the data by plotting than a
strict choice between scatter-plot and difference plot, as
discussed by Stöckl [10] recently. Investigators seem to
rely too much on regression lines and r-values, without
doing the equally important interpretation of the data
points of the plot. This is becoming more and more
disadvantageous with the increasing number of Reference
Methods available for comparison with field methods,
because in these cases, it is not a question of finding some
relationships, but simply of judging the field method to be
acceptable or not.
NCCLS has recently published guidelines for method
comparison and bias estimation by using patients’ samples [11], where both scatter-plots and bias plots are
advised. The document also recommends plotting of single determinations as well as mean values and stresses the need for visual inspection of data. Further, comparison
with performance criteria is recommended, but these
criteria are not specified and they are not used in the
graphical interpretation. Recently, Houbouyan et al. [12]
used ratio plots in their validation protocol of analytical
hemostasis systems, where they used a preset, but arbitrarily chosen, acceptance limit of inaccuracy of 15%.
In the following, we will use the difference plot (or bias
plot) in combination with simple statistics for the principal judgment of the identity or acceptability of a field
method. The difference plot makes it easier to apply the
concept; in principle, however, the same evaluations
could be performed for a scatter-plot in relation to the line of identity (y = x).
The aim of this contribution is to pay attention to the
hypothesis of identity and the concept of acceptable
analytical quality in method comparison, especially when
one of the methods is a Reference Method.
basic considerations
The basis for comparing an analytical field method with
an analytical Reference Method measuring the same
quantity (analyte) is the hypothesis of identity within
inherent imprecision or within preset analytical quality
specifications. The field method, whether a kit or a
self-produced analytical procedure, is applied for routine
analyses in laboratory medicine. The method must be
demonstrated to have adequate accuracy (trueness), precision, and specificity (lack of aberrant-sample bias [13])
for its intended use. The Reference Method, in contrast, is
a “thoroughly investigated method, clearly and exactly
describing the necessary conditions for procedures, for
the measurement of one or more property values that has
been shown to have accuracy and precision commensurate with its intended use and that can therefore be used
to assess the accuracy of other methods for the same
measurement, particularly in permitting the characterization of a reference material” [14].
Further characteristics of the methods are the costs, the
complexity, the equipment, the time used for production
of the results, and so forth—the field method generally
being cheap and well suited for routine work and the
Reference Method usually being applicable only in few
competent and specially equipped laboratories.
The two approaches to the comparison are:
1. Identity. The results from the field method do not
deviate from the Reference Method results by more than
the inherent imprecision of both methods.
2. Analytical quality specifications. The results from the field method do not deviate from the Reference Method results by more than the acceptance limits derived from the analytical goals.
In both cases the null hypothesis is that the measured
differences for all samples are zero. In the first case, the
acceptance limits are defined by the inherent analytical
imprecisions, and in the latter case the acceptance limits
are defined by the analytical quality specifications.
Assuming the ideal situation, the theoretical limits for testing the null hypothesis can be drawn in a difference plot, i.e., a mean difference [mean(d)] equal to 0 and a standard deviation of differences [s(d)] calculated
from the imprecisions of the two methods. The ultimate
hypothesis, then, is that the points are distributed within
these bounds. If they fall outside of these bounds, the
hypothesis is rejected. Alternatively, the performance
characteristics are tested against defined analytical quality
specifications.
acceptance limits defined by inherent
analytical imprecision
Constant analytical standard deviations are presumed. A
number of patients’ samples are used for the comparison
of the field method with the Reference Method. The result of the ith patient sample by the two methods is x_iF for the field method and x_iR for the Reference Method, and the difference is x_id = x_iF − x_iR. Further, the theoretical variance of the differences is the sum of the two variances denoting the inherent imprecision of the field and Reference Methods: s²(d) = s²_F + s²_R (the theoretical s values for the two methods being estimated independently, or estimated from replicate measurements during the comparison). When means of duplicates are used, s(d) should be divided by √2.
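The variance combination just described can be sketched in a few lines of Python (a minimal illustration; the function name is ours, and the numeric values are the s_F and s_R quoted for the S-creatinine example below):

```python
import math

def sd_of_differences(s_field, s_ref, duplicates=False):
    """Theoretical standard deviation of differences, s(d), from the
    inherent imprecision of the two methods: s2(d) = s2_F + s2_R.
    When means of duplicates are compared, s(d) is divided by sqrt(2)."""
    sd = math.sqrt(s_field**2 + s_ref**2)
    return sd / math.sqrt(2) if duplicates else sd

# Single determinations, s_F = 3.10 and s_R = 0.50 umol/L:
sd = sd_of_differences(3.10, 0.50)
print(round(sd, 2))   # ~3.14 umol/L
```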
When the two methods are identical, the expectation is that ~68% of the differences will be distributed symmetrically around 0 within 0 ± 1s(d), and 95% will be within 0 ± 1.96s(d). If the distribution of differences fits these criteria, then no difference between results from the two methods can be demonstrated within the inherent imprecision. If this is not the case, the methods are not identical.
Before the measured points are plotted on any figure, the hypothesis can be illustrated in a difference plot, as shown in Fig. 1A for the example of S-creatinine. The horizontal line y = 0 illustrates the hypothesis of x_id = 0; the other lines indicate 0 ± 1s(d) for 68% and 0 ± 1.96s(d) for 95% of the differences, respectively. The outer lines indicate the 95% prediction interval
Fig. 1. Acceptance limits defined by inherent analytical imprecision: difference plots for S-creatinine, with Reference Method (REF) values as abscissa and the difference between field method (FIELD) and REF as ordinate, assuming constant s. (A) Expected distribution of 95% of the measured points, 0 ± 2s(d) = ±6.3 µmol/L (95% prediction interval); the lines for the expected distribution of 68% of the points, 0 ± 1s(d), are also indicated. (B) Same as A but with addition of the simulated “measured” points (simulated from a gaussian distribution with mean of −0.5 and s = 3.0 µmol/L). (C) Same as B but with indication of the statistical 95% tolerance interval with 95% confidence, 0 ± 2.67s(d). (D) Same as B but with calculated lines according to Bland and Altman [4], indicating the calculated mean(d) ± 2s(d), with mean(d) = −0.84 and s(d) = 3.27 µmol/L.
for the expected distribution of points. In this example, s_F = 3.10 and s_R = 0.50 µmol/L and, therefore, s(d) = 3.15 µmol/L.
It is educational to describe the hypothesis before
plotting the points, because most investigators will start
interpretation of the points in the form of functional
relationships and thereby forget about the hypothesis.
With the hypothesis in mind, one can now plot the data points as shown in Fig. 1B. The data points are generated for Reference Method values between 50 and 150 µmol/L and computer-simulated gaussian-distributed differences based on mean(d) = −0.5 µmol/L and s(d) = 3.00 µmol/L; from these simulated data the calculated mean(d) was −0.84 µmol/L and s(d) was 3.27 µmol/L. The data points are distributed roughly according to the hypothesis, with 16 points (70%) within 0 ± 1s(d) and 21 points (91%) within 0 ± 2s(d), leaving 2 points (9%) outside the limits. Because these two points do not seem to deviate much from the general distribution, the conclusion could be that finding just 2 points outside (and close to) the 95% prediction interval is expected and acceptable and, therefore, that the field method is indistinguishable from the Reference Method within the analytical imprecision, so the evaluation can stop. Statistically, the mean difference can be evaluated by a t-test and the distribution of differences by an F-test.
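These two checks can be sketched using only the standard library (the difference values below are illustrative, not the simulated data of Fig. 1):

```python
import math

def identity_tests(diffs, s_theoretical):
    """t statistic for the hypothesis mean(d) = 0, and F ratio comparing
    the observed variance of differences with the theoretical variance
    s2(d). Compare t with t(n-1) and f with the appropriate F tables."""
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d)**2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    f = var_d / s_theoretical**2
    return t, f

diffs = [-0.8, 1.2, -2.5, 0.4, 3.1, -1.0, 0.2, -0.6, 1.8, -2.2]
t, f = identity_tests(diffs, s_theoretical=3.15)
```

A small t and an f near (or below) 1 are consistent with identity within the inherent imprecision.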
This approach is very narrow, and some uncertainty related to unknown factors may need to be taken into account.
Such variations could be the variation between the two
tubes/vials with serum from the same individual or the
underestimation of sF. Therefore, possible additional
sources of uncertainty always should be taken into account when appropriate. However, the design of a
method comparison has to be carefully planned so as to
exclude additional uncertainties. Further, any addition of
“acceptable” uncertainty should be well thought through
and handled with caution.
For the experienced scientist, interpreting the difference plot is easy. A more objective criterion, however, for
graphical validation of the distribution of points, and
especially the more extreme points, is to apply the concept
of tolerance intervals, where the standard deviate (z or t) is replaced by a tolerance factor, k, with a value dependent on the percentage of points (here 95%) and the confidence with which this percentage should be obtained. The k-value is determined by the assumptions
about the new distribution, whether the mean or the
standard deviation is unknown, or both. The k-value
further depends on the number of points, n [15, 16].
Although we have a hypothesis about mean difference
and standard deviation, these figures are unknown in
practice, so k7 for unknown mean and standard deviation may be the most relevant to use. For n = 23 and 95% confidence for 95% of the points, the tolerance factor k7 is 2.67 and the tolerance interval is 0 ± 2.67s. This is illustrated in Fig. 1C, where all points are distributed within the chosen tolerance interval.
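The tolerance-interval computation can be sketched as follows (the k7 values are taken from the text and its footnote; the dictionary layout is ours):

```python
# 95% tolerance factors (k7) for 95% of the points, unknown mean and
# standard deviation, keyed by the number of points n (excerpt).
K7 = {10: 3.38, 20: 2.75, 23: 2.67, 30: 2.55, 50: 2.38, 70: 2.30, 100: 2.23}

def tolerance_interval(s_d, n):
    """Return the 0 +/- k7 * s(d) tolerance interval for n points."""
    k = K7[n]
    return (-k * s_d, k * s_d)

# n = 23 points, s(d) = 3.15 umol/L:
lo, hi = tolerance_interval(3.15, n=23)   # roughly +/-8.4 umol/L
```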
The present approach of theoretical expected distribution is compared with the approach of Bland and Altman [4] in Fig. 1D by inserting the new lines determined by mean(d) ± 2s(d) estimated from the data points.
The difference between the present concept and the
Bland and Altman concept is clear from Fig. 1D. Accordingly, we illustrate the 95% prediction interval before any
data points are applied, in contrast to Bland and Altman,
who simply illustrate the statistics of the points. In this
example the mean values are clearly different, whereas
the standard deviations are rather close to each other.
To illustrate the relations to x–y plots, we first add 11 points (triangles) to Fig. 1B, as shown in Fig. 2 (top). The specimens producing these points are assumed to contain
some “nonspecific” components [in the S-creatinine example, perhaps the specimens are from diabetics, where (e.g.) glucose could result in nonspecific reactions by the Jaffe methods]. In the difference plot they separate clearly from the other points, and the mean difference and standard deviation change to +0.16 and 4.08 µmol/L, respectively. In the x–y plot (Fig. 2, bottom), where the points should be related to the line of identity (y = x), it is difficult to see the difference between the two sets of points. If we turn to calculation of r-values, r decreases from 0.993 to 0.991, the slope of the regression line changes from 1.020 to 1.025, and the intercept changes from −2.54 to −1.17 µmol/L.
Fig. 2. Comparison of difference plot with x–y plot for two sets of identifiable data points: (●) same as in Fig. 1 and (▲) a new set with mean difference = 4.7 and standard deviation of differences = 2.8 µmol/L. (Top) Difference plot (as in Fig. 1B); (bottom) x–y plot showing the line of identity (y = x).
The information from difference plots and x–y plots (when using the line of identity) is the same, but it is easier to expand the differences in the difference plot, and the calculations of variances are simpler, whereas the 45° angle in the x–y plot makes comparable calculations more difficult. This is also seen from Westgard et al. [2, 3], where the simple calculations of (e.g.) bias are easier to interpret in combination with figures.
acceptance limits defined by goals for
analytical quality
A more relevant approach for comparing a field method
with a Reference Method is to use the analytical goals
(analytical quality specifications) as acceptance limits.
These specifications may be related to the clinical use of
laboratory data [17, 18] or more generally to the application of common reference intervals [19, 20] and monitoring patients [21, 22]. Two European groups, one under the
auspices of EGE-Lab (European Group for the Evaluation
of Reagents and Analytical Systems in Laboratory Medicine) [23, 24], and another group under the auspices of
European EQA-Organizers (External Quality Assessment
Organizers) [25], have given recommendations for analytical quality specifications based on the same biological
concepts and with identical criteria for acceptable analytical bias and imprecision, but with a different concept for
combining these—the first (EGE-Lab) defining the recommendations for analytical bias and imprecision separately
and the latter (EQA-Organizers) combining these two
aspects. In both European recommendations, the analytical quality specification for imprecision is CV_analytical ≤ 0.5 × CV_within-subject variation, as proposed by Cotlove et al. [21], and that for analytical bias is |B_analytical| ≤ 0.25 × CV_total biological variation [19]. The EGE-Lab concept accepts both a maximum bias and a maximum imprecision simultaneously; the EQA-Organizers concept describes a functional relationship between the two in the form of a maximum allowable combination of imprecision and bias.
The latter is close to the original concept of Gowans et
al. [19], defining the acceptable analytical percentage bias
for S-creatinine as 2.8% [25] when imprecision is negligible. According to the EGE-Lab concept [23, 24], however,
both a bias of 2.8% and an imprecision of 2.2% are
acceptable simultaneously. This means that for single determinations 0 ± (bias + 1.65 × imprecision) is acceptable [26]; i.e., 95% of the single points must lie within the limits of 0 ± (2.8% + 1.65 × 2.2%) = 0 ± 6.4%, as illustrated in Fig. 3 (left). This criterion is fulfilled as shown in Fig. 3 (middle). The concept, however, is one-sided, i.e., valid only in one direction. Therefore, the standard deviation of differences should be judged against the imprecision criterion separately. The standard deviation of the field method is 3.1 µmol/L and the CV = 3.9%, which exceeds the imprecision specification of 2.2%.
Figure 3 (right) illustrates an example of acceptable analytical imprecision and bias. The CV is 2.1% and the estimated bias (mean difference) is +1.3 µmol/L (95% confidence interval, 0.6–2.0 µmol/L). Note that the mean difference is different from 0 but is acceptable according to the criterion.
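The EGE-Lab combination of bias and imprecision described above reduces to one line of arithmetic; a minimal sketch (the function name is ours):

```python
def combined_limit(max_bias_pct, max_cv_pct, z=1.65):
    """EGE-Lab-style limit for single determinations:
    0 +/- (bias + 1.65 * imprecision), all in percent."""
    return max_bias_pct + z * max_cv_pct

# S-creatinine: bias 2.8%, imprecision 2.2%:
limit = combined_limit(2.8, 2.2)
print(round(limit, 1))   # 6.4 (%)
```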
For the purpose of method comparison, the value for
maximum allowable bias might be expanded because of
the uncertainty (confidence interval) of the Reference
Method. This cannot be seen from the actual comparison,
but because the Reference Method is allowed to have
some uncertainty, then this must be allowed also for the
field method. We propose using the factor 1.2, in light of a recent concept that requires for Reference Methods a total error <0.2 times that of the routine method [27]. In the present case, the acceptance limits for bias should thus be 0 ± 3.36%, as also used in the example below.
two examples utilizing comparison data
The data used are from a paper on a candidate reference
method for determining S-creatinine, which was used for
comparison of four field methods [28]. Data from two of
these comparisons are used, but only for concentration values <150 µmol/L, the only region in which there are sufficient data for our presentation.
Practical example 1. Based on the duplicate analyses performed, analytical imprecision is calculated within the interval 50–150 µmol/L for both the Reference Method (HPLC) and the field method, giving s_R = 0.568 µmol/L and s_F = 0.791 µmol/L. When means of duplicates are used for the comparison, the calculated theoretical s(d) should be divided by √2, giving an expected distribution of 95% (the 95% prediction interval) of the points within 0 ± 2s(d) = 0 ± 1.4 µmol/L. Further, the expanded allowable bias of 3.36% is used for the analytical quality specifications according to both the EGE-Lab and EQA-Organizers concepts.
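The expected interval quoted above follows directly from the stated imprecisions; a minimal numeric sketch:

```python
import math

# Values from practical example 1 (umol/L):
s_R, s_F = 0.568, 0.791
s_d = math.sqrt(s_R**2 + s_F**2) / math.sqrt(2)  # means of duplicates
half_width = 2 * s_d   # half-width of the 95% prediction interval
print(round(half_width, 1))   # 1.4 umol/L
```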
The graphical evaluations are shown in Fig. 4 (top).
Here, the calculated distribution is considerably broader
than the one assumed from the analytical point of view,
whereas the measured mean bias is within the limits of acceptance because the mean (−1.6 µmol/L) and 95% confidence interval (−0.5 to −2.7 µmol/L) are within the 3.36% allowable bias. Further, the points are distributed
within the total EGE-Lab criteria with only one real
outlier. The difference between the present concept of 95%
prediction interval and the Bland and Altman description
of the actual points is clear from the Figure.
At first glance, the field method should be acceptable, with CV = 1% and bias <3.36%, but the distribution of points (Fig. 4, top) reveals a much broader distribution (CV = 5%), which points to an unknown source of uncertainty. The problem is further underlined by adding the single determinations to the measurements in the Figure, illustrating the reproducibility of the individual differences.
This uncertainty is close to 5% and may originate not only
from vial-to-vial variation but also from aberrant-sample
bias, whether from nonspecific reactions or interference in
the field method. Thus the difference plot and calculation
of the standard deviation of differences are tools to
disclose aberrant-sample bias in field methods.
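One way to operationalize this check is to compare the observed s(d) with the theoretical one (a sketch; the example numbers are those of practical example 1, and the interpretation of the ratio is left to the reader):

```python
def aberrant_sample_bias_ratio(s_d_observed, s_d_theoretical):
    """Ratio of observed to theoretical s(d); a ratio well above 1
    indicates variation beyond the analytical imprecision, i.e.,
    aberrant-sample bias (matrix effects)."""
    return s_d_observed / s_d_theoretical

# Practical example 1: observed s(d) = 3.95, theoretical ~0.69 umol/L:
ratio = aberrant_sample_bias_ratio(3.95, 0.69)   # ~5.7
```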
Fuentes-Arderiu and Fraser [30] have proposed that
the combined effects of imprecision and interference
should be used in the concept of specifications for imprecision, but the problem has not been dealt with in either
the EGE-Lab or EQA-Organizer concepts.
Practical example 2. In the other example (Fig. 4, bottom), all points are displaced from the acceptance area, showing a considerable mean bias (~20 µmol/L) and also considerable uncertainty from aberrant-sample bias. The calculated imprecision, based on assays of duplicates, was ~1%, but the difference plot reveals the errors clearly.

Fig. 3. Acceptance limits defined by goals for analytical quality specifications: difference plot. (Left) Acceptance limits for a maximum allowable analytical bias of 0 ± 2.8% are shown according to the EQA-Organizers concept (see text). Lines for the maximum combined distribution according to the EGE-Lab concept, ±(bias + 1.65 CV) = ±6.4%, are also shown (see text). (Middle) Same as left panel but with the simulated difference points (same as in Fig. 1); the mean value is indicated with 95% confidence interval. (Right) Another set of simulated data fulfilling the EGE-Lab criteria (see text).

Fig. 4. Examples from data of comparison of field methods with a Reference Method (HPLC): difference plots (note differences in scales of ordinates). Practical example 1: The different acceptance limits are shown, according to inherent analytical imprecision, 0 ± 2s (0 ± 1.4 µmol/L); acceptable bias, expanded to ±3.36% (see text); and the lines for the maximum combined distribution according to the EGE-Lab concept, ±(bias + 1.65 CV) = ±7.0% (see text). Further, the measured points are applied together with the calculated distribution, mean(d) ± 2s(d), with mean(d) = −1.6 µmol/L and s(d) = 3.95 µmol/L; the single measured points from the duplicates of the field method (×) are also applied. Practical example 2: Acceptance limits for the EGE-Lab concept, as well as the calculated distribution, mean(d) ± 2s(d). The upper limit according to the CLIA criteria is also shown (see text).
The CLIA criteria are much wider than the European recommendations, but as total error specifications they are easy to apply in the plot. For concentrations <175 µmol/L the acceptable deviation is ±26.5 µmol/L (±0.3 mg/dL), and ±15% at higher concentrations. The upper line of the CLIA criterion is shown in Fig. 4 (bottom). Because 7 of the 54 points are outside the line, the method is also unacceptable from the standpoint of proficiency testing.
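The CLIA total-error limit used here is easy to encode (a sketch; both function names are ours):

```python
def clia_limit_umol(concentration_umol):
    """Allowable total deviation for S-creatinine under CLIA:
    +/-26.5 umol/L (0.3 mg/dL) below 175 umol/L, +/-15% above."""
    return 26.5 if concentration_umol < 175 else 0.15 * concentration_umol

def outside_clia(pairs):
    """Count points outside the limit; pairs holds
    (reference_value, difference) tuples in umol/L."""
    return sum(abs(d) > clia_limit_umol(x) for x, d in pairs)
```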
general discussion
In the current discussion of difference plots vs x–y plots and the application of regression lines or functions, evaluations of data (vis-à-vis best relationship and comparison of a field method with a Reference Method) have often been mixed, as pointed out recently by Stöckl [10]. Stöckl emphasizes that graphical presentation of data (x–y plot or difference plot) should not be mixed with statistical interpretation of data.
As long as the purpose is to find the best functional relationship between two methods in order to correct one with the other, then an x–y plot and calculation of the regression line with an estimation of the scatter via s_y|x may be most relevant, with visual inspection of the scatter and a residual plot. The present task, however, is not to find a functional relationship between two methods, but to judge a field method in relation to a Reference Method.
When a field method is compared with a Reference Method for acceptability according to certain criteria, whether analytical or biological, visual inspection of all data is essential. Whether one uses a simple x–y plot or a difference plot is not critical, as long as the area of interest is expanded and the single points are assessed according to the hypothesis of identity between the two methods. The hypothesis is that the measured values are identical (and not that they are unrelated, which is the basic hypothesis for correlation studies), which means that the hypothesis is described in an x–y plot as the line of identity (y = x) and in the difference plot as the line y = 0. When a ratio plot is more appropriate than a difference plot, e.g., when the analytical CV is close to constant, the same evaluations can be performed.
The advantages of difference plots (and ratio plots) are keeping the hypothesis of identity in mind and the ease of expanding the difference ordinate according to the investigator's purpose. The power of the graphical illustration in Figs. 1A and 3 (left) lies in the simplicity and the clear definition of the hypotheses.
For the experienced interpreter, most situations can be evaluated by visual inspection of the plots, whether a difference plot or an x–y plot. If more objective criteria are wanted, calculation of the mean difference with confidence intervals is a powerful tool, as is a table of k7-values for estimation of tolerance intervals (some k7-values, i.e., 95% tolerance factors for the 95% interval: 3.38 for n = 10; 2.75 for n = 20; 2.55 for n = 30; 2.38 for n = 50; 2.30 for n = 70; and 2.23 for n = 100).
Most important, however, is the inspection of the distribution of the difference points, especially when samples expected to have matrix effects are marked, and calculation of the standard deviation of differences. When this exceeds the estimated analytical imprecision, it is an indication of aberrant-sample bias. In principle, the r-value from correlation between x and y reflects this, but in practice the r-value is insensitive, as illustrated in Fig. 2. Further, calculation of the standard deviation of differences gives an estimate of aberrant-sample bias compared with the theoretical imprecision. The same information can be obtained from the s_y|x estimate from regression analysis.
The plotting of single determinations can give information about imprecision and aberrant-sample bias as
well and, if needed, a functional curve can be calculated
and drawn. In this context we mention that replicate
measurements always should be performed in this type of
comparison, and that specimens should be stored for
evaluation of possible outliers.
Krouwer and Monti [29] presented a graphical method
for evaluation of laboratory assays (a mountain plot).
They computed the percentile for each ranked difference
between the two methods, and by “turning” at the 50th
percentile produced a histogram-like function (the mountain). This method is relevant for detecting large infrequent errors (differences) but lacks the aspect of concentration relationship. These investigators, therefore,
recommend use of their plot together with difference
plots. Introduction of analytical quality specifications in
the mountain plots may be useful in method evaluations.
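The folded-percentile idea behind the mountain plot can be sketched as follows (a minimal illustration of the principle, not necessarily Krouwer and Monti's exact procedure):

```python
def mountain_plot(diffs):
    """Rank the differences, compute each one's percentile, and fold
    at the 50th percentile so the curve rises to a peak (the
    'mountain') at the median difference. Returns (difference,
    folded percentile) pairs ready for plotting."""
    n = len(diffs)
    ranked = sorted(diffs)
    points = []
    for i, d in enumerate(ranked, start=1):
        pct = 100.0 * i / (n + 1)
        points.append((d, min(pct, 100.0 - pct)))  # fold at 50%
    return points

pts = mountain_plot([-3.0, -1.0, 0.0, 1.0, 4.0])
```

Large infrequent errors show up as long, low tails of the mountain.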
Bland and Altman have pointed to the presentation of
difference plots, where they recommend mean values
of both methods on the abscissa [4]; however, the risk of
regression towards the mean is negligible in studies
where field methods are compared with Reference Methods, because the results from the Reference Method (used
on the abscissa) are assumed to have negligible error.
They further calculate and present the standard deviation
of the measured data, which is relevant information but
not related to the hypothesis of identity.
This form of graphical testing of the hypothesis of
identity has been used for biological data [8], where the
standard deviation of measured differences was compared with the analytical imprecision. A more stringent
method of testing measured values from field methods
against target values has been applied for plasma proteins
in The Nordic Protein Project [9]. Here, the target values
of serum pools were assigned from the European Community Bureau of Certified Reference Material, CRM 470
[31], by the method recommended by IFCC [32], and were
used for the abscissa in a plot with acceptance lines
according to acceptable bias [9] and with the measurements illustrated by the mean difference with a 90%
confidence interval.
Another graphical method of prediction of single measurements of clinical data has been published for serial
measurements of International Normalized Ratios of prothrombin times [33]. Here, the differences between consecutive measurements in patients were compared with
the expected variation estimated from patients under
steady-state conditions. The abscissa was used for the
latest result, resulting in a regression towards the mean,
which was considered acceptable for the purpose (i.e., not
to investigate correlations). The usefulness of the nomogram was improved by adding vertical lines for the
therapeutic interval.
The goals used for acceptance are relevant for the use
of common reference intervals [19, 20] and have been
recommended by two European groups [23–25], with
different consequences for the acceptance. This problem is
related to the phenomenon of aberrant-sample bias (matrix effects) and has not been fully clarified by the proposed goal from either EGE-Lab or EQA-Organizers;
Fuentes-Arderiu and Fraser [30], however, have proposed that the aberrant-sample bias be included in the precision goal. Other analytical goals have been postulated
[17, 18, 34] and may be relevant for other evaluations.
Goals based on biological data, however, are general in
nature (related to common reference intervals and monitoring of patients) and are not restricted to a specific
clinical application.
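Such biologically based goals are commonly derived from within- and between-subject variation; the widely used desirable specifications set imprecision at half the within-subject CV and bias at a quarter of the combined biological variation. A sketch with illustrative CVs for S-creatinine (the values are assumptions):

```python
import math

# Illustrative biological variation for S-creatinine (CV, %); assumed values.
cv_within = 4.3    # within-subject
cv_between = 12.9  # between-subject

# Desirable analytical performance derived from biological variation,
# in the spirit of the European recommendations cited above.
max_cv_analytical = 0.5 * cv_within
max_bias = 0.25 * math.sqrt(cv_within**2 + cv_between**2)

print(f"desirable CV <= {max_cv_analytical:.2f}%, bias <= {max_bias:.2f}%")
```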
The CLIA criteria for S-creatinine are ±0.3 mg/dL (3 mg/L, or 26.5 µmol/L) or ±15% for concentrations >175 µmol/L [34]. These are criteria for total error, but they are very wide compared with the European recommendations. The CLIA criteria are intended for use in proficiency testing (and therefore need to be wide), whereas the European recommendations are so-called educational criteria, which relate directly to the desirable performance needed for optimum monitoring of patients and for sharing common reference intervals within geographical areas whose populations are homogeneous for the quantity. The criteria proposed by Ehrmeyer et al. [35] for the minimum intralaboratory performance characteristics needed to pass the CLIA proficiency testing criteria—CV <33% and bias <20%—can be applied as well as the European criteria, but for the latter the total error will be determining. The CLIA criteria may be applied even better in difference plots, given the total error concept.
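As a sketch of how the CLIA limit and a total error estimate could be combined in a difference plot for S-creatinine; the |bias| + 1.65·SD total error model is one common convention, assumed here rather than taken from this paper:

```python
def clia_creatinine_limit(concentration_umol: float) -> float:
    """CLIA allowable total error for S-creatinine: ±26.5 µmol/L
    (±0.3 mg/dL) or ±15%, whichever is greater."""
    return max(26.5, 0.15 * concentration_umol)

def passes_clia(bias_umol: float, sd_umol: float,
                concentration_umol: float) -> bool:
    # One common total error model: |bias| + 1.65 * SD (assumption).
    total_error = abs(bias_umol) + 1.65 * sd_umol
    return total_error <= clia_creatinine_limit(concentration_umol)

# The absolute limit dominates below 26.5 / 0.15 ≈ 177 µmol/L,
# consistent with the ~175 µmol/L crossover cited in the text.
print(clia_creatinine_limit(100))                                   # 26.5
print(passes_clia(bias_umol=8.0, sd_umol=6.0, concentration_umol=100))
```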
The validity of the analytical conclusions from this type of evaluation relies on the Reference Method chosen. It must be accurate and specific, with negligible imprecision; otherwise, the conclusions of the comparison will be weakened by any flaws of the Reference Method.
In conclusion, we find visual inspection of plots essential for method comparison. When a field method is compared with a Reference Method, the hypothesis of identity within analytical imprecision, or of acceptability within stated analytical quality specifications, should be applied. Both x–y plots and difference plots are useful, but we find the difference plot easier to handle and interpret, and it facilitates the calculation of uncertainty.
References
1. Westgard JO, de Vos DJ, Hunt M, Quam EG, Carey RN, Garber CC. Method evaluation. Houston: American Society for Medical Technology, 1978.
2. Westgard JO, Hunt MR. Use and interpretation of common statistical tests in method-comparison studies. Clin Chem 1973;19:49–57.
3. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974;20:825–33.
4. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;i:307–10.
5. Hollis S. Analysis of method comparison studies [Guest Editorial]. Ann Clin Biochem 1996;33:1–4.
6. Pollock MA, Jefferson SG, Kane JW, Lomax K, MacKinnin G, Winnard CB. Method comparison—a different approach. Ann Clin Biochem 1992;29:556–60.
7. Svendsen A, Holmskov U, Hyltoft Petersen P, Jensenius JC. Difference and ratio plots: simple tools for improved presentation and interpretation of data. Unexpected possibilities for the use of conglutinin binding assay in inflammatory rheumatic diseases. J Immunol Methods 1995;178:211–8.
8. Hyltoft Petersen P, Felding P, Hørder M, Tryding N. Effect of posture on concentrations of serum proteins in healthy adults. Dependency of the molecular size of proteins. Scand J Clin Lab Invest 1980;40:623–8.
9. Hyltoft Petersen P, Blaabjerg O, Irjala K, Icén A, Bjøro K. The Nordic Protein Project. In: Hyltoft Petersen P, Blaabjerg O, Irjala K, eds. Assessing quality in measurements of plasma proteins. The Nordic Protein Project and related projects. Helsinki, Finland: NORDKEM (Nordic Clinical Chemistry Project), 1994:87–116.
10. Stöckl D. Beyond the myths of difference plots [Letter]. Ann Clin Biochem 1996;33:575–7.
11. NCCLS. Method comparison and bias estimation using patient samples; approved guidelines. Document EP9-A (ISBN 1-56238-283-7). Wayne, PA: NCCLS, 1995.
12. Houbouyan L, Boutière B, Contant G, Dautzenberg MD, Fievet P, Potron G, et al. Validation protocol of analytical hemostasis systems: measurement of anti-Xa activity of low-molecular-weight heparins. Clin Chem 1996;42:1223–30.
13. Dybkær R. Vocabulary for describing the metrological quality of a measurement procedure. Upsala J Med Sci 1993;98:445–86.
14. International Organization for Standardization. Terms and definitions used in connection with reference materials. ISO Guide 30, 2nd ed. Geneva: ISO, 1992.
15. Bliss CI. Statistics in biology. New York: McGraw-Hill, 1967:558pp.
16. Documenta Geigy. Mathematics and statistics. Basle, Switzerland: Ciba-Geigy, 1975.
17. Hørder M, ed. Assessing quality requirements in clinical chemistry. Scand J Clin Lab Invest 1980;40(Suppl 155):144pp.
18. de Verdier C-H, ed. Medical need for quality specifications in laboratory medicine. Upsala J Med Sci 1990;95:161–309.
19. Gowans EMS, Hyltoft Petersen P, Blaabjerg O, Hørder M. Analytical goals for the acceptance of common reference intervals for laboratories throughout a geographical area. Scand J Clin Lab Invest 1988;48:757–64.
20. Hyltoft Petersen P, Gowans EMS, Blaabjerg O, Hørder M. Analytical goals for the estimation of non-Gaussian reference intervals. Scand J Clin Lab Invest 1989;49:727–37.
21. Cotlove E, Harris EK, Williams GZ. Biological and analytical components of variation in long-term studies of serum constituents in normal subjects. III. Physiological and medical implications. Clin Chem 1970;16:1028–32.
22. Elevitch R, ed. College of American Pathologists Conference Report. Conference on analytical goals in clinical chemistry, at Aspen, Colorado. Skokie, IL: CAP, 1976:129pp.
23. Fraser CG, Hyltoft Petersen P, Ricós C, Haeckel R. Proposed quality specifications for the imprecision and inaccuracy of analytical systems for clinical chemistry. Eur J Clin Chem Clin Biochem 1992;30:311–7.
24. Fraser CG, Hyltoft Petersen P, Ricós C, Haeckel R. Quality specifications. In: Haeckel R, ed. Evaluation methods in laboratory medicine. Weinheim: VCH, 1992:87–99.
25. Stöckl D, Baadenhuijsen H, Fraser CG, Libeer J-C, Hyltoft Petersen P, Ricós C. Desirable routine analytical goals for quantities assayed in serum. Discussion paper from the members of the external quality assessment (EQA) working group A on analytical goals in laboratory medicine. Eur J Clin Chem Clin Biochem 1995;33:157–69.
26. Fraser CG, Hyltoft Petersen P. Quality goals in external quality assessment are best based on biology. Scand J Clin Lab Invest 1993;53(Suppl 212):8–9.
27. Stöckl D, Reinauer H. Development of criteria for the evaluation of reference method values. Scand J Clin Lab Invest 1993;53(Suppl 212):16–8.
28. Thienpont LM, van Landuyt KG, Stöckl D, de Leenheer AP. Candidate reference method for determining serum creatinine by isocratic HPLC: validation with isotope dilution gas chromatography–mass spectrometry and application for accuracy assessment of routine test kits. Clin Chem 1995;41:995–1003.
29. Krouwer JS, Monti KL. A simple, graphical method to evaluate laboratory assays. Eur J Clin Chem Clin Biochem 1995;33:525–7.
30. Fuentes-Arderiu X, Fraser CG. Analytical goals for interference. Ann Clin Biochem 1991;28:383–5.
31. Whicher JT, Ritchie RF, Johnson AM, Baudner S, Bienvenu J, Blirup-Jensen S, et al. New international reference preparation for proteins in human serum (RPPHS). Clin Chem 1994;40:934–8.
32. Blirup-Jensen S, Just Svendsen P. A new reference preparation for proteins in human serum. Upsala J Med Sci 1994;99:251–8.
33. Flensted Lassen J, Kjeldsen J, Antonsen S, Hyltoft Petersen P, Brandslund I. Interpretation of serial measurements of International Normalized Ratio (INR) in monitoring oral anticoagulant treatment. A graphical combination of therapeutic interval and reference change. Clin Chem 1995;41:1171–6.
34. US Department of Health and Human Services. Medicare, Medicaid, and CLIA programs; regulations implementing the Clinical Laboratory Improvement Amendments of 1988 (CLIA). Final rule. Fed Regist 1992;57:7002–186.
35. Ehrmeyer SS, Laessig RH, Leinweber JE, Oryall JJ. 1990 Medicare/CLIA final rules for proficiency testing: minimum intralaboratory performance characteristics (CV and bias) needed to pass. Clin Chem 1990;36:1736–40.