Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Clinical Chemistry 43:11 2039 –2046 (1997) Opinion Graphical interpretation of analytical data from comparison of a field method with a Reference Method by use of difference plots Per Hyltoft Petersen,1* Dietmar Stöckl,2 Ole Blaabjerg,1 Bent Pedersen,1 Erling Birkemose,1 Linda Thienpont,2 Jens Flensted Lassen,3 and Jens Kjeldsen4 Various viewpoints have been offered regarding the appropriate use of scatter plots or difference plots (bias plots or residual plots) in comparing analytical methods. In many of these discussions it seems the basic concepts of identity (within inherent imprecision) and acceptability based on analytical goals (analytical quality specifications) have been forgotten. With the increasing number of Reference Methods in laboratory medicine, these basic concepts are becoming more important in validation of field methods. Here we describe a simple and effective graphical test of these hypotheses (identity and acceptability) by use of difference plots. These plots display the underlying hypothesis before the measured differences are plotted and allow interpretation of the results according to specific criteria. We further describe simple but effective interpretations of the data, when the hypothesis is not fulfilled, by using two data sets drawn from comparisons of field methods for S-creatinine with a Reference Method for this analyte. The difference plot is a graphical tool with related simple statistics for comparison of a field method with a Reference Method, focusing on (a) identity within the inherent analytical imprecision or (b) acceptability within analytical quality specifications. Calculation of the standard deviation of the differences is an indispensable tool for evaluation of aberrant-sample bias (matrix effects). Departments of 1 Clinical Chemistry and 4 Medical Gastroenterology, Odense University Hospital, DK-5000 Odense C, Denmark. 2 Laboratorium voor Analytische Chemie, Faculteit Farmaceutische Wetenschappen, Universiteit Gent, Harelbekestraat 72, B-9000 Gent, Belgium. 3 Department of Clinical Chemistry, Vejle Sygehus, DK-7100 Vejle, Denmark. *Author for correspondence. Fax 1 45 65 41 19 11; e-mail phy@imbmed. ou.dk. Received December 10, 1996; revision accepted June 17, 1997. Westgard et al. [1–3] outlined the basic principles for method comparison in a clear, easy to follow manual. They also introduced the concept of allowable analytical error and gave an overview of published performance criteria. They recommended that the estimated analytical imprecision and bias be compared with these performance criteria in method evaluation as well as in method comparison. Their approach made use of a scatter-plot and calculations based on regression lines, but with confidence limits and judgment of acceptability based on the criteria for allowable analytical error. These principles of comparing analytical performance with performance criteria, however, have not been universally accepted, and recent publications have criticized the misuse of correlation coefficients [4] and overinterpretation of regression lines in method comparison [5–7]. Bland and Altman [4] recommended the difference plot (or bias plot or residual plot) as an alternative approach for method comparison. On the abscissa they used the mean value of the methods to be compared, to avoid regression towards the mean, and on the ordinate they plotted the calculated difference between measurements by the two methods. They further estimated the mean and standard deviation of differences and displayed horizontal lines for the mean and for 62 3 the standard deviation. However, they missed the concept of a more objective criterion for acceptability. Recently, Hollis [5] has recommended difference plots as the only acceptable method for method comparison studies for publication in Annals of Clinical Biochemistry, but without specifying criteria for acceptability. However, a few difference plots with evaluation of acceptability according to defined criteria have been published, e.g., in evaluation of estimated biological variation compared with analytical imprecision [8], and in external quality assessment of plasma proteins for the possibilities of sharing common reference intervals [9]. Maybe the scarcity of such publications is more a 2039 2040 Hyltoft Petersen et al.: Interpretation of analytical data by use of difference plots question of interpretation of the data by plotting than a strict choice between scatter-plot and difference plot, as discussed by Stöckl [10] recently. Investigators seem to rely too much on regression lines and r-values, without doing the equally important interpretation of the data points of the plot. This is becoming more and more disadvantageous with the increasing number of Reference Methods available for comparison with field methods, because in these cases, it is not a question of finding some relationships, but simply of judging the field method to be acceptable or not. NCCLS has recently published guidelines for method comparison and bias estimation by using patients’ samples [11], where both scatter-plots and bias plots are advised. The document also recommends plotting of single determinations as mean values and stresses the need of visual inspection of data. Further, comparison with performance criteria is recommended, but these criteria are not specified and they are not used in the graphical interpretation. Recently, Houbouyan et al. [12] used ratio plots in their validation protocol of analytical hemostasis systems, where they used a preset, but arbitrarily chosen, acceptance limit of inaccuracy of 15%. In the following, we will use the difference plot (or bias plot) in combination with simple statistics for the principal judgment of the identity or acceptability of a field method. The difference plot makes it easier to apply the concept; in principle, however, the same evaluations could be performed for a scatter-plot in relation to the line of identity (y 5 x). The aim of this contribution is to pay attention to the hypothesis of identity and the concept of acceptable analytical quality in method comparison, especially when one of the methods is a Reference Method. basic considerations The basis for comparing an analytical field method with an analytical Reference Method measuring the same quantity (analyte) is the hypothesis of identity within inherent imprecision or within preset analytical quality specifications. The field method, whether a kit or a self-produced analytical procedure, is applied for routine analyses in laboratory medicine. The method must be demonstrated to have adequate accuracy (trueness), precision, and specificity (lack of aberrant-sample bias [13]) for its intended use. The Reference Method, in contrast, is a “thoroughly investigated method, clearly and exactly describing the necessary conditions for procedures, for the measurement of one or more property values that has been shown to have accuracy and precision commensurate with its intended use and that can therefore be used to assess the accuracy of other methods for the same measurement, particularly in permitting the characterization of a reference material” [14]. Further characteristics of the methods are the costs, the complexity, the equipment, the time used for production of the results, and so forth—the field method generally being cheap and well suited for routine work and the Reference Method usually being applicable only in few competent and specially equipped laboratories. The two approaches to the comparison are: 1. Identity. The results from the field method do not deviate from the Reference Method results by more than the inherent imprecision of both methods. 2. Analytical quality specifications. The results from the field method do not deviate from the Reference Method results by more than the acceptance specifications do from the analytical goals. In both cases the null hypothesis is that the measured differences for all samples are zero. In the first case, the acceptance limits are defined by the inherent analytical imprecisions, and in the latter case the acceptance limits are defined by the analytical quality specifications. By assuming the ideal situation, the theoretical limits for testing the null hypothesis can be drawn in a difference plot, i.e., a mean difference [mean(d)] equal to 0, and and a standard deviation of differences [s(d)] calculated from the imprecisions of the two methods. The ultimate hypothesis, then, is that the points are distributed within these bounds. If they fall outside of these bounds, the hypothesis is rejected. Alternatively, the performance characteristics are tested against defined analytical quality specifications. acceptance limits defined by inherent analytical imprecision Constant analytical standard deviations are presumed. A number of patients’ samples are used for the comparison of the field method with the Reference Method. The result of the ith patient sample by the two methods is xiF and xiR for field and Reference Method, respectively, and the difference, xid, is xiF 2 xiR. Further, the theoretical variance of the differences is the sum of the two variances denoting the inherent imprecision of field and Reference Methods: s2(d) 5 s2F 1 s2R (the theoretical s values for the two methods being estimated independently for the two methods of the comparison, or estimated from replicate measurements during the comparison). When means of duplicates are used, then s should be divided by =2. When the two methods are identical, the expectation is that ;68% of differences will be distributed symmetrically around 0 within 0 6 1s(d), and 95% will be within 0 6 1.96s(d). If the distribution of differences fits these criteria, then it is not possible to find any difference between results from the two methods within the inherent imprecision. If this is not the case, then the methods are not identical. Before the measured points are plotted on any figure, the hypothesis can be illustrated in a difference plot, as shown in Fig. 1A for the example of S-creatinine. The horizontal line y 5 0 illustrates the hypothesis of xid 5 0; the other lines indicate within 0 6 1s(d) for 68% and within 0 6 1.96s(d) for 95% of the differences, respectively. The outer lines indicate the 95% prediction interval Clinical Chemistry 43, No. 11, 1997 2041 Fig. 1. Acceptance limits defined by inherent analytical imprecision: difference plots for S-creatinine, with Reference Method (REF) values as abscissa and the difference between field method (FIELD) and REF as ordinate, assuming constant s. (A) Expected distribution of 95% of the measured points, 0 6 2s(d) 5 66.3 mmol/L (95% prediction interval); the lines for the expected distribution of 68% of the points are also indicated, 0 6 1s(d). (B) Same as A but with addition of the simulated “measured” points (simulated from a gaussian distribution with mean of 20.5 and s 5 3.0 mmol/L). (C) Same as B but with indication of the statistical 95% tolerance interval with 95% confidence, 0 6 2.69s(d). (D) Same as B and with calculated lines according to Bland and Altman [2], indicating the calculated mean(d) 6 2s(d), mean(d) 5 20.84, and s(d) 5 3.27 mmol/L. for the expected distribution of points. In this example, sF 5 3.10 and sR 5 0.50 and, therefore, s(d) 5 3.15 mmol/L. It is educational to describe the hypothesis before plotting the points, because most investigators will start interpretation of the points in the form of functional relationships and thereby forget about the hypothesis. With the hypothesis in mind, one can now plot the data points as shown in Fig. 1B. The data points are generated for Reference Method values between 50 and 150 mmol/L and computer-simulated gaussian-distributed differences based on mean(d) 5 20.5 mmol/L and s(d) 5 3.00 mmol/L; from these simulated data the calculated mean(d) was 20.84 mmol/L and s(d) was 3.27 mmol/L. The data points are distributed roughly according to the hypothesis, with 16 points (70%) within the 0 6 1s(d) and 21 points (91%) within 0 6 2s(d), leaving 2 points (9%) outside the limits; because these two points seem not to deviate too much from the general distribution, the conclusion could be that the finding of just 2 points outside (and close to) the 95% prediction interval is expected and acceptable and, therefore, that the field method is indistinguishable from the Reference Method within the analytical imprecision, so the evaluation can stop. Statistically, the mean difference can be evaluated by a t-test and the distribution of differences by an F-test. This approach is very narrow, and some uncertainty related to unknown factors may be taken into account. Such variations could be the variation between the two tubes/vials with serum from the same individual or the underestimation of sF. Therefore, possible additional sources of uncertainty always should be taken into account when appropriate. However, the design of a method comparison has to be carefully planned so as to exclude additional uncertainties. Further, any addition of “acceptable” uncertainty should be well thought through and handled with caution. For the experienced scientist, interpreting the difference plot is easy. A more objective criterion, however, for graphical validation of the distribution of points, and especially the more extreme points, is to apply the concept of tolerance intervals, where the standard deviate (z or c) is substituted for by a tolerance factor, k, with a value dependent on the percentage of points (here 95%) and the confidence with which this percentage should be obtained. The k-value is determined by the assumptions about the new distribution, whether the mean or the standard deviation is unknown, or both. The k-value further depends on the number of points, n [15, 16]. Although we have a hypothesis about mean difference and standard deviation, these figures are unknown in 2042 Hyltoft Petersen et al.: Interpretation of analytical data by use of difference plots practice, so the k7 for unknown mean and standard deviation may be the most relevant to use. For n 5 23 and 95% confidence for 95% of the points, the tolerance factor, k7, is 2.67 and the tolerance interval is 0 6 2.67s. This is illustrated in Fig. 1C, where all points are distributed within the chosen tolerance interval. The present approach of theoretical expected distribution is compared with the approach of Bland and Altman [4] in Fig. 1D by inserting the new lines determined by mean(d) 6 2 s(d) estimated from the data points. The difference between the present concept and the Bland and Altman concept is clear from Fig. 1D. Accordingly, we illustrate the 95% prediction interval before any data points are applied, in contrast to Bland and Altman, who simply illustrate the statistics of the points. In this example the mean values are clearly different, whereas the standard deviations are rather close to each other. To illustrate the relations to x–y plots, we first add to Fig. 1B 11 points (triangles), as shown in Fig. 2(top). The specimens producing these points are assumed to contain some “nonspecific” components [in the S-creatinine ex- Fig. 2. Comparison of difference plot with x–y plot for two sets of identifiable data points: (F) same as in Fig. 1 and (Œ) a new set with mean difference 5 4.7 and standard deviation of differences 5 2.8 mmol/L. (Top) Difference plot (as in Fig. 1B); (bottom) x–y plot showing the line of identity (y 5 x). ample, perhaps the specimens are from diabetics, where (e.g.) glucose could result in nonspecific reactions by the Jaffe methods]. In the difference plot they separate clearly from the other points and the difference and standard deviation change to 1 0.16 and 4.08 mmol/L, respectively. In the x–y plot (Fig. 2, bottom), where the points should be related to the line of identity (y 5 x), it is difficult to see the difference between the two sets of points. If we turn to calculation of r-values, r decreases from 0.993 to 0.991, the slope of the regression line changes from 1.020 to 1.025, and the intercept changes from 22.54 to 21.17 mmol/L. The information from difference plots and x–y plots (when using the line of identity) is the same, but it is easier to expand the differences in the difference plot and the calculations of variances are simpler, whereas the 45° angle in the x–y plot makes comparable calculations more difficult. This is also seen from Westgard et al. [2, 3], where the simple calculations of (e.g.) bias is easier to interpret in combination with figures. acceptance limits defined by goals for analytical quality A more relevant approach for comparing a field method with a Reference Method is to use the analytical goals (analytical quality specifications) as acceptance limits. These specifications may be related to the clinical use of laboratory data [17, 18] or more generally to the application of common reference intervals [19, 20] and monitoring patients [21, 22]. Two European groups, one under the auspices of EGE-Lab (European Group for the Evaluation of Reagents and Analytical Systems in Laboratory Medicine) [23, 24], and another group under the auspices of European EQA-Organizers (External Quality Assessment Organizers) [25], have given recommendations for analytical quality specifications based on the same biological concepts and with identical criteria for acceptable analytical bias and imprecision, but with a different concept for combining these—the first (EGE-Lab) defining the recommendations for analytical bias and imprecision separately and the latter (EQA-Organizers) combining these two aspects. In both European recommendations, the analytical quality specification for imprecision is CVanalytical #0.5 CVwithin-subject variation as proposed by Cotlove et al. [21], and that for analytical bias is uBanalytical u #0.25 CVtotal biological variation [19]. The EGE-Lab concept accepts both a maximum bias and a maximum imprecision simultaneously; the EQA-Organizer concept describes a functional relationship between the two in the form of a maximum allowable combination of imprecision and bias. The latter is close to the original concept of Gowans et al. [19], defining the acceptable analytical percentage bias for S-creatinine as 2.8% [25] when imprecision is negligible. According to the EGE-Lab concept [23, 24], however, both a bias of 2.8% and an imprecision of 2.2% are acceptable simultaneously. This means that for single determinations 0 6 ( bias 1 1.65 3 imprecision) is acceptable [26]; i.e., 95% of the single points must lie Clinical Chemistry 43, No. 11, 1997 within the limits of 0 6 (2.8% 1 1.65 3 2.2%) 5 0 6 6.4%, as illustrated in Fig. 3 (left). This criterion is fulfilled as shown in Fig. 3 (middle). The concept, however, is onesided, i.e., is valid only in one direction. Therefore, the standard deviation of differences should be judged against the imprecision criterion separately. The standard deviation of the field method is 3.1 mmol/L and the CV 5 3.9%, which exceeds the imprecision specification of 2.2%. Figure 3 (right) illustrates an example of acceptable analytical imprecision and bias. The CV is 2.1% and the estimated bias (mean difference) is 11.3 mmol/L (95% confidence interval, 0.6 –2.0 mmol/L). Note that the mean difference is different from 0 but is acceptable according to the criterion. For the purpose of method comparison, the value for maximum allowable bias might be expanded because of the uncertainty (confidence interval) of the Reference Method. This cannot be seen from the actual comparison, but because the Reference Method is allowed to have some uncertainty, then this must be allowed also for the field method. We propose using the factor 1.2, in light of a recent concept that requires for Reference Methods a total error of ,0.2 times that of the routine method [27]. In the present case, the acceptance limits for bias should thus be 0 6 3.36%, as also used in the example below. two examples utilizing comparison data The data used are from a paper on a candidate reference method for determining S-creatinine, which was used for comparison of four field methods [28]. Data from two of these comparisons are used, but only for concentration values ,150 mmol/L, the only region in which there are sufficient data for our presentation. Practical example 1. Based on the duplicate analyses performed, analytical imprecision is calculated within the interval 50 –150 mmol/L for both the Reference Method (HPLC) and the field method, giving sR 5 0.568 mmol/L and sF 5 0.791 mmol/L. When means of duplicates are used for the comparison, the calculated theoretical s(d) should be divided by =2, giving an expected distribution 2043 of 95% (the 95% prediction interval) of the points within 0 6 2s(d) 5 0 6 1.4 mmol/L. Further, the expanded allowable bias of 3.36% is used for the analytical quality specifications according to both EGE-Lab and EQA-Organizers concepts. The graphical evaluations are shown in Fig. 4(top). Here, the calculated distribution is considerably broader than the one assumed from the analytical point of view, whereas the measured mean bias is within the limits of acceptance of the bias because the mean (21.6) and 95% confidence interval (20.5 to 22.7 mmol/L) are within the 3.36% allowable bias. Further, the points are distributed within the total EGE-Lab criteria with only one real outlier. The difference between the present concept of 95% prediction interval and the Bland and Altman description of the actual points is clear from the Figure. At first glance, the field method should be acceptable, with CV 5 1% and bias ,3.36%, but the distribution of points (Fig. 4, top) reveals a much broader distribution (CV 5 5%), which emphasizes an unknown uncertainty. The problem is further underlined by the adding of single determinations to the measurements in the Figure, illustrating the reproducibility of the individual differences. This uncertainty is close to 5% and may originate not only from vial-to-vial variation but also from aberrant-sample bias, whether from nonspecific reactions or interference in the field method. Thus the difference plot and calculation of the standard deviation of differences are tools to disclose aberrant-sample bias in field methods. Fuentes-Arderiu and Fraser [30] have proposed that the combined effects of imprecision and interference should be used in the concept of specifications for imprecision, but the problem has not been dealt with in either the EGE-Lab or EQA-Organizer concepts. Practical example 2. In the other example (Fig. 4, bottom), all points are displaced from the acceptance area, showing a considerable mean bias (;20 mmol/L) and also considerable uncertainty from aberrant-sample bias. The calcu- Fig. 3. Acceptance limits defined by goals for analytical quality specifications: difference plot. (Left) Acceptance limits for maximum allowable analytical bias of 0 6 2.8% are shown according to the EQA-Organizers concept (see text). Lines for maximum combined distribution according to the EGE-Lab concept, 6(bias 1 1.65 CV) 5 66.4%, are also shown (see text). (Middle) Same as left panel but with the simulated difference points (same as in Fig. 1). The mean value is indicated with 95% confidence interval (f). (Right) Another set of simulated data fulfilling the EGE-Lab criteria (see text). 2044 Hyltoft Petersen et al.: Interpretation of analytical data by use of difference plots Fig. 4. Examples from data of comparison of field methods with a Reference Method (HPLC): difference plots (note differences in scales of ordinates). Practical example 1: The different acceptance limits are shown, according to inherent analytical imprecision, 0 6 2s (0 6 1.4) mmol/L; acceptable bias, but expanded to 63.36% (see text); and the lines for maximum combined distribution according to the EGE-Lab concept, 6(bias 1 1.65 CV) 5 67.0% (see text). Further, the measured points are applied together with the calculated distribution: mean(d) 6 2s(d), mean(d) 5 21.6 mmol/L, s(d) 5 3.95 mmol/L. The single measured points from the duplicates of the field method (x) are applied. Practical example 2: Acceptance limits for the EGE-Lab concept, as well as the calculated distribution, mean(d) 6 2s(d). The upper limit according to the CLIA criteria is also shown (see text). lated imprecision, based on assays of duplicates, was ;1%, but the difference plot reveals the errors clearly. The CLIA criteria are much wider than the European recommendations, but as total error specifications are easy to apply in the plot. For concentrations ,175 mmol/L the acceptable deviation is 626.5 mmol/L (60.3 mg/L), 615% at higher concentrations. The upper line of the CLIA criterion is shown in Fig. 4 (bottom). Because 7 of the 54 points are outside the line, the method is also unacceptable from the standpoint of proficiency testing. often been mixed, as pointed out recently by Stöckl [10]. Stöckl emphasizes that graphical presentation of data (x–y plot or difference plot) should not be mixed with statistical interpretation of data. As long as the purpose is to find the best functional relationship between two methods in order to correct one with the other, then an x–y plot and calculation of the regression line with an estimation of the scatter via syux may be most relevant, with visual inspection of the scatter and a residual plot. The present task, however, is not to find a functional relationship between two methods, but to judge a field method in relation to a Reference Method. When a field method is compared with a Reference Method for acceptability according to certain criteria, whether analytical or biological, then the visual inspection of all data is essential. Whether one uses a simple x–y plot or a difference plot is not critical, as long as the area of interest is expanded and the single points are assessed according to the hypothesis of identity between the two methods. The hypothesis is that the measured values are identical (and not that they are unrelated, which is the basic hypothesis for correlation studies), which means that the hypothesis is described in an x–y plot as the line of identity (y 5 x) and in the difference plot as the line y 5 0. When a ratio plot is more appropriate than a difference plot, e.g., when analytical CV is close to being constant, the same evaluations can be performed. The advantages of difference plots (and ratio plots) are keeping the hypothesis of identity in mind and the ease of expanding the difference ordinate according to the investigator’s purpose. The power of the graphical illustration in Figs. 1A and 3 (left) lies in the simplicity and the clear definition of the hypotheses. For the experienced interpreter, most situations can be evaluated by visual inspection of the plots, whether a difference plot or an x–y plot. If more objective criteria are wanted, calculations of mean difference with confidence intervals are a powerful tool, as is a table of k7-values5 for estimation of tolerance intervals. Most important, however, is the inspection of the distribution of the difference points, especially when samples expected to have matrix effects are marked, and calculation of the standard deviation of differences. When this exceeds the estimated analytical imprecision, it is an indication of aberrant-sample bias. In principle, the rvalue from correlation between x and y reflects this, but in practice the r-value is insensitive, as illustrated in Fig. 2. Further, calculation of the standard deviation of differences gives an estimate of aberrant-sample bias compared with the theoretical imprecision. The same information can be obtained from the syux estimate from regression analysis. general discussion In the current discussion of difference plot vs x–y plots and the application of regression lines or functions, evaluations of data (vis-à-vis best relationship and comparison of a field method with a Reference Method) have 5 Some k7-values for 95% tolerance factors for the 95% interval: 3.38 for n 5 10; 2.75 for n 5 20; 2.55 for n 5 30; 2.38 for n 5 50; 2.30 for n 5 70; and 2.23 for n 5 100. 2045 Clinical Chemistry 43, No. 11, 1997 The plotting of single determinations can give information about imprecision and aberrant-sample bias as well and, if needed, a functional curve can be calculated and drawn. In this context we mention that replicate measurements always should be performed in this type of comparison, and that specimens should be stored for evaluation of possible outliers. Krouwer and Monti [29] presented a graphical method for evaluation of laboratory assays (a mountain plot). They computed the percentile for each ranked difference between the two methods, and by “turning” at the 50th percentile produced a histogram-like function (the mountain). This method is relevant for detecting large infrequent errors (differences) but lacks the aspect of concentration relationship. These investigators, therefore, recommend use of their plot together with difference plots. Introduction of analytical quality specifications in the mountain plots may be useful in method evaluations. Bland and Altman have pointed to the presentation of difference plots, where they recommend mean values of both methods on the abscissa [4]; however, the risk of regression towards the mean is negligible in studies where field methods are compared with Reference Methods, because the results from the Reference Method (used on the abscissa) are assumed to have negligible error. They further calculate and present the standard deviation of the measured data, which is relevant information but not related to the hypothesis of identity. This form of graphical testing of the hypothesis of identity has been used for biological data [8], where the standard deviation of measured differences was compared with the analytical imprecision. A more stringent method of testing measured values from field methods against target values has been applied for plasma proteins in The Nordic Protein Project [9]. Here, the target values of serum pools were assigned from the European Community Bureau of Certified Reference Material, CRM 470 [31], by the method recommended by IFCC [32], and were used for the abscissa in a plot with acceptance lines according to acceptable bias [9] and with the measurements illustrated by the mean difference with a 90% confidence interval. Another graphical method of prediction of single measurements of clinical data has been published for serial measurements of International Normalized Ratios of prothrombin times [33]. Here, the differences between consecutive measurements in patients were compared with the expected variation estimated from patients under steady-state conditions. The abscissa was used for the latest result, resulting in a regression towards the mean, which was considered acceptable for the purpose (i.e., not to investigate correlations). The usefulness of the nomogram was improved by adding vertical lines for the therapeutic interval. The goals used for acceptance are relevant for the use of common reference intervals [19, 20] and have been recommended by two European groups [23–25], with different consequences for the acceptance. This problem is related to the phenomenon of aberrant-sample bias (matrix effects) and has not been fully clarified by the proposed goal from either EGE-Lab or EQA-Organizers; Fuentes-Arderiu and Fraser [30], however, have proposed that the aberrant-sample bias be included in the precision goal [30]. Other analytical goals have been postulated [17, 18, 34] and may be relevant for other evaluations. Goals based on biological data, however, are general in nature (related to common reference intervals and monitoring of patients) and are not restricted to a specific clinical application. The CLIA criteria for S-creatinine are 60.3 mg/dL (3 g/L, or 26.5 mmol/L) or 615% for concentrations .175 mmol/L [28]. These are criteria for total error, but they are very wide compared with the European recommendations. The CLIA criteria are intended for use with proficiency testing (and therefore need to be wide), whereas the European recommendations are so-called educational criteria, which relate directly to the desirable performance criteria for optimum monitoring of patients and the sharing of common reference intervals within geographical areas with populations that are homogeneous for the quantity of analyte. The criteria proposed by Ehrmeyer et al. [35] for minimum intralaboratory performance characteristics to pass CLIA—CV ,33% and bias ,20%, proficiency testing criteria— can be applied as well as the European criteria, but for the latter the total error will be determining. The CLIA criteria may be applied even better in difference plots, given the total error concept. The validity of the analytical conclusions of this type of evaluation relies on the Reference Method chosen. It must be correct and specific and so forth, with negligible imprecision; otherwise, the conclusions of the comparison will be weakened according to any possible flaws of the Reference Method. In conclusion, we find the visual inspection of plots to be essential for method comparison. When a field method is compared with a Reference Method, the hypothesis of identity within analytical imprecision or within stated analytical quality specifications should be applied. Both x–y plots and difference plots are useful, but we find the difference plot is easier to handle and interpret and facilitates the calculations of uncertainty. References 1. Westgard JO, de Vos DJ, Hunt M, Quam EG, Carey RN, Garber CC. Method evaluation. American Society for Medical Technology, Houston, 1978. 2. Westgard JO, Hunt MR. Use and interpretation of common statistical tests in method-comparison studies. Clin Chem 1973;19: 49 –57. 3. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974;20:825–33. 4. Bland JM, Altman DG. Statistical methods for assessing agree- 2046 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. Hyltoft Petersen et al.: Interpretation of analytical data by use of difference plots ment between two methods of clinical measurement. Lancet 1986;i:307–10. Hollis S. Analysis of method comparison studies [Guest Editorial]. Ann Clin Biochem 1996;33:1– 4. Pollock MA, Jefferson SG, Kane JW, Lomax K, MacKinnin G, Winnard CB. Method comparison—a different approach. Ann Clin Biochem 1992;29:556 – 60. Svendsen A, Holmskov U, Hyltoft Petersen P, Jensenius JC. Difference and ratio plots: simple tools for improved presentation and interpretation of data. Unexpected possibilities for the use of conglutinin binding assay in inflammatory rheumatic diseases. J Immunol Methods 1995;178:211– 8. Hyltoft Petersen P, Felding P, Hørder M, Tryding N. Effect of posture on concentrations of serum proteins in healthy adults. Dependency of the molecular size of proteins. Scand J Clin Lab Invest 1980;40:623– 8. Hyltoft Petersen P, Blaabjerg O, Irjala K, Icén A, Bjøro K. The Nordic Protein Project. In: Hyltoft Petersen P, Blaabjerg O, Irjala K, eds. Assessing quality in measurements of plasma proteins. The Nordic Protein Project and related projects. Helsinki, Finland: NORDKEM (Nordic Clinical Chemistry Project), 1994:87–116. Stöckl D. Beyond the myths of difference plots [Letter]. Ann Clin Biochem 1996;33:575–7. NCCLS. Method comparison and bias estimation using patient samples; approved guidelines. Document EP9-A (ISBN 1–56238283–7). Wayne, PA: NCCLS, 1995. Houbouyan L, Boutière B, Contant G, Dautzenberg MD, Fievet P, Potron G, et al. Validation protocol of analytical hemostasis systems: measurement of anti-Xa activity of low-molecular-weight heparins. Clin Chem 1996;42:1223–30. Dybkær R. Vocabulary for describing the metrological quality of a measurement procedure. Upsala J Med Sci 1993;98:445– 86. International Organization of Standardization. Terms and definitions used in connection with reference materials. ISO Guide 30, 2nd ed. Geneva: ISO, 1992. Bliss CI. Statistics in biology. New York: McGraw-Hill, 1967:558 pp. Documenta Geigy. Mathematics and statistics. Basle, Switzerland: Ciba-Geigy, 1975. Hørder M, ed. Assessing quality requirements in clinical chemistry. Scand J Clin Lab Invest 1980;40(Suppl 155):144pp. de Verdier C-H, ed. Medical need for quality specifications in laboratory medicine. Upsala J Med Sci 1990;95:161–309. Gowans EMS, Hyltoft Petersen P, Blaabjerg O, Hørder M. Analytical goals for the acceptance of common reference intervals for laboratories throughout a geographical area. Scand J Clin Lab Invest 1988;48:757– 64. Hyltoft Petersen P, Gowans EMS, Blaabjerg O, Hørder M. Analytical goals for the estimation of non-Gaussian reference intervals. Scand J Clin Lab Invest 1989;49:727–37. Cotlove E, Harris EK, Williams GZ. Biological and analytical components of variation in long-term studies of serum constitu- 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. ents in normal subjects. III. Physiological and medical implications. Clin Chem 1970;16:1028 –32. Elevitch R, ed. College of American Pathologists Conference Report. Conference on analytical goals in clinical chemistry, at Aspen, Colorado. Skokie, IL: CAP, 1976:129pp. Fraser CG, Hyltoft Petersen P, Ricos C, Haeckel R. Proposed quality specifications for the imprecision and inaccuracy of analytical systems for clinical chemistry. Eur J Clin Chem Clin Biochem 1992;30:311–7. Fraser CG, Hyltoft Petersen P, Ricos C, Haeckel R. Quality specifications. In: Haeckel R, ed. Evaluation methods in laboratory medicine. Weinheim: VCH, 1992:87–99. Stöckl D, Baadenhuijsen H, Fraser CG, Libeer J-C, Hyltoft Petersen P, Ricós C. Desirable routine analytical goals for quantities assayed in serum. Discussion paper from the members of the external quality assessment (EQA) working group A on analytical goals in laboratory medicine. Eur J Clin Chem Clin Biochem 1995;33:157– 69. Fraser CG, Hyltoft Petersen P. Quality goals in external quality assessment are best based on biology. Scand J Clin Lab Invest 1993;53(Suppl 212):8 –9. Stöckl D, Reinauer H. Development of criteria for the evaluation of reference method values. Scand J Clin Lab Invest 1993;53(Suppl 212):16 – 8. Thienpont LM, van Landuyt KG, Stöckl D, de Leenheer AP. Candidate reference method for determining serum creatinine by isocratic HPLC: validation with isotope dilution gas chromatography–mass spectrometry and application for accuracy assessment of routine test kits. Clin Chem 1995;41:995–1003. Krouwer JS, Monti KL. A simple, graphical method to evaluate laboratory assays. Eur J Clin Chem Clin Biochem 1995;33:525–7. Fuentes-Arderiu X, Fraser CG. Analytical goals for interference. Ann Clin Biochem 1991;28:383–5. Whicher JT, Ritchie RF, Johnson AM, Baudner S, Bienvenu J, Blirup-Jensen S, et al. New international reference preparation for proteins in human serum (RPPHS). Clin Chem 1994;40:934 – 8. Blirup-Jensen S, Just Svendsen P. A new reference preparation for proteins in human serum. Upsala J Med Sci 1994;99:251– 8. Flensted Lassen J, Kjeldsen J, Antonsen S, Hyltoft Petersen P, Brandslund I. Interpretation of serial measurements of International Normalized Ratio (INR) in monitoring oral anticoagulant treatment. A graphical combination of therapeutic interval and reference change. Clin Chem 1995;41:1171– 6. US Department of Health and Human Services. Medicare, Medicaid, and CLIA programs; regulations implementing the Clinical Laboratory Improvement Amendments of 1988 (CLIA). Final rule. Fed Regist 1992;57:7002–186. Ehrmeyer SS, Laessig RH, Leinweber JE, Oryall JJ. 1990 Medicare/CLIA final rules for proficiency testing: minimum intralaboratory performance characteristics (CV and bias) needed to pass. Clin Chem 1990:36:1736 – 40.