CLIN. CHEM. 21/13, 1935-1938 (1975)

Accurate Estimation of Standard Deviations for Quantitative Methods Used in Clinical Chemistry

Robert W. Burnett

Clinical Chemistry Laboratory, Hartford Hospital, Hartford, Conn. 06115. Presented in part at the Ninth International Congress on Clinical Chemistry, Toronto, Canada, July 14, 1975. Received Aug. 11, 1975; accepted Sept. 25, 1975.

Although the standard deviation is the most widely used measure of the precision of quantitative methods, there is a need to re-examine the conditions necessary to obtain a meaningful estimate of this quantity. The importance of the material to be sampled, the sample size, the calculation of confidence intervals, and the segregation of outliers are discussed.

Additional Keyphrases: statistics · outliers

One important index of the quality of clinical laboratory services is the precision associated with each of the quantitative methods in use in the laboratory. The most common measure of precision is the standard deviation, defined as

s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \nu = n - 1    (1)

where n is the number of observations, \nu is the number of degrees of freedom, x_i is an individual observation, and \bar{x} is the mean of the n observations.

It is often incorrectly assumed that standard deviations can be used directly to compare precision, for example, to compare results by two methods in a laboratory or results from two laboratories. In fact, a meaningful comparison of method precision is not possible if the standard deviation for each method is the only information available. This paper reviews several factors that can bias estimates of standard deviation. It suggests how more meaningful estimates of standard deviation can be obtained, and what additional information must be available if one wishes to compare the precision of two methods in clinical chemistry.

Characteristics of Material to Be Sampled

Standard deviations are most reliably calculated from the results of repeated analysis of one lot of material. In many laboratories this information is accumulated as part of a routine internal quality-control program. The following points should always be observed in obtaining such data:

1. Whenever possible, a pool with the same matrix as that of samples from patients should be used to gather data on precision, even though it may be the case that a constituent can be measured more precisely in aqueous solution than in a serum or urine matrix.

2. If the pooled material is purchased in lyophilized form, one should have data from the manufacturer showing that the inter-vial variability for all constituents is within acceptable limits. The user must also ensure that the lyophilized material is reconstituted precisely, and that the variability from this source does not contribute significantly to the variability of the test results.

3. For both liquid and lyophilized pools, the user must be sure that each sample to be tested can be traced to the same homogeneous lot of material.

Detailed information on how to prepare and use a serum pool for monitoring precision appears in the Selected Methods section of the November issue of this journal (1).

For some methods it will be found that the standard deviation depends strongly on the mean concentration of the constituent being analyzed. A glucose method might have a standard deviation of 30 mg/liter at a mean level of 1000 mg/liter, but 50 mg/liter at a concentration of 2000 mg/liter.
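Because the standard deviation generally varies with concentration, it should be computed separately for each control pool and reported together with the corresponding mean. The sketch below is a minimal illustration of equation 1 in Python; the daily glucose control results are hypothetical, not data from this study.

```python
import math

def sample_sd(results):
    """Standard deviation per equation 1: s = sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(results)
    mean = sum(results) / n
    return math.sqrt(sum((x - mean) ** 2 for x in results) / (n - 1))

# Hypothetical daily glucose control results (mg/liter) from two pool levels;
# s is estimated separately at each mean concentration.
low_pool = [1010, 985, 1030, 970, 1005, 995, 1020, 990]
high_pool = [2040, 1950, 2010, 2060, 1980, 2030, 1945, 2025]

for name, data in (("low pool", low_pool), ("high pool", high_pool)):
    n = len(data)
    print(f"{name}: n = {n}, mean = {sum(data) / n:.0f} mg/L, s = {sample_sd(data):.1f} mg/L")
```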
In general, it is not a simple matter to predict how the standard deviation will depend on the mean; therefore, when reporting a standard deviation it is always desirable to specify the mean (\bar{x} in equation 1) about which the standard deviation was calculated. Furthermore, it is not generally possible to compare two standard deviations unless the two mean values are nearly equal, or unless the dependency of standard deviation on the mean is well known.

Another problem in obtaining a standard deviation that truly reflects the precision of the method as applied to patients' specimens arises if the mean or target value for the control sample is known to the operator. In general, whenever an operator is required to estimate a result, whether from a meter, a graph, or a noisy digital display, there will be a conscious or subconscious bias in the direction of the target value when this value is known. This naturally results in estimates of standard deviation that are artificially low. It may not be a simple matter to always use control materials that are true unknowns. Even a pool without an assigned value will, if used daily, quickly be assigned a mean value in the minds of the operators. One approach to this problem is to disguise the control material as a patient's specimen; again, this may be quite difficult to do effectively. In another approach, several different pools are tested in a random sequence (see ref. 1).

Sample Size and Confidence Intervals

It is usually desired to have a measured standard deviation reflect the long-term precision with which patients' specimens are analyzed. Accordingly, the standard deviation should be calculated from data obtained during several days by several different operators. If the standard deviation is calculated from data obtained in a single run or with a single operator, the value will usually be lower than the long-term standard deviation; higher values are possible in theory.

The sample size plays an important role, which is often given little attention, in determining the reliability of a calculated standard deviation. It must be remembered that whenever a standard deviation, s, is calculated from a finite number of test results, s is merely an estimate of the true standard deviation, \sigma, which corresponds to the population of an infinite number of test results. As is true in all such estimates, s will be more likely to be close to \sigma as n, the sample size, is increased. Moreover, for a given n it is possible to determine the accuracy of s at a specified confidence level. Obviously, it is important to have some idea of the accuracy of estimated values of \sigma before making decisions based on these values.

The mathematical formulation of the problem is straightforward and may be explained as follows. We wish to know the magnitude of error in our estimate, s, at a specified confidence level. That is, we wish to know the value of u that satisfies the inequality

(1 - u)\sigma < s < (1 + u)\sigma

at a specified confidence level. The solution to this has been given by Greenwood and Sandomire (2), who pointed out that the above is equivalent, by simple algebraic manipulation, to

\nu (1 - u)^2 < \frac{\nu s^2}{\sigma^2} < \nu (1 + u)^2    (2)

The quantity in the center of this inequality is the statistical parameter known as \chi^2_{\nu}, and extensive tables are available (3) that list the percentiles of the \chi^2 distribution for various degrees of freedom, \nu. It is assumed that the sample is from a population for which the values have a gaussian distribution.
Once a confidence level, P, is selected, we may write expressions equivalent to so-called two-tailed confidence intervals as follows:

\chi^2_{\nu, (1-P)/2} < \chi^2 < \chi^2_{\nu, (1+P)/2}    (3)

at the confidence level P, by definition. Although other intervals could be chosen, the one used here, which cuts off equal areas at the two ends of the distribution curve, is very close to the best choice for \nu greater than about 20 (4). Comparison of equations 2 and 3 shows that

\nu (1 + u)^2 = \chi^2_{\nu, (1+P)/2}  and  \nu (1 - u)^2 = \chi^2_{\nu, (1-P)/2}

Either equation may now be solved for u, since almost the same value will be obtained from each. This is true because, even though the \chi^2 distribution is asymmetrical, the distribution of s is only slightly skewed; the difference in the value of u calculated from the two equations is not significant. Thus

u = \sqrt{\chi^2_{\nu, (1+P)/2} / \nu} - 1    (4)

Example. Assume that s has been calculated from 31 determinations, and we wish to determine the error associated with this estimate at the 95% confidence level. For P = 0.95 and \nu = 30, \chi^2_{\nu, (1+P)/2} = 47.0. Solving equation 4 gives u = 0.25. Thus, at the 95% confidence level, the error in s is less than 25%; expressed another way, 0.75\sigma < s < 1.25\sigma.

Table 1 gives a tabulation of the percent error associated with an estimated standard deviation, for various degrees of freedom and at different confidence levels. Graphs of these functions are also available (2, 5), which facilitate interpolation.

Table 1. Percent Error (u × 100) Associated with an Estimated Standard Deviation

  Degrees of         Confidence coefficient (P)
  freedom (ν)        .90      .95      .99
       10            38%      45%      60%
       20            26%      31%      41%
       30            21%      25%      33%
       40            18%      22%      28%
       60            15%      18%      23%
       80            13%      15%      20%
      100            11%      14%      18%
      200             8%      10%      13%
      400             6%       7%       9%
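The worked example and entries of the kind shown in Table 1 follow directly from equation 4. Below is a minimal Python sketch, assuming SciPy is available for the \chi^2 percentiles; the computed values may differ from the tabulated ones by a point or two at small \nu, since either of the two relations above can be solved for u.

```python
from math import sqrt
from scipy.stats import chi2

def percent_error(nu, P):
    """Percent error (u x 100) in s at confidence level P, from equation 4:
    u = sqrt(chi2[nu, (1+P)/2] / nu) - 1."""
    return 100.0 * (sqrt(chi2.ppf((1.0 + P) / 2.0, nu) / nu) - 1.0)

# Example from the text: s estimated from 31 determinations (nu = 30), P = 0.95.
print(f"u = {percent_error(30, 0.95):.0f}%")          # prints about 25%

# Rows in the style of Table 1.
for nu in (10, 20, 30, 100):
    row = [round(percent_error(nu, P)) for P in (0.90, 0.95, 0.99)]
    print(nu, row)
```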
Segregation of Outliers

When interpreting standard deviations, one customarily makes the assumption that all results have come from a population with a gaussian distribution. Even if only a relative comparison of two standard deviations is desired, this can be obtained only if the two populations in question have distributions of similar shape. Although it is often stated that random errors associated with a quantitative measurement are usually distributed in an approximately gaussian fashion, the distribution of raw data from routine analyses, either of a pooled serum sample or of a particular patient's specimen, usually does not conform to any well-defined distribution; in fact, its shape is usually not predictable at all. It follows that applying statistical analysis, such as the calculation of means and standard deviations, to raw data may yield results that are easily misinterpreted.

Fig. 1. Distribution of serum sodium values (sodium concentration, mmol/liter) obtained from our internal quality-control program. See text for detailed explanation

Figure 1 shows the distribution of serum sodium values obtained from the internal quality-control program in the clinical chemistry laboratory at Hartford Hospital. A serum pool was analyzed once each day during routine processing of patients' specimens; the actual distribution obtained over a four-month period is shown by the solid line in Figure 1 and by the three results corresponding to the solid black rectangles. The standard deviation calculated from all data points is 2.30 and the calculated mean is 149.5 mmol/liter. The gaussian distribution defined by this mean and standard deviation is shown by the dashed line in Figure 1. It is apparent that this is a gross misrepresentation of the actual distribution of values. In this situation the calculated standard deviation conveys no meaningful information about the precision of the method and in fact is quite misleading.

The problem, of course, is that while most of the results are clustered about the mean, the results represented by the three black rectangles lie far away from the mean and are heavily biasing the standard deviation. The problem can be resolved by recognizing that these outlying values usually result from careless mistakes such as picking up the wrong pipet, accidentally interchanging specimen tubes, or transposing two digits in transcribing a result, e.g., 193 for 139. As such, they belong to a different population distribution than the set of results clustered more tightly about the mean, which represents the inherent precision of the measurement technique itself.

It must be realized that the frequency of occurrence of the type of error that leads to outlying results is itself an important measure of the overall quality of clinical chemistry service. A meaningful measure of the precision of any quantitative method must therefore include both the inherent precision of the measurement technique and the outlier frequency. All that is necessary to obtain an estimate of both these quantities is to adopt a criterion for identifying outlying results so that they can be segregated from the rest of the data. Many methods have been used for this purpose; all are somewhat arbitrary. However, useful results will be obtained if one method is adopted and used consistently.

The criterion used in our laboratory is a modification of one given by Natrella (6), which assumes that estimates of the mean and standard deviation are available. An outlier may be defined as a result, x_0, that lies further than some multiple, m, of standard deviations from the mean; that is, x_0 < \bar{x} - ms or x_0 > \bar{x} + ms. The value of m to be used depends on the number of results in the sample. If a 5% risk of incorrectly identifying a result as an outlier is accepted, m may be calculated for any given number of results, n. Table 2 lists several such values calculated by the method given by Natrella.

Table 2. Criteria for Outlier Identification for Various Sample Sizes, with Use of the Definition x_0 < \bar{x} - ms or x_0 > \bar{x} + ms and a 95% Confidence Level

    n      m          n      m
   10    2.80       100    3.47
   20    3.02       120    3.52
   30    3.14       150    3.58
   40    3.22       200    3.66
   60    3.33       300    3.76
   80    3.41       400    3.83

To apply this technique to quality-control data in our laboratory, the following steps are followed:

1. The mean and standard deviation are calculated, including all results.
2. Results more than m standard deviations from the mean are segregated.
3. A new mean and standard deviation are calculated from the remaining results.
4. Results more than m times the new standard deviation away from the new mean are segregated.
5. The process is repeated until no more outliers are found.
6. The number of outliers is divided by the total number of results to give the outlier frequency.

This iterative technique is most conveniently done with the aid of a computer. If one already had at hand a reliable estimate of the standard deviation (excluding outliers), then the Natrella criterion could be applied without the modification of multiple iterations; the iterative technique is useful when no such estimate is available.
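The six-step procedure above can be sketched in a few lines of Python. This is an illustrative sketch, not the program used in the laboratory: it holds m fixed at the value chosen for the original sample size (Table 2) throughout the iterations, an assumption the text does not spell out, and the daily sodium results shown are hypothetical.

```python
import statistics

def segregate_outliers(results, m):
    """Iteratively recompute the mean and standard deviation, segregating any
    result lying more than m standard deviations from the current mean, until
    no further outliers are found (steps 1-6 in the text)."""
    retained = list(results)
    n_total = len(results)
    while True:
        mean = statistics.mean(retained)
        s = statistics.stdev(retained)            # equation 1 (n - 1 denominator)
        kept = [x for x in retained if abs(x - mean) <= m * s]
        if len(kept) == len(retained):            # no new outliers segregated
            break
        retained = kept
    outlier_frequency = (n_total - len(retained)) / n_total
    return statistics.mean(retained), statistics.stdev(retained), outlier_frequency

# Hypothetical daily sodium control results (mmol/liter); m = 3.02 for n = 20 (Table 2).
daily_sodium = [149, 150, 148, 151, 149, 150, 149, 148, 150, 151,
                149, 150, 193, 148, 150, 149, 151, 150, 149, 138]
mean, s, freq = segregate_outliers(daily_sodium, m=3.02)
print(f"mean = {mean:.1f} mmol/L, s = {s:.2f} mmol/L, outlier frequency = {freq:.1%}")
```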
For the data in Figure 1, two iterations resulted in the three results corresponding to the solid black rectangles being identified as outliers. The standard deviation calculated without these points is 1.07 and the outlier frequency is 2.5%. The gaussian distribution corresponding to the new standard deviation is shown by the dotted curve in Figure 1, which is seen to fit the observed distribution quite well.

It is now possible to make a meaningful statement about the precision of the serum sodium method by use of the quality-control data of Figure 1: s is 1.07 at a mean of 150 mmol/liter, with an outlier frequency of 2.5%; n is 121, which implies that s is accurate to within 13% of \sigma at the 95% confidence level.

Discussion

When the above criteria for determination of standard deviations are used in conjunction with an internal quality-control program in the laboratory, meaningful estimates of precision for the various quantitative methods can easily be made. Table 3 presents data on the long-term precision of 14 common chemical determinations. The data were gathered from the internal quality-control program in our laboratory (1) over a three-year period. All values in the table are averages of nine separate determinations, each of which was calculated from 80 to 120 individual test results after segregation of outliers. The data for each test are thus derived from roughly 1000 measurements made during the three-year period.

Table 3. Summary of Method-Precision Statistics from Three Years of Quality-Control Data (a)

Test (units)                         Mean     SD      CV, %    n     Outlier frequency, %
Glucose (mg/l)                       1140     29       2.5    120          2.1
Urea (mg/l)                           260     8.9      3.4    120          1.4
Creatinine (mg/l)                      19     0.87     4.6    120          0.6
Sodium (mmol/l)                       146     1.4      0.95   120          2.8
Potassium (mmol/l)                    5.4     0.079    1.5    120          2.2
Chloride (mmol/l)                     103     2.1      2.0    120          1.5
Osmolality (mOsm/l)                   318     3.9      1.2    120          2.6
Calcium (mmol/l)                     2.16     0.040    1.9    100          2.0
Alkaline phosphatase (U/l)            107     4.9      4.6    120          0.2
Aspartate aminotransferase (U/l)       21     1.9      9.2    120          0.2
Alanine aminotransferase (U/l)         10     1.7       17    120          0.1
Lactate dehydrogenase (U/l)           204      12      6.1    120          0.3
Cholesterol (mg/l)                   1860      94      5.1     80          0.3
Triglycerides (mg/l)                 1160      96      8.3     80          0.3

(a) All numbers shown are mean values of data from nine separate four-month periods. Data from Clinical Chemistry Laboratory, Hartford Hospital.

Fig. 2. Correlation between outlier frequency and relative standard deviation (%). Data taken from the quality-control program over a three-year period

One final observation of interest is illustrated by Figure 2, which shows a high degree of correlation between outlier frequency and the coefficient of variation (relative standard deviation) of the various tests. The data are taken from Table 3. The figure indicates that the tests with the lowest relative standard deviation (corrected for outliers by using the criterion described above) show the highest outlier frequency. While those tests with a relative standard deviation >5% show a very low and relatively constant outlier frequency of 0.1-0.3%, the most precise tests in the laboratory, with relative standard deviations around 1%, show outlier frequencies of 2.5-3.0%. The origin of this effect and possible means of reducing outlier frequency, particularly for the relatively precise tests, deserve further study. These and other studies will require accurate estimates of method precision, which in turn requires that s, \bar{x}, n, and the outlier frequency all be measured and reported if the interpretation of the data is to be most meaningful.
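The contrast shown in Figure 2 can also be read directly from the CV and outlier-frequency columns of Table 3. A minimal sketch, grouping the 14 tests at the 5% relative-standard-deviation figure quoted above; the (CV, outlier frequency) pairs are copied from the table:

```python
# (CV %, outlier frequency %) pairs for the 14 tests in Table 3.
table3 = [(2.5, 2.1), (3.4, 1.4), (4.6, 0.6), (0.95, 2.8), (1.5, 2.2),
          (2.0, 1.5), (1.2, 2.6), (1.9, 2.0), (4.6, 0.2), (9.2, 0.2),
          (17.0, 0.1), (6.1, 0.3), (5.1, 0.3), (8.3, 0.3)]

precise = [freq for cv, freq in table3 if cv < 5.0]     # relative SD below 5%
imprecise = [freq for cv, freq in table3 if cv >= 5.0]  # relative SD of 5% or more

print(f"CV < 5%:  mean outlier frequency = {sum(precise) / len(precise):.1f}%")      # about 1.7%
print(f"CV >= 5%: mean outlier frequency = {sum(imprecise) / len(imprecise):.1f}%")  # about 0.2%
```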
References

1. Bowers, G. N., Jr., Burnett, R. W., and McComb, R. B., Preparation and use of human serum control materials for monitoring precision in clinical chemistry. Clin. Chem. 21, 1830 (1975).
2. Greenwood, J. A., and Sandomire, M. M., Sample size required for estimating the standard deviation as a per cent of its true value. J. Am. Stat. Assoc. 45, 257 (1950).
3. Thompson, C. M., Table of percentage points of the \chi^2 distribution. Biometrika 32, 188 (1941). (Reproduced in most standard statistics textbooks.)
4. Bennett, C. A., and Franklin, N. L., Statistical Analysis in Chemistry and the Chemical Industry, John Wiley and Sons, Inc., New York, N. Y., 1954, p 173.
5. Natrella, M. G., Experimental Statistics, National Bureau of Standards Handbook 91, U. S. Government Printing Office, 1963, pp 2-12.
6. Natrella, M. G., ibid., pp 17-4.