The Determination of Normal Ranges from Routine Laboratory Data

George J. Neumann

Clinical Chemistry, Vol. 14, No. 10, 1968

From the Ellis Hospital Laboratory, Schenectady, N. Y. 12308.
Received for publication Sept. 27, 1967; accepted for publication Feb. 16, 1968.

A method is described which is potentially capable of closely estimating the normal range from laboratory data. The estimation is made on probability paper using a purposely truncated form of the "normal" distribution. A fictitious set of data has been used to illustrate the efficiency of estimation of normals. The method has been used to estimate the normal range of blood urea.

THE NEED FOR A SIMPLE AND RELIABLE METHOD for determining normal ranges is widely recognized. A number of methods have been suggested for this purpose (1-5); however, to date none has been shown to be completely satisfactory when treating data from a heterogenous population such as is obtained in the hospital laboratory as daily routine procedure. This paper discusses a method which has the potential of treating such data, and describes a modification which will improve the results of the procedure.

Probit Analysis

The method of Hoffman (1) for deriving normal ranges from laboratory data is a simplified form of a more mathematic treatment called probit analysis. In his book, Finney (6) gives a short history of the probit method, which dates back to a suggestion by Fechner in 1860, progresses to the normal equivalent deviation (N.E.D.) of Gaddum in 1933, and Fisher's maximum likelihood analysis in 1935. Although probit analysis has had many years in which to be used and developed, relatively little has been done in connection with heterogenous distributions.

The probit is a unit defined in terms of the standardized Gaussian (normal) distribution curve. The equation for the Gaussian distribution is:

$$ p = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / 2\sigma^2} \tag{1} $$

If a new unit, u, is defined as:

$$ u = \frac{x - \mu}{\sigma} \tag{2} $$

and substituted into Equation (1), the result is:

$$ p = \frac{1}{\sqrt{2\pi}} \, e^{-u^2/2} \tag{3} $$

which is the corresponding equation for the standardized normal distribution. It is readily seen that the mean and standard deviation of the standardized distribution are 0 and 1, respectively. The equation for the cumulative distribution is then:

$$ P = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-t^2/2} \, dt \tag{4} $$

The parameter u is the N.E.D. of Gaddum. Since it is more convenient to work with positive numbers, the probit is defined as

$$ \text{Probit} = u + 5 \tag{5} $$

When probits are plotted against values for a cumulative normal distribution, the result is a straight line. The effect of the probit transformation is shown in Fig. 1. Since the 0 and 100% probabilities correspond to -∞ and +∞, respectively, these points do not normally fall on the probit line in practical plotting. It should be pointed out that in probit plotting the mean occurs at a probit of 5 (50%), and the slope of the line is the reciprocal of the standard deviation. This will be true only if the values are normally distributed or can be transformed to a normally distributed function.

Fig. 1. Effect of probit transformation. See text for details.

The use of probits is of definite advantage when one needs mathematic accuracy, since the units may be treated as relating to a straight line. However, the cost in time and complexity seems not worth the investment unless computers are available. Therefore, the mathematic methods of probit analysis will be passed over in favor of the graphical estimates, which are more economical of time. Those interested in the probit calculations are referred to Finney's book (6).

Estimations from Probability Paper

Probability paper (No. 468000)* is graduated according to a Gaussian probability distribution in such a way that percentages may be plotted as their corresponding normal deviates, in much the same way as one uses semilogarithmic graph paper to plot numbers as their corresponding logarithms. When the cumulative percentage of a Gaussian distribution is plotted against value, a straight line will result. The use of probability paper thus avoids the need for tables of probit transformation. Of course, the use of the paper makes it impossible to derive straight-line statistics without returning to the use of probits.

* Keuffel and Esser Co., Cleveland, Ohio

To use probability paper, it is necessary to list the possible values or groups of values in consecutive order. It makes no difference whether the order is increasing or decreasing, but it seems conventional to list in order of increasing value. The frequency of occurrence is then noted for each value, as well as the cumulative frequency, and the cumulative percentage is calculated for each value. Table 1 demonstrates the procedure for data representing a fictitious, nonhomogenous population. The cumulative percentage is then plotted against the value on "normal" probability paper (Fig. 2).

Note that the composite curve obtained in Fig. 2 is not a straight line, because the data are derived from two overlapping Gaussian distributions. The exact form of the curve will depend upon (1) the distance between the means, (2) the standard deviation of each distribution, and (3) the relative proportions of the distributions. If the normal range of Curve B is evaluated by the method of Hoffman (1), extending the best straight line to intersect the 5 and 95% points (90% limits, as suggested by Hoffman), we would obtain 18.5-49.0, as compared to the original 13.5-42.0. If, on the other hand, one were to use the limits of the straight-line portion of this curve as suggested by Waid (2), the lower limit of the range would be equal to or less than 10, and the upper limit equal to 30 or perhaps 35, depending on exactly where the straight line is terminated. Thus, it can be seen that these methods are at best of limited usefulness as they stand.
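The tabulation and plotting steps just described can be sketched numerically. The following Python fragment is an illustration only: it takes the combined cumulative incidences of Table 1, converts them to cumulative percentages, and then to probits (normal equivalent deviate plus 5). The total of 20,000 observations is an inference from the table (19,972 is listed as 99.86%), and all function and variable names are illustrative rather than taken from the paper. For a single Gaussian population the probit change per unit of value would be constant; for the composite data it is not.

```python
from statistics import NormalDist

# Combined cumulative incidence for values 10..90 (Table 1).
VALUES = [10, 20, 30, 40, 50, 60, 70, 80, 90]
CUM_INCIDENCE = [77, 1455, 6600, 11350, 14384, 17200, 19050, 19800, 19972]
TOTAL = 20000  # assumption: inferred from 19972 being listed as 99.86%


def probit(percent):
    """Probit of a cumulative percentage: normal equivalent deviate plus 5."""
    return NormalDist().inv_cdf(percent / 100.0) + 5.0


def cumulative_percent(cum_incidence, total):
    """Cumulative percentage for each cumulative count."""
    return [100.0 * c / total for c in cum_incidence]


def local_slopes(values, probits):
    """Probit change per unit of value between successive plotted points."""
    points = list(zip(values, probits))
    return [
        (p2 - p1) / (v2 - v1)
        for (v1, p1), (v2, p2) in zip(points, points[1:])
    ]


percents = cumulative_percent(CUM_INCIDENCE, TOTAL)
probits = [probit(p) for p in percents]
slopes = local_slopes(VALUES, probits)

for v, pct, pr in zip(VALUES, percents, probits):
    print(f"{v:3d}  {pct:6.2f}%  probit {pr:5.2f}")
print("slope between successive points:", [round(s, 3) for s in slopes])
```

The slopes are steep at the low values, flatten near the middle, and rise again, which is the numerical counterpart of the bent composite curve in Fig. 2.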
If it were possible, however, to dissect the mixed distribution in such a way as to restore the original plots, the method should be more accurate and more applicable. Hoffman (1) alludes to such a technic in Hald (7). Dissection by this technic involves fitting a parabolic curve to the logarithmic form of the equation for the Gaussian curve. There are two objections to this: (1) the arithmetic is cumbersome, and (2) the technic is useful only when one side of one of the distributions is essentially unaffected by the presence of the other. It would be much more desirable to have a simpler technic, preferably one involving no more cumbersome arithmetic than that encountered with the fitting of a straight line, and one which is independent of the degree of overlapping of the constituent populations. Such a technic is implied by Hald's discussion of the truncated normal distribution (7). The truncated distribution is a normally distributed population which has been cut off at some point so that the sample is an incomplete population.

Table 1. DATA FOR TWO ARBITRARY (B AND C) POPULATIONS AND THEIR COMBINED FREQUENCY

         Cumulative incidence    Cumulative                   Calculated incidence
Value        B         C         combined      Cumulative         B         C
                                 incidence         %
 10          68         9            77           0.4             70         7
 20        1370        85          1455           7.3           1460         0
 30        6190       500          6600          33.0           6600         0
 40        9500      1850         11350          56.7           9680      1670
 50        9984      4400         14384          71.9           9994      4354
 60           -      7200         17200          86.0              -      7200
 70           -      9050         19050          95.2              -      9050
 80           -      9800         19800          99.0              -      9500
 90           -      9972         19972          99.86             -      9972

Fig. 2. Probability plot of fictitious nonhomogenous distribution (see Table 1). Curve B, open circles; Curve C, half-closed circles; composite curve, unbroken line; extrapolated line according to Hoffman (1), closed circles. Horizontal axis: units.

The effect of truncation on the probability plot is shown in Fig. 3.
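The behavior of a truncated distribution on a probability plot can also be sketched numerically. The fragment below is a minimal illustration under stated assumptions: the population mean (28), standard deviation (7) and truncation point (30) are hypothetical values chosen for the example, not parameters from the paper. It compares the normal deviate of each value when the whole population is counted with the deviate obtained when percentages are recomputed from only the observations below the truncation point.

```python
from statistics import NormalDist

# Hypothetical population and truncation point, chosen only for illustration.
POPULATION = NormalDist(mu=28.0, sigma=7.0)
TRUNCATION = 30.0


def deviate_full(x):
    """Normal deviate of x when the whole population is counted."""
    return NormalDist().inv_cdf(POPULATION.cdf(x))


def deviate_truncated(x, cut=TRUNCATION):
    """Normal deviate of x when only observations up to `cut` are counted."""
    fraction = POPULATION.cdf(x) / POPULATION.cdf(cut)
    return NormalDist().inv_cdf(fraction)


xs = [10.0, 20.0, 25.0, 29.0]
gaps = [deviate_truncated(x) - deviate_full(x) for x in xs]
for x, gap in zip(xs, gaps):
    print(f"value {x:4.1f}: truncated plot lies {gap:.2f} deviates above the full line")
```

The gap widens as the value nears the truncation point, where the recomputed percentage reaches 100 and the deviate is unbounded. Far below the truncation point the truncated points still lie close to a straight line, which is the portion a fitted line can use.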
It can be seen that the curve of the truncated distribution asymptotically approaches the value at which the distribution is truncated. The curve is derived from the values of Column 4 in Table 2, where the point of truncation is 30 units. An estimate of the degree of truncation is obtained by extending the best straight line to this value and reading the percent represented by the point of intersection (Point A, Fig. 3). The calculation to this point is the first cycle referred to below. This estimate can then be used to reconstruct the original distribution as described below.

Fig. 3. Dissection of overlapping populations. First cycle (truncated distribution), open circles; second cycle, half-closed circles, vertically split; fourth cycle (reconstructed Curve B), closed circles; reconstructed Curve C, half-closed circles, horizontally split. Horizontal axis: units.

Table 2. DISSECTION OF OVERLAPPING POPULATIONS

         Cumulative   Cumulative     Truncated cumulative %, by cycle
Value    incidence        %           1        2        3        4
 10          77           0.4         1.16     0.9      0.8      0.7
 20        1455           7.3        22.0     17.0     15.4     14.6
 30        6600          33.0       100       77.0     70.0     66.0
 40       11350          56.7         -        -        -        -
 50       14384          71.9         -        -        -        -
 60       17200          86.0         -        -        -        -
 70       19050          95.2         -        -        -        -
 80       19800          99.0         -        -        -        -
 90       19972          99.86        -        -        -        -

Method

It is not known at this point how many values must be used to obtain statistically significant results. It is suggested that not less than 200 values be collected, and more would be preferable. The values are sorted as described above and the cumulative percentages obtained, including all values. The results are then plotted on probability paper, which will yield a curve similar to that represented by the unbroken line in Fig. 2. A straight line is fitted by eye through those points obviously representing a straight line (closed circles, Fig. 2). One then selects the values closest to the limits of the straight line as the points of truncation.
In the present instance, there is only one point of truncation, at 30 units. The cumulative percent for each value between the points of truncation is recalculated using the total cumulative frequency of the included values as the total frequency of the sample (Table 2, first four columns). The resulting data are plotted on probability paper, yielding a truncated curve (open circles, Fig. 3). A straight line is visually fitted to these points and extended to the point of truncation (Point A, Fig. 3). The percentage read from the point of intersection can then be used to calculate an estimate of the total number of observations (N) in this sample from this population. In the present example the calculation takes the form:

$$ N = \frac{6600}{P_A / 100} $$

where P_A is the percentage read at Point A.

The figure thus derived can then be used in a second cycle of calculations to calculate new percentages and to plot and make a new estimate of N. Should the point of truncation occur at the lower values of the distribution, the difference between the observed and the calculated values of N is added to the incidence of each value before calculating new percentages.

The procedure as described for the first cycle can be applied as often as necessary to obtain a sufficiently straight line. I have arbitrarily chosen to repeat until the limiting value deviates less than 2 percentage units from the value obtained.

When a sufficiently straight line has been obtained, the values corresponding to 2.5 and 97.5% are read from the graph. These are the 95% limits of the normal range; the mean is read at the 50% point on the drawn line. The standard deviation may be estimated by subtracting the lower limit of normal from the upper limit of normal and dividing by four.

The application of this method to the data of Table 1 is shown in Table 2 and Fig. 3. The last two columns of Table 1 show the calculated incidence of each value for the two distributions.
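The arithmetic of the cycles can be illustrated with a short Python sketch. It is not the author's graphical procedure: the straight line is still fitted and read by eye, so the percentages read at Point A are supplied here as inputs. The readings 77, 70 and 66 are assumed to be those recorded in the 30-unit row of Table 2, the stopping rule is interpreted as successive readings differing by less than 2 percentage units, and all function names are illustrative.

```python
# Cumulative incidence up to the truncation point of 30 units (Table 2).
VALUES = [10, 20, 30]
CUM_INCIDENCE = [77, 1455, 6600]


def estimate_total(count_at_truncation, percent_at_point_a):
    """Estimate N from the percentage read where the line meets the truncation point."""
    return count_at_truncation / (percent_at_point_a / 100.0)


def cycle_percents(cum_incidence, total):
    """Cumulative percentages recalculated against the current estimate of N."""
    return [100.0 * c / total for c in cum_incidence]


def converged(previous_reading, new_reading, tolerance=2.0):
    """Assumed stopping rule: successive readings differ by less than 2 units."""
    return abs(new_reading - previous_reading) < tolerance


def range_summary(lower, upper):
    """Mean and SD from the 2.5% and 97.5% readings of the final straight line.

    On a straight probability plot the 50% point is midway between the limits.
    """
    return (lower + upper) / 2.0, (upper - lower) / 4.0


# First cycle: the included values themselves are taken as the whole sample.
print("cycle 1:", [round(p, 1) for p in cycle_percents(CUM_INCIDENCE, CUM_INCIDENCE[-1])])

# Later cycles: readings at Point A, assumed from the 30-unit row of Table 2.
readings = [77.0, 70.0, 66.0]
for cycle, reading in enumerate(readings, start=2):
    total = estimate_total(CUM_INCIDENCE[-1], reading)
    percents = cycle_percents(CUM_INCIDENCE, total)
    print(f"cycle {cycle}: N = {total:.0f},", [round(p, 1) for p in percents])

# Limits reported for the reconstructed Curve B (Table 3, present method).
mean, sd = range_summary(13.5, 41.0)
print(f"mean {mean}, SD {sd}")
```

Because the readings depend on where the line is drawn by eye, a different fit would give somewhat different estimates of N at each cycle.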
The reconstructed curves agree favorably with the originals. The results of the application are compared in Table 3 with the original, Hoffman's interpretation, and Waid's interpretation.

Table 3. COMPARISON OF "NORMAL" VALUES

Method          Curve B        Curve C
Original        13.5-42.5      26.0-79.0
Hoffman (1)     18.5-49.0      17.5-69.0
Waid (2)        10.0*-30.0     60.0-90.0
Present         13.5-41.0      26.0-79.0

* The lower limit can be shown to be less than 5 if more complete data are used (see Fig. 2).

Application of Method to Blood Urea

The method has been applied to a series of 626 urea values obtained by the routine AutoAnalyzer method. Values were grouped using multiples of five as midpoints, with a class interval of 5 units. This grouping was chosen on the basis of the standard deviation of the analysis. The original plot and the reconstructed normal curve are shown in Fig. 4. The normal range as calculated from the data is 8-20 mg./100 ml. These values agree well with those quoted by Henry (8) for the overall population. Preliminary data on differences between age groups and sexes tend to confirm previous data quoted by Henry.

Fig. 4. Determination of normal values: urea. Original plot, open circles; reconstructed normal distribution, closed circles. Horizontal axis: urea (mg/100 ml).

Summary

A method has been described which is potentially capable of closely estimating the normal range from laboratory data. The estimation is made on probability paper using a purposely truncated form of the normal (Gaussian) distribution. The objections to previously published methods have been overcome, and no calculations more complicated than the calculation of percentages are necessary. A fictitious set of data has been used to illustrate the efficiency of estimation of normals. The method has also been used to estimate the normal range of blood urea.
The method is applicable only when the distribution is Gaussian or can be transformed to a Gaussian distribution (e.g., lognormal).

References

1. Hoffman, R. G., Statistics in the practice of medicine. J. Am. Med. Assoc. 185, 864 (1963).
2. Waid, M., Quoted by Sparapani, A., and Berry, R. E., The range of normal values in the quality control of clinical chemistry. Am. J. Clin. Pathol. 42, 133 (1964).
3. Herrera, L., The precision of percentiles in establishing normal limits in medicine. J. Lab. Clin. Med. 52, 34 (1958).
4. Henry, R. J., Clinical Chemistry: Principles and Technics. Hoeber, New York, 1964, p. 147.
5. Henry, R. J., and Dryer, R. L., Standard Methods of Clinical Chemistry (Vol. 4). Acad. Press, New York, 1963, p. 205.
6. Finney, D. J., Probit Analysis: A Statistical Treatment of the Sigmoid Response Curve (ed. 2). Cambridge Univ. Press, Cambridge, England, 1962.
7. Hald, A., Statistical Theory with Engineering Applications. Wiley, New York, 1962.
8. Henry, R. J., Clinical Chemistry: Principles and Technics. Hoeber, New York, 1964, p. 275.