CLIN. CHEM. 21/13, 1935-1938 (1975)

Accurate Estimation of Standard Deviations for Quantitative Methods Used in Clinical Chemistry

Robert W. Burnett

Although the standard deviation is the most widely used measure of the precision of quantitative methods, there is a need to re-examine the conditions necessary to obtain a meaningful estimate of this quantity. The importance of the material to be sampled, the sample size, the calculation of confidence intervals, and the segregation of outliers are discussed.

Additional Keyphrases: statistics • outliers

One important index of the quality of clinical laboratory services is the precision associated with each of the quantitative methods in use in the laboratory. The most common measure of precision is the standard deviation, defined as

$$ s = \left[ \frac{1}{\nu} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{1/2}, \qquad \nu = n - 1 \qquad (1) $$

where $n$ is the number of observations, $\nu$ is the number of degrees of freedom, $x_i$ is an individual observation, and $\bar{x}$ is the mean of $n$ observations.

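As a minimal sketch of equation 1, the function below computes the standard deviation with n - 1 degrees of freedom. The function name `sample_sd` and the sodium-like values are hypothetical, chosen only for illustration; they are not data from the paper.

```python
import math

def sample_sd(values):
    """Standard deviation as in equation 1, using nu = n - 1 degrees of freedom."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    nu = n - 1
    return math.sqrt(sum((x - mean) ** 2 for x in values) / nu)

# Hypothetical results (mmol/liter), made up for illustration only.
results = [149.0, 150.0, 151.0, 150.0, 148.0, 152.0]
print(sample_sd(results))
```

The same quantity is returned by `statistics.stdev` in the Python standard library, which also divides by n - 1.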
It is often incorrectly assumed that standard deviations can be used directly to compare precision, for example, to compare results by two methods in a laboratory or results from two laboratories. In fact, a meaningful comparison of method precision is not possible if the standard deviation for each method is the only information available. This paper reviews several factors that can bias estimates of standard deviation. This will suggest how more meaningful estimates of standard deviation can be obtained, and what additional information must be available if one wishes to compare the precision of two methods in clinical chemistry.

Characteristics of Material to Be Sampled

Standard deviations are most reliably calculated from the results of repeated analysis of one lot of material. In many laboratories this information is accumulated as part of a routine internal quality-control program. The following points should always be observed in obtaining such data:
1. Whenever possible, a pool with the same matrix as that of samples from patients should be used to gather data on precision, even though it may be the case that a constituent can be measured more precisely in aqueous solution than in a serum or urine matrix.
2. If the pooled material is purchased in lyophilized form, one should have data from the manufacturer showing that the inter-vial variability for all constituents is within acceptable limits. The user must also ensure that the lyophilized material is reconstituted precisely, and that the variability from this source does not contribute significantly to the variability of the test results.
3. For both liquid and lyophilized pools, the user must be sure that each sample to be tested can be traced to the same homogeneous lot of material.

Clinical Chemistry Laboratory, Hartford Hospital, Hartford, Conn. 06115. Presented in part at the Ninth International Congress on Clinical Chemistry, Toronto, Canada, July 14, 1975. Received Aug. 11, 1975; accepted Sept. 25, 1975.

Detailed information as to how to prepare and use a serum pool for monitoring precision appears in the Selected Methods section of the November issue of this journal (1).

For some methods it will be found that the standard deviation depends strongly on the mean concentration of the constituent being analyzed. A glucose method might have a standard deviation of 30 mg/liter at a mean level of 1000 mg/liter, but 50 mg/liter at a concentration of 2000 mg/liter. In general, it is not a simple matter to predict how the standard deviation will depend on the mean; therefore, it is always desirable, when reporting a standard deviation, to specify the mean, $\bar{x}$ in equation 1, about which the standard deviation was calculated. Furthermore, it is not generally possible to compare two standard deviations unless the two mean values are nearly equal, or unless the dependency of standard deviation on the mean is well known.

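The glucose figures quoted above can be put on a common scale by expressing each standard deviation as a percentage of its mean (the relative standard deviation). The sketch below is only an illustration of why the mean must accompany a reported standard deviation; the helper name `relative_sd_percent` is hypothetical.

```python
def relative_sd_percent(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean

# Glucose example from the text: the absolute SD rises with concentration,
# while the relative SD falls.
print(relative_sd_percent(30, 1000))  # 3.0
print(relative_sd_percent(50, 2000))  # 2.5
```

Judged by standard deviation alone, the method looks less precise at 2000 mg/liter; judged relative to the mean, it looks slightly more precise. Neither comparison is possible unless the mean is reported.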
Another problem in obtaining a standard deviation that truly reflects the precision of the method as applied to patients' specimens sometimes arises if the mean or target value for the control sample is known to the operator. In general, whenever an operator is required to estimate a result, whether from a meter, a graph, or a noisy digital display, there will be a conscious or subconscious bias in the direction of the target value when this value is known. This naturally results in estimates of standard deviation that are artificially low. It may not be a simple matter to always use control materials that are true unknowns. Even a pool without an assigned value will, if used daily, quickly be assigned a mean value in the minds of the operators. One approach to this problem is to disguise the control material as a patient's specimen; again, this may be quite difficult to do effectively. In another approach, several different pools are tested in a random sequence (see ref. 1).

Sample Size and Confidence Intervals

It is usually desired to have a measured standard deviation reflect the long-term precision with which patients' specimens are analyzed. Accordingly, the standard deviation should be calculated from data obtained during several days by several different operators. If the standard deviation is calculated from data obtained in a single run or with a single operator, the value will usually be lower than the long-term standard deviation; higher values are possible in theory.

The sample size plays an important role, which is often given little attention, in determining the reliability of a calculated standard deviation. It must be remembered that whenever a standard deviation, $s$, is calculated from a finite number of test results, $s$ is merely an estimate of the true standard deviation, $\sigma$, which corresponds to the population of an infinite number of test results. As is true in all such estimates, $s$ will be more likely to be close to $\sigma$ as $n$, the sample size, is increased. Moreover, for a given $n$ it is possible to determine the accuracy of $s$ at a specified confidence level. Obviously, it is important to have some idea of the accuracy of estimated values of $\sigma$ before making decisions based on these values. The mathematical formulation of the problem is straightforward and may be explained as follows. We wish to know the magnitude of error in our estimate, $s$, at a specified confidence level. That is, we wish to know the value of $u$ that satisfies the inequality

$$ (1 - u)\sigma < s < (1 + u)\sigma $$

at a specified confidence level. The solution to this has been given by Greenwood and Sandomire (2), who pointed out that the above is equivalent to

$$ \nu(1 - u)^2 < \frac{\nu s^2}{\sigma^2} < \nu(1 + u)^2 \qquad (2) $$

by simple algebraic manipulation. The quantity in the center of this inequality is the statistical parameter known as $\chi^2_\nu$, and extensive tables are available (3) that list the percentiles of the $\chi^2$ distribution for various degrees of freedom, $\nu$. It is assumed that the sample is from a population for which the values have a gaussian distribution.

Once a confidence level, $P$, is selected, we may write expressions equivalent to so-called two-tailed confidence intervals as follows:

$$ \chi^2_{\nu,(1-P)/2} < \chi^2_\nu < \chi^2_{\nu,(1+P)/2} \qquad (3) $$

at the confidence level $P$, by definition. Although other intervals could be chosen, the one used here, which cuts off equal areas at the ends of the distribution curve, is very close to the best choice for $\nu$ > about 20 (4). Comparison of equations 2 and 3 shows that

$$ \nu(1 + u)^2 = \chi^2_{\nu,(1+P)/2} \qquad \text{and} \qquad \nu(1 - u)^2 = \chi^2_{\nu,(1-P)/2} $$

Either equation may now be solved for $u$, since almost the same value will be obtained. This is true because, even though the $\chi^2$ distribution is asymmetrical, the distribution of $\sqrt{\chi^2}$ is only slightly skewed. The difference in the value of $u$ calculated from the two equations is not significant. Thus

$$ u = \sqrt{\chi^2_{\nu,(1+P)/2} / \nu} - 1 \qquad (4) $$

Example. Assume that $s$ has been calculated from 31 determinations, and we wish to determine the error associated with this estimation at the 95% confidence level. For $P$ = 0.95 and $\nu$ = 30, $\chi^2_{\nu,(1+P)/2}$ = 47.0. Solving equation 4 gives $u$ = 0.25. Thus, at the 95% confidence level, the error in $s$ is less than 25%. Expressed another way, 0.75$\sigma$ < $s$ < 1.25$\sigma$.

Table 1 gives a tabulation of the percent error associated with an estimated standard deviation, for various degrees of freedom and at different confidence levels. Graphs of these functions are also available (2, 5), which facilitate interpolation.

Table 1. Percent Error (u × 100) Associated with an Estimated Standard Deviation

           Confidence coefficient (P)
  ν        .90      .95      .99
  10       38%      45%      60%
  20       26%      31%      41%
  30       21%      25%      33%
  40       18%      22%      28%
  60       15%      18%      23%
  80       13%      15%      20%
  100      11%      14%      18%
  200       8%      10%      13%
  400       6%       7%       9%

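Equation 4 can also be evaluated numerically instead of from printed tables. The sketch below is not the paper's procedure: it replaces the tabulated percentile with the Wilson-Hilferty approximation to the chi-square quantile so that only the Python standard library is needed. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_quantile_approx(p, nu):
    """Approximate p-th quantile of chi-square with nu degrees of freedom.

    Uses the Wilson-Hilferty approximation (an assumption of this sketch,
    standing in for the printed tables cited in the text).
    """
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * nu)
    return nu * (1.0 - c + z * math.sqrt(c)) ** 3

def fractional_error(nu, P):
    """u from equation 4: sqrt(chi2 quantile at (1 + P)/2, divided by nu) - 1."""
    return math.sqrt(chi2_quantile_approx((1.0 + P) / 2.0, nu) / nu) - 1.0

# Worked example from the text: 31 determinations, 95% confidence level.
print(round(fractional_error(30, 0.95), 2))  # 0.25
```

For small numbers of degrees of freedom, values computed this way can differ by a few percentage points from the entries in Table 1, so the table or the cited graphs remain the reference.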
Segregation of Outliers

When interpreting standard deviations, one customarily makes the assumption that all results have come from a population with a gaussian distribution. Even if only a relative comparison of two standard deviations is desired, this can only be obtained if the two populations in question have distributions of similar shape. Although it is often stated that random errors associated with a quantitative measurement are usually distributed in an approximately gaussian fashion, the distribution of raw data from routine analyses, either of a pooled serum sample or of a particular patient's specimen, usually does not conform to any well-defined distribution; in fact its shape is usually not predictable at all. It follows that applying statistical analysis, such as the calculation of means and standard deviations, to raw data may yield results that are easily misinterpreted.

Table 2. Criteria for Outlier Identification for Various Sample Sizes, with Use of the Definition $\bar{x} - ms < x_0 < \bar{x} + ms$ and a 95% Confidence Level

  n        m
  10       2.80
  20       3.02
  30       3.14
  40       3.22
  60       3.33
  80       3.41
  100      3.47
  120      3.52
  150      3.58
  200      3.66
  300      3.76
  400      3.83

Fig. 1. Distribution of serum sodium values obtained from our internal quality-control program (horizontal axis: sodium concentration, mmol/L). See text for detailed explanation

Figure 1 shows the distribution of serum sodium values obtained from the internal quality-control program in the clinical chemistry laboratory at Hartford Hospital. A serum pool was analyzed once each day during routine processing of patients' specimens, and the actual distribution obtained over a four-month period is shown by the solid line in Figure 1 and the three results corresponding to the solid black rectangles. The standard deviation calculated from all data points is 2.30 and the calculated mean is 149.5 mmol/liter. The gaussian distribution defined by this mean and standard deviation is shown by the dashed line in Figure 1. It is apparent that this is a gross misrepresentation of the actual distribution of values. In this situation the calculated standard deviation conveys no meaningful information about the precision of the method and in fact is quite misleading.

The problem, of course, is that while most of the results are clustered about the mean, the results represented by the three black rectangles lie far away from the mean and are heavily biasing the standard deviation. The problem can be resolved by recognizing that these outlying values usually result from careless mistakes such as picking up the wrong pipet, accidentally interchanging specimen tubes, or transposing two digits in a result transcription, e.g., 193 for 139. As such, they belong to a different population distribution than the set of results clustered more tightly about the mean, which represents the inherent precision of the measurement technique itself.

It must be realized that the frequency of occurrence of the type of error that leads to outlying results is itself an important measure of the overall quality of clinical chemistry service. A meaningful measure of the precision of any quantitative method must include both the inherent precision of the measurement technique and the outlier frequency.

All that is necessary to obtain an estimate of both these quantities is to adopt a criterion for identification of outlying results in order to segregate them from the rest of the data. Many methods have been used for this purpose; all are somewhat arbitrary. However, useful results will be obtained if one method is adopted and used consistently. The criterion used in our laboratory is a modification of one given by Natrella (6), which assumes that estimates of the mean and standard deviation are available.

An outlier may be defined as a result, $x_0$, which lies further than some multiple, $m$, of standard deviations from the mean; that is, $x_0 < \bar{x} - ms$ or $x_0 > \bar{x} + ms$. The value of $m$ to be used depends on the number of results in the sample. If a 5% risk of incorrectly identifying a result as an outlier is accepted, $m$ may be calculated for any given number of results, $n$. Table 2 lists several such values calculated by the method given by Natrella.

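The text does not spell out how m is calculated. One formula that appears to reproduce the Table 2 entries is sketched below: the 5% overall risk is divided among n results, and m is taken as the two-sided gaussian quantile for the resulting per-result risk. That this matches Natrella's procedure exactly is an assumption of the sketch, and the function name `outlier_multiple` is hypothetical.

```python
from statistics import NormalDist

def outlier_multiple(n, risk=0.05):
    """Multiple m of the standard deviation for n results.

    Assumed rule (not stated in the text): the chance that at least one of n
    gaussian results falls outside mean +/- m*s equals `risk`.
    """
    per_result = 1.0 - (1.0 - risk) ** (1.0 / n)
    return NormalDist().inv_cdf(1.0 - per_result / 2.0)

for n in (10, 100, 400):
    print(n, round(outlier_multiple(n), 2))
```

Under this assumption the criterion widens as n grows, which keeps the chance of a false outlier in the whole data set roughly constant.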
To apply this technique to quality-control data in our laboratory, the following steps are followed:
1. The mean and standard deviation are calculated including all results.
2. Results more than m standard deviations from the mean are segregated.
3. A new mean and standard deviation are calculated from the remaining results.
4. Results more than m times the new standard deviation away from the new mean are segregated.
5. The process is repeated until no more outliers are found.
6. The number of outliers is divided by the total number of results to give the outlier frequency.

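The steps above can be sketched as a short loop. This is an illustration under stated assumptions rather than the laboratory's actual program: m is recomputed from the number of remaining results on each pass, using an assumed gaussian-quantile rule for the 5% risk, and the names `outlier_multiple` and `segregate_outliers` and the sample data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def outlier_multiple(n, risk=0.05):
    """Assumed rule for m: two-sided gaussian quantile for a per-result risk."""
    per_result = 1.0 - (1.0 - risk) ** (1.0 / n)
    return NormalDist().inv_cdf(1.0 - per_result / 2.0)

def segregate_outliers(results, risk=0.05):
    """Repeat the mean/SD calculation, setting aside outliers until none remain."""
    kept = list(results)
    outliers = []
    while len(kept) > 2:
        m = outlier_multiple(len(kept), risk)
        center, s = mean(kept), stdev(kept)
        flagged = [x for x in kept if abs(x - center) > m * s]
        if not flagged:
            break  # step 5: no more outliers found
        outliers.extend(flagged)
        kept = [x for x in kept if abs(x - center) <= m * s]
    frequency = len(outliers) / len(results)  # step 6
    return kept, outliers, frequency

# Made-up sodium-like data with one transposed result (193) and one gross error.
data = [149.0] * 13 + [150.0] * 14 + [151.0] * 13 + [193.0, 120.0]
kept, outliers, frequency = segregate_outliers(data)
print(outliers, round(stdev(kept), 2), round(100 * frequency, 1))
```

The function returns the retained results, the segregated outliers, and the outlier frequency, so that both quantities the text calls for can be reported together.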
This iterative technique is most conveniently done with the aid of a computer. If one already had at hand reliable estimates of standard deviation (excluding outliers), then the Natrella criterion could be applied without the modification of multiple iterations. However, the iterative technique is useful when no such estimate is available.

For the data in Figure 1, two iterations resulted in the three results corresponding to the solid black rectangles being identified as outliers. The standard deviation calculated without these points is 1.07 and the outlier frequency is 2.5%. The gaussian distribution corresponding to the new standard deviation is shown by the dotted curve, which is seen to fit the observed distribution quite well.

It is now possible to make a meaningful statement about the precision of the serum sodium method, by use of the quality-control data of Figure 1: $s$ is 1.07 at a mean of 150 mmol/liter and with an outlier frequency of 2.5%; $n$ is 121, which implies that $s$ is accurate to within 13% of $\sigma$ at the 95% confidence level.

Discussion

When the above criteria for determination of standard deviations are used in conjunction with an internal quality-control program in the laboratory, meaningful estimations of precision for the various quantitative methods can be easily made. Table 3 presents data on long-term precision of 14 common chemical determinations. The data were gathered from the internal quality-control program in our laboratory (1) over a three-year period and are for a blind serum pool. All values in the table are averages of nine separate determinations, each of which was calculated from 80 to 120 individual test results after segregation of outliers. The data for each test are thus derived from roughly 1000 measurements made during the three-year period.

Table 3. Summary of Method-Precision Statistics from Three Years of Quality-Control Data

  Test (units)                         Mean(a)   SD       CV, %   n     Outlier frequency, %
  Glucose (mg/l)                       1140      29       2.5     120   2.1
  Urea (mg/l)                          260       8.9      3.4     120   1.4
  Creatinine (mg/l)                    19        0.87     4.6     120   0.6
  Sodium (mmol/l)                      146       1.4      0.95    120   2.8
  Potassium (mmol/l)                   5.4       0.079    1.5     120   2.2
  Chloride (mmol/l)                    103       2.1      2.0     120   1.5
  Osmolality (mOsm/l)                  318       3.9      1.2     120   2.6
  Calcium (mmol/l)                     2.16      0.040    1.9     100   2.0
  Alkaline phosphatase (U/l)           107       4.9      4.6     120   0.2
  Aspartate aminotransferase (U/l)     21        1.9      9.2     120   0.2
  Alanine aminotransferase (U/l)       10        1.7      17      120   0.1
  Lactate dehydrogenase (U/l)          204       12       6.1     120   0.3
  Cholesterol (mg/l)                   1860      94       5.1     80    0.3
  Triglycerides (mg/l)                 1160      96       8.3     80    0.3

  (a) All numbers shown are mean values of data from nine separate four-month periods. Data from Clinical Chemistry Laboratory, Hartford Hospital.

Fig. 2. Correlation between outlier frequency (%) and relative standard deviation (%). Data taken from quality-control program over a three-year period

One final observation of interest is illustrated by Figure 2, which shows a high degree of correlation between outlier frequency and the coefficient of variation (relative standard deviation) of the various tests. The data are taken from Table 3. The figure indicates that the tests with the lowest relative standard deviation (corrected for outliers by using the criterion described above) show the highest outlier frequency. While those tests with a relative standard deviation >5% show a very low and relatively constant outlier frequency of 0.1-0.3%, the most precise tests in the laboratory, with relative standard deviations around 1%, show outlier frequencies of 2.5-3.0%. The origin of this effect and possible means of reducing outlier frequency, particularly for the relatively precise tests, deserve further study. These and other studies will require accurate estimates of method precision, which in turn requires that $s$, $\bar{x}$, $n$, and outlier frequency all be measured and reported, for the interpretation of the data to be most meaningful.

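The relation described for Figure 2 can be checked roughly from the Table 3 columns. In the sketch below the (CV, outlier frequency) pairs are as read from Table 3 in this transcript, where the pairing of values with tests is a reconstruction from the extracted text. The Pearson coefficient is an assumption of this sketch; the text does not say which measure of correlation was used.

```python
import math

# (CV %, outlier frequency %) pairs as read from Table 3, in table order.
TABLE_3 = [
    (2.5, 2.1), (3.4, 1.4), (4.6, 0.6), (0.95, 2.8), (1.5, 2.2),
    (2.0, 1.5), (1.2, 2.6), (1.9, 2.0), (4.6, 0.2), (9.2, 0.2),
    (17.0, 0.1), (6.1, 0.3), (5.1, 0.3), (8.3, 0.3),
]

def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(TABLE_3)
print(f"r = {r:.2f}")
```

A negative coefficient is consistent with the statement that the most precise tests show the highest outlier frequency.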
References
1. Bowers, G. N., Jr., Burnett, R. W., and McComb, R. B., Preparation and use of human serum control materials for monitoring precision in clinical chemistry. Clin. Chem. 21, 1830 (1975).
2. Greenwood, J. A., and Sandomire, M. M., Sample size required for estimating the standard deviation as a per cent of its true value. J. Am. Stat. Assoc. 45, 257 (1950).
3. Thompson, C. M., Table of percentage points of the $\chi^2$ distribution. Biometrika 32, 188 (1941). (Reproduced in most standard statistics textbooks).
4. Bennett, C. A., and Franklin, N. L., Statistical Analysis in Chemistry and the Chemical Industry, John Wiley and Sons, Inc., New York, N. Y., 1954, p 173.
5. Natrella, M. G., Experimental Statistics, National Bureau of Standards Handbook 91, U. S. Government Printing Office, 1963, pp 2-12.
6. Natrella, M. G., ibid., pp 17-4.