Quantitative Exposure Data:
Interpretation, Decision Making,
and Statistical Tools
Purpose of Exposure Assessment
• To decide two things:
– Is the SEG’s exposure profile (exposure and its variability) adequately
characterized?
– Is the exposure profile acceptable?
• A baseline exposure assessment (or comprehensive exposure
assessment) requires characterization of the SEG’s exposure
profile.
– An exposure profile is a summary “picture” of the exposure
experienced by an SEG.
• A compliance-based program will focus efforts on exposures near
OELs.
Exposure Acceptability Judgments
• A variety of tools and factors are related to the
judgment of exposure acceptability.
– process experience
– material characteristics
– toxicity knowledge
– work force characteristics
– frequency of task
– frequency of peak excursions
– monitoring results
– statistical tools
– confidence in exposure limit
– modeling techniques
– biological monitoring
– availability and adequacy of engineering controls
Statistical Considerations
• Statistical tools are powerful only if their theoretical bases and
limitations are understood by the person using them.
• Statistical issues must be considered early in the assessment
process. They should be included in the development of the
exposure assessment strategy and when determining a monitoring
strategy.
– Difficulties:
• obtaining a truly random sample
• collecting sufficient data
• In spite of their limitations, statistical tools are useful because they
help form a picture of the exposure profile. If their limitations are
understood, they will greatly enhance knowledge of the exposure
profile.
Sample Size Estimation
Approximate Sample Size Requirements to be 95% Confident that
the True Mean Exposure Is Less Than the Long-term Occupational
Exposure Limit (Power = 90%)
Sample Size (n)

Ratio:            Low variability   Moderate variability          High variability
true mean/OEL     (GSD = 1.5)       (GSD = 2.0)    (GSD = 2.5)    (GSD = 3.0)    (GSD = 3.5)
0.75              25                82             164            266            384
0.50              7                 21             41             67             96
0.25              3                 10             19             30             43
0.10              2                 6              13             21             30
Exposure Distribution and Parametric or
Nonparametric Statistical Tools
• A population distribution is a description of the relative
frequencies of the elements of that population.
• Parametric statistics
– The most powerful statistical tools require knowledge or assumptions
about the population’s distribution.
• Nonparametric statistics
– When the underlying distribution of exposure is not known,
nonparametric statistics should be used.
– These statistical tools tend to focus on robust measures such as the
distribution median or other percentile because they are less sensitive
to outliers and spurious data.
– lower statistical power, so more measurements are needed
Common Distributions in Industrial Hygiene
• The random sampling and analytical errors associated with an air
monitoring result are usually presumed to be normally distributed.
• The random fluctuations in exposure from shift to shift or within
shifts tend to be lognormally distributed.
• Exposure fluctuations account for the vast majority of an exposure
profile’s variability (usually more than 85%).
• If we have resources to commit to exposure monitoring, usually the
most efficient approach would call for putting resources into more
measurements rather than into more precise sampling methods.
Distribution Verification
• A log-probability plot is the simplest and most straightforward way
to check data for lognormality.
• The Shapiro and Wilk Test (W-test) is the most rigorous test for
lognormality.
• If the data form a straight line on the lognormality plot, it signifies
that the data follow a lognormal distribution. Then, the line can
be used to estimate the distribution’s geometric mean and
geometric standard deviation.
[Figure: log-probability plot with cumulative probability (0.005 to 0.995) on the probability axis and concentration (mg/m³) on the log axis; the monitoring results are plotted as points.]
Making a Probability Plot
• Procedures
– Rank order the data, lowest to highest.
– Rank each value from 1 (lowest) to n (highest).
– Calculate the plotting position for each value.
• Plotting position = rank/(n+1)
– Plot the concentrations against the plotting positions.
– Draw a best-fit line through the plotted data.
– Determine whether the data provide a reasonable fit for the straight
line.
– Estimate the distribution GM, GSD and percentiles of interest from
the best-fit line.
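As an illustration of this procedure, a minimal sketch in Python (not from the slides; the concentration values are hypothetical) that computes plotting positions, fits a line to the log-transformed data, and reads the GM and GSD off the fit:

```python
# Minimal log-probability plot sketch: rank the data, compute plotting
# positions, regress ln(concentration) on normal quantiles, and estimate
# the GM and GSD from the fitted line. Data values are hypothetical.
import numpy as np
from scipy import stats

concentrations = np.array([12.0, 18.5, 25.0, 31.0, 47.0, 60.0, 95.0])  # hypothetical, mg/m3

x = np.sort(concentrations)                  # rank order, lowest to highest
n = len(x)
ranks = np.arange(1, n + 1)
plot_pos = ranks / (n + 1)                   # plotting position = rank / (n + 1)
z = stats.norm.ppf(plot_pos)                 # standard normal quantiles

# Least-squares line through (z, ln x); a roughly straight line supports lognormality.
slope, intercept, r, _, _ = stats.linregress(z, np.log(x))

gm = np.exp(intercept)                       # GM = exp(intercept at z = 0)
gsd = np.exp(slope)                          # GSD = exp(slope)
print(f"r = {r:.3f}, GM = {gm:.1f} mg/m3, GSD = {gsd:.2f}")
```

A high correlation (points close to the line) supports the lognormal assumption; the GM and GSD then summarize the fitted distribution.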
W-test for Distribution Goodness-of-Fit
• The W-test is one of the most powerful tests for determining goodness-of-fit for normal or lognormal data when n is fairly small (n ≤ 50).
• The W-test is performed as follows:
– Order the data, smallest to largest.
– Calculate k: k =n/2 if n is even; k = (n-1)/2 if n is odd.
– Calculate the W statistic:
$$W = \frac{\left[\sum_{i=1}^{k} a_i\,\bigl(x_{(n-i+1)} - x_{(i)}\bigr)\right]^2}{S^2\,(n-1)}$$
where the $x_{(i)}$ are the ordered data, the $a_i$ are tabulated coefficients, and $S^2$ is the sample variance, so that $S^2(n-1) = \sum_i (x_i - \bar{x})^2$.
– The data are consistent with a normal (or lognormal, if applied to the log-transformed data) population if W is greater than the tabulated critical value.
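For reference, a minimal sketch (an assumption about tooling, not the authors' workflow) using scipy's implementation of the Shapiro-Wilk test on log-transformed data; scipy reports a p-value rather than requiring a table lookup of critical W values:

```python
# Shapiro-Wilk W-test applied to log-transformed exposures; a p-value above
# the chosen significance level (e.g., 0.05) is consistent with lognormality.
import numpy as np
from scipy import stats

concentrations = np.array([12.0, 18.5, 25.0, 31.0, 47.0, 60.0, 95.0])  # hypothetical, mg/m3

w_stat, p_value = stats.shapiro(np.log(concentrations))
print(f"W = {w_stat:.3f}, p = {p_value:.3f}")
if p_value > 0.05:
    print("No evidence against lognormality; lognormal tools may be used.")
```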
Sampling Randomly from
Stationary Populations (1)
• Random sampling
– Each element in the population must have equal likelihood of being
observed.
– Practical considerations of travel constraints, weather, process
operation parameters, budgetary limits, and the need to characterize
multiple exposure profiles make statistically randomized sampling
extremely difficult in the real world.
– To avoid known bias:
• If possible, avoid clustering your monitoring into consecutive periods.
• Monitor during different seasons to avoid biases introduced by factors that
change with weather conditions.
• Understand process cycles and avoid biases they might introduce.
• Include both typical and unusual events.
Sampling Randomly from
Stationary Populations (2)
• Autocorrelation
– Autocorrelation occurs when the contaminant concentration in one
time period is related to the concentration in a previous period.
– Clustering all samples in one period when autocorrelation occurs will
result in an underestimate of variability in the exposure profile and an
imprecise estimate of the mean exposure.
– Autocorrelation can also result in underestimating or overestimating
the true degree of exposure depending on whether a high or low
concentration cycle happened to have been sampled.
Sampling Randomly from
Stationary Populations (3)
• Stationary population
– Definition of Stationary
• A random process is said to be stationary if its distribution is independent
of the time of observation.
– Stationary population
• An underlying population that does not change during the exposure
assessment period. That is, the mean and variance of this population are
stable over time.
– If the population changes significantly over the random sampling
period, only calculations of sample descriptive statistics and decision
making on the basis of professional judgment are recommended.
– One simple procedure that can help subjectively check for population
stability is to plot the monitoring data chronologically by time of
monitoring. If any trends in the data are apparent, that is a sign the
underlying process is not stationary.
Similar Exposure Interval
• A similar exposure interval is defined as a period in which the
distribution of exposures for a SEG would be expected to be
stationary.
• The measurements needed to characterize the exposure profile
would be taken randomly within a similar exposure interval.
Relationship of Averaging Times
• It is inappropriate to average short-term data with full-shift data.
Short-term data tends to be distributed differently than full-shift
data.
• Mixing of data from different averaging times makes estimates of
variance inaccurate and precludes use of most common statistical
tools.
• Techniques are being developed to predict long-term exposure
profiles based on a time-weighted combination of exposure profiles
for the several short-term tasks. These techniques hold great
promise for providing more detailed characterizations of
exposures and for optimizing sampling efficiency using stratified
random sampling of critical tasks.
Nondetectable Data
• Monitoring results below the analytical limit of detection should not
be discarded.
• Several techniques are available for including below detection
limit data in statistical analysis.
• A factor of 0.7 times the detection limit may be most appropriate
for data with relatively low variability (GSD < 3).
• A factor of 0.5 times the detection limit may be best when the
variability is high (GSD > 3). If more than 50% of data are below
the detection limit then special techniques may be required.
• Probability plotting is another way to include data below the detection
limit in the statistical analysis. These plots allow extrapolation from the
data above the detection limit to account for the data below it, yielding a
reasonable estimate of the average and variability.
Statistical Techniques
• There is no ideal statistical technique for evaluating industrial
hygiene monitoring data.
• All measurements to be analyzed statistically should be valid in
that:
– They were collected and analyzed using a reasonably accurate and
reasonably unbiased sampling and analytical method.
– They adequately represent personal exposure.
• Descriptive statistics
– arithmetic mean, standard deviation, median, range, maximum,
minimum, and fraction of samples over the OEL.
• Inferential statistics
– quantitative estimate of exposure profile
– arithmetic mean and upper tail
– If a decision must be made with only a few measurements (for example, 10),
confidence is highest for the estimate of the mean, lower for the estimate of
the variance, and lowest for estimates of lower or upper percentiles.
Focus on the Arithmetic Mean
• For chronic-acting substances, the long-term average exposure
(exposure averaged over weeks or months) is a relevant index of
dose and, therefore, a useful parameter on which to focus when
evaluating the health risk posed by such an exposure.
• For such agents, the body effectively damps the swings in exposure, so the long-term average drives the dose.
• Statistically defined OEL
– definition
• It is an acceptable exposure profile defined by the OEL’s sponsoring
organization.
• It should be clearly stated whether the OEL is interpreted as:
– a long-term average (i.e., the arithmetic mean of the distribution of daily average exposures);
– a limit with a permissible exceedance of day-to-day exposures (e.g., 5%); or
– a never-to-be-exceeded maximum daily average (i.e., 100% of the daily average exposures must be less than the OEL).
Arithmetic Mean of a Lognormal Distribution
• The arithmetic mean, not the geometric mean, of a lognormal
exposure distribution is the best descriptor of average exposure.
• The difference between arithmetic mean and geometric mean of a
lognormal distribution increases when variance in the distribution
increases.
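For reference, the standard relationship between the two means of a lognormal distribution (not stated explicitly in the slides) makes this dependence on variability explicit:

$$AM = GM \cdot \exp\!\left[\tfrac{1}{2}\,(\ln GSD)^2\right]$$

so as the GSD grows, the arithmetic mean moves further above the geometric mean.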
Estimating the Arithmetic Mean of a
Lognormal Distribution
• The recommended method for all sample sizes and GSDs is the
minimum variance unbiased estimate (MVUE).
– Unbiased and minimum variance
• The maximum likelihood estimate (MLE) is easy to calculate and
is less variable than the simple mean for large data sets (N > 50)
and high GSDs.
Confidence Limits Around the Arithmetic
Mean of a Lognormal Distribution
• Confidence limits allow one to gauge the uncertainty in the
parameter estimate. The wider the confidence limits, the less
certain the point estimate.
• Land‘s “exact” procedure is suggested for calculating
confidence limits for arithmetic mean estimates.
Focus on the Upper Tail
• For agents causing acute effects, the average exposure is not as
important as understanding how high the exposure may get, because
those few high exposures might pose a more important risk to
health than average exposures at lower levels.
• An examination of the exposure profile's upper tail will allow an
estimate of the relative frequency with which the OEL may be
exceeded.
Estimating Upper Percentiles
• Upper percentiles of the exposure profile (for example, the 95th
percentile) can be estimated from the best-fit line on a log-probability
plot or calculated from the fitted geometric mean and geometric
standard deviation.
• Comparing an upper percentile estimate with the OEL indicates the
relative frequency with which the OEL may be exceeded.
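A minimal sketch (hypothetical GM, GSD, and OEL) of estimating the 95th percentile and the exceedance fraction from a fitted lognormal distribution:

```python
# 95th percentile of a lognormal exposure profile and the fraction of shifts
# expected to exceed an OEL, given a fitted GM and GSD.
import numpy as np
from scipy import stats

gm, gsd = 20.0, 2.2      # fitted geometric mean (mg/m3) and geometric SD (hypothetical)
oel = 100.0              # hypothetical OEL, mg/m3

mu, sigma = np.log(gm), np.log(gsd)
x95 = np.exp(mu + 1.645 * sigma)                        # 95th percentile = GM * GSD**1.645
exceed_frac = 1.0 - stats.norm.cdf((np.log(oel) - mu) / sigma)

print(f"95th percentile = {x95:.1f} mg/m3")
print(f"Estimated exceedance fraction = {exceed_frac:.1%}")
```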
Tolerance Limits
• Tolerance limits are used to statistically demonstrate, with some stated
confidence, that no more than a given percentage of exposures are greater
than a standard.
– An industrial hygienist can have 95% confidence that no more than
5% of the exposures exceed the standard.
– In effect, this is an upper one-sided 95% confidence limit on the estimate of
the 95th percentile.
• Advantages:
– Tolerance limits are helpful for defining the upper end of an exposure
profile.
– The tolerance limits approach may be appropriate for compliance testing.
• Disadvantages:
– The approach is very sensitive to sample size and the distribution's standard
deviation.
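A minimal sketch of a one-sided upper tolerance limit for lognormal data. The slides assume the tolerance factor K is read from published tables; this sketch instead computes it with the standard noncentral-t construction (an assumption not stated in the slides, with hypothetical data):

```python
# 95% coverage / 95% confidence upper tolerance limit for lognormal exposures:
# UTL = exp(mean_y + K * s_y) on the log scale.
import numpy as np
from scipy import stats

concentrations = np.array([12.0, 18.5, 25.0, 31.0, 47.0, 60.0, 95.0])  # hypothetical, mg/m3
y = np.log(concentrations)
n = len(y)

coverage, confidence = 0.95, 0.95
# One-sided tolerance factor via the noncentral t distribution
# (the slides' approach would read K from a table instead).
delta = stats.norm.ppf(coverage) * np.sqrt(n)
k_factor = stats.nct.ppf(confidence, df=n - 1, nc=delta) / np.sqrt(n)

utl = np.exp(y.mean() + k_factor * y.std(ddof=1))
print(f"K = {k_factor:.2f}, 95%/95% upper tolerance limit = {utl:.1f} mg/m3")
```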
How to Choose --- The Mean or the Upper Tail
• In determining compliance with most regulatory and authoritative
OELs that exist today, a focus on the upper tail would be most
appropriate.
• In 1978, OSHA expressed in the preamble to its lead PEL:
– OSHA recognizes that there will be day-to-day variability in airborne
lead exposure experienced by a single employee. The permissible
exposure limit is a maximum allowable value which is not to be
exceeded; hence exposure must be controlled to an average value
well below the permissible exposure limit in order to remain in
compliance.
Analysis of Variance to Refine Critical SEGs
• Analysis of variance (ANOVA) is a statistical technique that can be
used to compare the variability of individual workers‘ exposures
with the exposure variability of the overall SEG.
– ANOVA is used to examine the exposure variability for each
monitored individual (within worker variability) and compare it with
the worker-to-worker variability in the SEG (between-worker
variability).
• This approach can be used to check the homogeneity of the critical
SEGs for which risk of individual misclassification is most severe
and to reassign individuals as necessary.
Examining the Arithmetic Mean: Mean
Estimates and Confidence Intervals
Arithmetic Mean
• Understanding the mean of the exposure
profile may be important when judging
exposure
– Several short term measurements are used
to characterize a daily average.
– Several day-long TWA measurements are
being used to estimate the long-term
average of a day-to-day exposure profile.
Arithmetic Mean
• The best predictor of dose is the exposure
distribution’s arithmetic mean, not the geometric
mean. The general technique is to:
1. Estimate the exposure distribution’s arithmetic mean.
2. Characterize the uncertainty in the arithmetic mean’s
point estimate by calculating confidence limits for the
true mean.
3. Examine the arithmetic mean’s point estimate and true
mean confidence limit(s) in light of an LTA-OEL or
other information to make a judgment on the exposure
profile.
Confidence Intervals
• Upper confidence limit (UCL):
– To be conservatively protective of worker health, the UCL for the
arithmetic mean estimate is emphasized
• UCL1,95% (the arithmetic mean's one-sided 95% UCL) < LTA-OEL
– the industrial hygienist would be at least 95% sure that
the exposure profile's true mean was below the LTA-OEL
• Place all of the statistical power into characterizing
the single boundary most important to the judgment
95% Upper Confidence Interval for
the Arithmetic Mean
[Figure: the arithmetic mean point estimate with its 95% upper confidence interval; one can be 95% certain that the exposure profile's true mean exposure is less than this value.]
Probability Plotting and
Goodness-of-Fit
• Parametric methods:
– rely on the assumption about the shape of the
underlying population distribution
• Most exposure distributions are right-skewed
and can be reasonably approximated by the
lognormal distribution
– If the probability plotting and goodness-of-fit
techniques verify a lognormal distribution, the tools
for lognormal distributions should be used
Probability Plotting and
Goodness-of-Fit (Cont.)
• If the data do not seem to fit a lognormal
distribution, but they do seem to fit a normal
distribution, the tools for normally distributed
data should be used
• If the data do not seem to fit either the normal or the lognormal
distribution, consider:
– whether the SEG has been properly defined
– whether there has been some systematic change to the underlying
exposure distribution
– relying on descriptive and nonparametric statistics
Characterizing the
Arithmetic Mean of a Lognormal Distribution
• Easy to calculate but less accurate
– Sample mean and t-distribution confidence limits: more variable for large sample sizes
– Maximum likelihood estimate and confidence limits: underestimate variability (limits too narrow)
• Accurate but more difficult to calculate
– Minimum variance unbiased estimate (MVUE): point estimate only
– Land's "Exact" confidence limit: confidence limits only
Which To Use: Point Estimate of the
True Mean of the Lognormal Distribution
• If a computer or programmable calculator is available, the MVUE
should be used as the preferred point estimate of the true mean of
the lognormal distribution.
• If not:
– Sample mean: when the GSD is small (<2) or there are few samples (<15-20)
– MLE: when the sample size is large (>15-20)
Which To Use Confidence Limits for the
True Mean of the Lognormal Distribution
• Land’s method:
– exact confidence limits for the true mean
– if a computer is available
• MLE method:
– if a computer is not available
– underestimate the true upper confidence limit
• Easy-to calculate sample mean and tdistribution confidence interval
– Many monitoring results available (>30)
Specific Techniques
Sample Mean and t-Distribution
Confidence Limit
• Sample mean as a point estimate for the exposure
distribution arithmetic mean
– no computer or programmable calculator available
– few samples (<15-20) and a small GSD (<2)
• Simple t-distribution C.I. procedure
– Developed for use with normal distributions
– Also works well for many non-normal distributions
(including the lognormal distribution) when sample sizes
are large (n>30 , GSD<1.5)
• Sample mean and t-distribution method:
– exposure distribution is better characterized by a
normal distribution than a lognormal distribution
Calculation of the Sample Mean and
Confidence Limit
• Step 1:
Calculate the sample mean ( x ) and sample standard
deviation (s)
• Step 2:
Calculate the confidence limits:
$$CL = \bar{x} \pm t\,\frac{s}{\sqrt{n}}$$
$$UCL_{1,95\%} = \bar{x} + t_{0.95}\,\frac{s}{\sqrt{n}} \qquad LCL_{1,95\%} = \bar{x} - t_{0.95}\,\frac{s}{\sqrt{n}}$$
• Step 3:
Compare the UCL to the LTA-OEL
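A minimal sketch of the procedure above (hypothetical data and LTA-OEL):

```python
# Sample mean and one-sided 95% UCL from the t distribution, compared with an LTA-OEL.
import numpy as np
from scipy import stats

x = np.array([12.0, 18.5, 25.0, 31.0, 47.0, 60.0, 95.0])  # hypothetical, mg/m3
lta_oel = 100.0                                            # hypothetical LTA-OEL

n = len(x)
mean, s = x.mean(), x.std(ddof=1)
t95 = stats.t.ppf(0.95, df=n - 1)
ucl = mean + t95 * s / np.sqrt(n)

print(f"mean = {mean:.1f}, UCL(1,95%) = {ucl:.1f}")
print("Acceptable (UCL < LTA-OEL)" if ucl < lta_oel else "Cannot conclude acceptable")
```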
Maximum Likelihood Estimate and
Confidence Limits for the Arithmetic Mean of
a Lognormal Distribution
• MLE: better point estimate than the sample
mean
– More than 15-20 samples or a high GSD
– Easy to calculate
– Underestimate variability in many cases
– The computed UCL should be interpreted cautiously because it will
often be lower than the exact UCL
Maximum Likelihood Estimate and
Confidence Limits
• Step 1:
Calculate the mean (ȳ) and standard deviation (s_y) of the log-transformed data, where y = ln(x)
• Step 2:
Calculate the MLE:
$$MLE = \exp\!\left[\bar{y} + \frac{1}{2}\,\frac{n-1}{n}\,s_y^2\right]$$
• Step 3:
Calculate the UCL and/or LCL for the MLE:
$$CL = \exp\!\left[\ln(MLE) \pm t\,\frac{s_y\sqrt{n-1}}{n}\right]$$
• Step 4:
Compare the UCL to the LTA-OEL
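A minimal sketch of the MLE point estimate and the approximate confidence limit reconstructed above (hypothetical data; interpret the UCL cautiously, as noted):

```python
# MLE of the lognormal mean and its approximate one-sided 95% UCL,
# using the MLE variance s_y^2 * (n-1)/n on the log scale.
import numpy as np
from scipy import stats

x = np.array([12.0, 18.5, 25.0, 31.0, 47.0, 60.0, 95.0])  # hypothetical, mg/m3
y = np.log(x)
n = len(y)
y_bar, s_y = y.mean(), y.std(ddof=1)

mle = np.exp(y_bar + 0.5 * ((n - 1) / n) * s_y ** 2)
t95 = stats.t.ppf(0.95, df=n - 1)
ucl = np.exp(np.log(mle) + t95 * s_y * np.sqrt(n - 1) / n)

print(f"MLE = {mle:.1f} mg/m3, approximate UCL(1,95%) = {ucl:.1f} mg/m3")
```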
Minimum Variance Unbiased Estimate of the
Arithmetic Mean of a Lognormal Distribution
• MVUE: the preferred point estimate, used routinely
unless no computer available
• Calculated iteratively
• Calculation using five terms will give results correct
to three significant figures for sample sizes from 5 to
500 and GSDs from 2 to 5
Minimum Variance Unbiased Estimate
Procedures
• Step 1:
Calculate the mean (ȳ) and standard deviation (s_y) of the log-transformed data, where y = ln(x)
• Step 2:
Calculate the MVUE:
$$MVUE = \exp(\bar{y})\left[1 + \frac{n-1}{n}\,\lambda + \frac{(n-1)^3}{n^2(n+1)}\,\frac{\lambda^2}{2!} + \frac{(n-1)^5}{n^3(n+1)(n+3)}\,\frac{\lambda^3}{3!} + \ldots\right]$$
where $\lambda = \dfrac{s_y^2}{2}$
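A minimal sketch (not the authors' code) evaluating the series above with five correction terms, consistent with the note that five terms give about three significant figures for n = 5 to 500 and GSD = 2 to 5:

```python
# Minimum variance unbiased estimate (MVUE) of a lognormal distribution's mean,
# evaluated term by term from the series above.
import math
import numpy as np

def mvue_lognormal_mean(x, terms=5):
    """MVUE of the arithmetic mean of a lognormal distribution."""
    y = np.log(np.asarray(x, dtype=float))
    n = len(y)
    y_bar, s_y = y.mean(), y.std(ddof=1)
    lam = s_y ** 2 / 2.0

    series = 1.0
    denom_prod = 1.0                      # running product (n+1)(n+3)...(n+2k-3)
    for k in range(1, terms + 1):
        if k >= 2:
            denom_prod *= n + 2 * k - 3   # adds (n+1) at k=2, (n+3) at k=3, ...
        term = ((n - 1) ** (2 * k - 1)) / (n ** k * denom_prod) * lam ** k / math.factorial(k)
        series += term
    return math.exp(y_bar) * series

x = [12.0, 18.5, 25.0, 31.0, 47.0, 60.0, 95.0]   # hypothetical, mg/m3
print(f"MVUE = {mvue_lognormal_mean(x):.1f} mg/m3")
```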
Land’s “Exact” Estimate of the Arithmetic
Mean Confidence Limits for a Lognormal
Distribution
• Land’s exact method: the most accurate
and least-biased estimate, should be used
whenever possible
• Hewett & Ganser Graphic technique:
– Used for interpolating one of the parameters
needed for the calculation
– Equations to approximate the curves in the
graphs
Land’s “Exact” Estimate
Procedure
• Step 1:
Calculate the mean (ȳ) and standard deviation (s_y) of the log-transformed data, where y = ln(x)
• Step 2:
Obtain the C-factor for Land’s formula (C(Sy, n, 0.05) for 95% LCL and
C(Sy, n, 0.95) for 95% UCL)
• Step 3:
Calculate the 95% UCL (or 95% LCL)
$$CL = \exp\!\left[\ln(\hat{u}) + C\,\frac{s_y}{\sqrt{n-1}}\right]$$
where $\hat{u} = \exp\!\left(\bar{y} + \tfrac{1}{2}\,s_y^2\right)$
• Step 4:
Compare the 95% UCL to the LTA-OEL
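A minimal sketch of Land's formula as laid out above; the C-factor must come from Land's tables or the Hewett & Ganser approximation, so the value used here is only a placeholder assumption:

```python
# Land's confidence limit for the arithmetic mean of a lognormal distribution,
# given a C-factor looked up elsewhere.
import numpy as np

x = np.array([12.0, 18.5, 25.0, 31.0, 47.0, 60.0, 95.0])  # hypothetical, mg/m3
y = np.log(x)
n = len(y)
y_bar, s_y = y.mean(), y.std(ddof=1)

c_factor = 2.5                 # placeholder for C(s_y, n, 0.95); look up the real value
u_hat = np.exp(y_bar + 0.5 * s_y ** 2)
ucl = np.exp(np.log(u_hat) + c_factor * s_y / np.sqrt(n - 1))

print(f"Point estimate (u_hat) = {u_hat:.1f} mg/m3, 95% UCL = {ucl:.1f} mg/m3")
print("Compare the 95% UCL to the LTA-OEL.")
```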