Transcript
University of Illinois at Chicago, School of Public Health
Environmental and Occupational Health Sciences Division
Introduction to Environmental Statistics
Module 2: Sampling and Analytical Limitations & Sample Detection Limits
Slide 1
DR. PETER SCHEFF: Hello and welcome to the second lecture on our series on environmental
statistics. This lecture is titled Sampling Analytical Limits and Detection Limits, and we're going
to focus on detection limits and sampling design, and primarily the concept of environmental
uncertainty: Why do things vary from sample to sample and how do we characterize the
variation? The data sets we'll be referring to in this lecture are available online at the website, and we encourage you to download them and see how we actually did the analysis.
And as we said in the first lecture, please feel free to e-mail us your questions and your data
sets. We're looking for new examples and feedback on these lectures.
Slide 2
The objectives of this lecture will be to characterize uncertainty in both sampling and analytical
statistics. We're going to describe sampling and analytical limitations with many examples.
In the environment there is natural variability. When we go out and grab a sample of something
to try to characterize the level of chromium, for example, each time we grab a sample we get a
different answer. We get a different answer for many reasons. One reason is strictly related to
the analytical method. That is, there's noise in the method. And every time we sample the same
environment, we get a slightly different answer. That's sampling variability, statistical noise in
the method. There's also variability in the environment; that is concentrations actually changing
with time. So when we sample it tomorrow, it's different than what it actually was today. So we
have to learn -- we have to develop ways to differentiate these two sources of variability and
understand them both to truly understand what we're measuring. In this lecture, we'll be talking
about concepts such as noise, precision, bias, random error, systematic error, and all those kinds
of ideas to develop the notion of the detection limit.
Slide 3
Just as a little brief one-slide review of the first lecture in this series, remember, before you
begin to actually look at numbers, hopefully you've done a proper study design. You've defined
what it is you're interested in studying. You've defined what the specific objectives of your study
design are. And you've then specified and followed the sampling and analytical procedures as
planned.
Slide 4
At this point, now that we're at this lecture, we're going to begin to look at accuracy, precision,
noise, variability, and error. So I want to start by this classic picture that defines accuracy,
precision and bias. Many of you have seen this picture which shows a little target, and the
correct answer is the middle of the target. And what you'd like is your sample analyses, which
are the dots on these four charts, to be right in the middle of the target. So the lower left end of
the four images shows a sampling result which has high precision; that is, all of the answers are
the same and very accurate. They're all centered around the center of the target. But they can
vary -- the accuracy and the precision are sort of independent. So, for example, the upper left-hand image shows the same level of precision, but accuracy is fairly poor because the sampling
dots are not in the center of the target. So by definition, accuracy is the ability to measure the
true value. We always are striving for accurate methods but sometimes we have to sacrifice
some accuracy for a variety of reasons.
Precision, in contrast, is the ability to measure repeatedly and obtain the same values within a
close range. Precision is based on the amount of noise in the method relative to the level of the
signal. Error is the difference between the true value and the measured value. And bias is
systematic error, where, for example, what's shown in the upper left-hand corner of these four
targets, that analytical method would be biased; that is, it always produces an answer which is systematically different from the correct one.
Slide 5
So I want to follow this up with a number of definitions that we're going to use throughout this
lecture in defining the detection limit and sampling uncertainty. Two very important concepts are
false positive and false negative.
The false positive concept is when you conclude that a substance is present when it truly is not.
This is sometimes called Type I error. So if, based on the result of a method, we say, yes, we think chromium is actually here in the sample, it may be that it actually isn't there. We may just be looking at an artifact in the method. That's a false positive result. In contrast, a false negative result is concluding the substance is not present when it truly is. This is sometimes called Type II
error. Now, when we make decisions based on measurements, based on a sample from the
environment, there's always the possibility that we're going to draw the incorrect conclusion
from our decision. We may decide that the concentration of a particular chemical is above a
certain level. We may be wrong. We may have come to the conclusion that it's below a certain
level. We may be wrong. We need to quantify, and we need to understand the uncertainty in the
measurement process and uncertainty in the environmental variability to be able to make a
probabilistic statement that gives us some level of confidence that we're drawing the right
conclusion.
So as we move forward, we're going to be saying statements like we have a 95% confidence or
99% confidence that we actually think this chemical is present. So we're going to be able to
specify with some level of certainty and qualify our statements. Now, these kinds of
determinations are most commonly done for risk-based programs. Typically, for a national ambient air quality standard like PM, we're looking at a threshold: Is our number above or below? We're
not particularly concerned about making a probabilistic statement. So you'll see this more
commonly in toxics or risk-based programs where we're making a statement about the
probability of being above or below a certain threshold.
Slide 6
Other important definitions are the instrument detection limit. This is the concentration which is
the smallest that can be distinguished from the background by a particular instrument. This is a
laboratory concept, and it applies only to the part of the process that the instrument is
responsible for, that is, the final analytical determination from a sample extraction to a final
number.
The limit of detection is the lowest concentration that can be determined to be statistically
different from blank. And the limit of detection is linked to a confidence limit. For example, the EPA uses 99%; that is, you're 99% sure that what you're seeing is actually real, or there's a 1%
chance that it's not. The limit of quantification is the level above which quantitative results may
be obtained. The limit of quantification is usually a slightly higher number, and it's a little bit
more stringent, because, in order to be quantitatively certain of our answer, we need to have a
slightly larger signal than when we are qualitatively certain of the presence of a chemical. So the
limit of quantification, usually a little bit higher.
Slide 7
The method detection limit is the minimum concentration of a substance in a given matrix that
can be measured with 99% confidence that the analyte is greater than 0. So the method
detection limit is referenced to the material in the environment and its chemical matrix.
If we're talking about air pollution, the matrix is typically air, ambient air, urban air. However,
the chemical matrix for many methods may be water or organic solvent or soil or rock or
whatever media you're sampling. The reporting limit is a number below which the data is not
reported. This is kind of an old concept. Laboratories used to have a stated reporting limit.
They'd say: The concentration is below value X; we're just not going to bother to report it. It's
not a modern concept, and it's not as well defined as the detection limit concept, but it may be present in old reports. So if you run into it, know that there really isn't a good scientific basis for it; it's a somewhat arbitrary number and hopefully not used in modern methods or modern laboratory procedures.
Slide 8
The sample matrix is the general physical and chemical makeup of a particular sample. As I
mentioned, it could be air, water, soil, rock, sand, whatever. And the signal-to-noise ratio now is
a dimensionless measure of the relative strength of an analytical signal to the average strength
of the background instrument noise.
If you look at the signal coming out of an instrument, there is noise with somewhat random
variability. The signal-to-noise ratio is how high the unknown signal rises above that background
noise. We'll show a number of examples of this with some real data. The statistical outlier then is
an observation that appears to deviate markedly from other members of the group of samples,
the populations of samples from which it occurs. Sometimes when we're looking at data, for
example, blanks, one value looks a lot higher than the other values. And it's tempting to say it's
an outlier, we'll push that one aside and not worry too much about it. I want to caution you that
outlier is sort of a dangerous concept. We'll talk a lot more about outliers in future lectures. But I
want you to always approach your data set with the idea that all data is good data; you can't simply push a point aside because it doesn't look like the other ones. You need a good reason to do that.
We will have, as I mentioned, future lectures on these kinds of concepts, outliers and censored
data, problems in data sets to give you a little bit more help and guidance.
Slide 9
So let me go and give a rigorous definition of detection limit. The detection limit is the lowest
concentration of an analyte within an environmental matrix that a method or equipment can
detect. It applies to the sampling and the analytical method. The sample detection limit is related
to the amount of a sample collected and the analytical detection limit. So the detection limit will
vary depending upon the volume of air you collect, for example. In contrast, the instrument
detection limit is just the analytical finish. It's the instrument signal-to-noise ratio. If you're
looking at gas chromatography output, for example, you may have to ask yourself the question:
The little blip, is that little signal real or just noise in the system? Sometimes it's a very difficult
determination to make. Hopefully after this lesson you'll have a little bit better idea on how to
make that determination.
Slide 10
So why do we have a problem with any of this? We have a problem and the reason for this
lecture is that there's lots of variability in the environment. Every time you look at the level of
something it's going to be different than from the last time you looked at it. And there's two
major reasons why. On the left-hand side of this chart I have summarized all the reasons why
the actual concentrations in the environment vary. We'll call that statistical sampling variability.
And on the right-hand side, the boxes show measurement variability, the reason why things vary
is because of the analytical method. So we'll start with the analytical method side of this chart
and finish the lecture on the other side. There are many reasons why there's measurement
variability. We can take different samples, different sample volumes, different sample extraction
procedures. We have different preservation techniques. There are storage and transport issues.
There are different ways of preparing a sample, extracting the sample, and analyzing samples.
So all of these steps along the sampling process can lead to noise or variability.
Slide 11
Summarizing this mathematically, the total uncertainty then is the sum of two variances: The
variances due to variability in the environment; we call that population variance. And the
variance due to the method, or the uncertainty in the analytical determination, that's the
variance of the analytical method.
The square root of the variance is our standard deviation. So S sub P is the uncertainty in the
population, or the standard deviation of the population. S sub M is the uncertainty due to measurement variability. Variances add; standard deviations do not. So if you wanted a
pooled standard deviation, you would have to add the variances and take the square root. So
when I do this in examples in the Excel spreadsheet you'll be able to download, you'll see that
we're taking root-mean-square errors of standard deviation, which is really just averaging the
variance. How do we measure noise? How do we measure and characterize the bias and precision of an analytical method? We do this as part of the quality assurance project plan, or the QAPP.
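As a minimal sketch of this rule in Python (the numbers are made up for illustration): variances add, so the total standard deviation comes from summing the squared components and taking the square root.

```python
import math

def total_std(s_population, s_method):
    """Total uncertainty: variances add, standard deviations do not."""
    return math.sqrt(s_population**2 + s_method**2)

# Example: a population spread of 3 units and method noise of 4 units
# combine to a total standard deviation of 5 units, not 7.
print(total_std(3.0, 4.0))
```

Note the result is well below the naive sum of 3 + 4 = 7; this is exactly the root-mean-square pooling referred to in the spreadsheet examples.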
Slide 12
So these kinds of samples should be built into your study design. You've already collected these
hopefully before you get to the analysis. So field blanks, laboratory blanks, split samples in the
field, split samples in the laboratory, replicate samples in the field and spiked samples are all the
kinds of things you need to fully understand measurement noise and characterize what the true
detection limit is.
Slide 13
So I'm going to next define a number of statistical terms to help us get to this concept of
precision, bias, and error. The first and the most important one probably for this lesson is the
standard deviation. The standard deviation measures the dispersion of a sample distribution, and
it has units that are the same as the units of the measurement. So if you're measuring PM 2.5 in
micrograms per cubic meter, the standard deviation of those measurements also has the same
units, micrograms per cubic meter. It's a scaled number. It's a number of dispersion of a series
of measurements or a series of values. And it's defined as the deviation of each individual value
from its mean, squared and summed over all values divided by N minus 1 and by taking the
square root of that value. So it's the root-mean-square average deviation from the mean. And
it's a measure, a scaled measure of the spread. Also turns out to be very easily calculated in
Excel or scientific calculators. If you have a scientific calculator, it has a standard deviation
function built into it, as does Excel.
Slide 14
In this very simple example, we have eight blank filters, labeled IOM. That's the particular kind
of sampler we use. These filters happen to be Teflon. And we had an aluminum determination made on each of these blanks. We have eight answers, and the standard deviation of these eight
blank filters is simply, in Excel, the STDEV function, the standard deviation function, where you
just point to the beginning or end of the column of numbers and it returns the value 43.5.
Similarly, you can get the mean. These are relatively easy values to calculate and they describe
the spread of the data.
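The same calculation outside Excel, using Python's `statistics` module; the blank masses below are hypothetical stand-ins, not the actual data behind the 43.5 result on the slide.

```python
import statistics

# Hypothetical aluminum masses (ng) on eight blank filters.
blanks = [310, 250, 295, 340, 270, 305, 330, 285]

mean = statistics.mean(blanks)   # arithmetic mean
sd = statistics.stdev(blanks)    # sample SD (N - 1 divisor), same as Excel STDEV
print(round(mean, 1), round(sd, 1))
```

`statistics.stdev` uses the N minus 1 divisor, matching both the formula on the previous slide and Excel's STDEV function.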
Slide 15
For looking at precision, if you have only two numbers, not a series of numbers, the thing to
calculate would be the relative percent difference. The relative percent difference is simply the difference divided by the mean, where the mean is the sum of the two values divided by two. The relative percent difference is usually taken as an absolute value, so we don't ever look at negative values. It's just the largest of the two values minus the smallest of the two values, divided by the mean of the two values, times 100. What is a good value for two measures from a split sample? Well, it depends on the method. I'm asked this question frequently. The reference method for PM 2.5 specifies a precision of 10%, so the relative percent difference hopefully is much less than
10%. For other things, 10% is very stringent.
So if we're looking at organics, aldehydes in air, a relative percent difference of 20% is excellent. And we have an example in the next lecture which shows the relative percent differences or relative standard deviations you should expect to see for different real data sets.
Slide 16
And calculating this is quite simple. In the case of two of these filters, you just take the largest
value minus the smallest value divided by the mean times 100. The Excel formula is shown here.
So for filter 2507 and 2508, the relative percent difference is 2.7% for these two filters. This is a
very, very small relative percent difference.
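A small sketch of the relative percent difference calculation in Python (the split-sample pair here is hypothetical, not the 2507/2508 filter data):

```python
def relative_percent_difference(a, b):
    """RPD: |a - b| divided by the mean of a and b, times 100."""
    return abs(a - b) / ((a + b) / 2) * 100

# Hypothetical split-sample pair: 110 and 90 differ by 20 around a mean
# of 100, giving an RPD of 20%.
print(relative_percent_difference(110, 90))
```

Because of the absolute value, the order of the two arguments doesn't matter.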
Slide 17
Ideally, though, you have more than two values. You have three or more values. With three or
more values, you are able to calculate a relative standard deviation. Relative standard deviation
is just simply the ratio of the standard deviation to the mean. It's a mean normalized measure of
spread. And by multiplying by 100, you turn this fractional value into a percent value. So ideally
you have at least three replicates, and you're able to calculate a relative standard deviation of
the method or the variation in the environment. This applies not only to an analytical method but
it can also be used to describe variation in the environment. It's just a convenient statistical
definition.
Slide 18
So here's a nice example. This is an example of chromium determined on eight blank filters.
These are Teflon filters. You can see the chromium values are in the neighborhood of 280 or up
to about 380 nanograms in this case. You take those eight values and you calculate the mean
value of 314. The standard deviation of these eight values is 36.5. And the ratio, standard
deviation divided by the mean, in the Excel formula is quite simple: It gives you a relative
standard deviation of 11.6%. It's a relatively low number. It says that the spread around the
mean is relatively tight.
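Using the summary numbers from this slide, the relative standard deviation is a one-line ratio (a sketch in Python):

```python
# Chromium blank summary from the slide: mean = 314 ng, SD = 36.5 ng.
mean, sd = 314.0, 36.5
rsd = sd / mean * 100   # mean-normalized spread, as a percent
print(round(rsd, 1))
```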
Slide 19
Now, you can estimate this. This is a little trick which is in the PM 2.5 method. If you only have
two values, you can actually estimate the relative standard deviation by taking the relative
percent difference divided by the square root of two. So it's better to have more than two
replicates, but you can still estimate the relative standard deviation by simply dividing the
relative percent difference by the square root of two.
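That estimate is a one-line function; applied to the 2.7% filter pair from the earlier slide, it gives roughly 1.9% (a sketch in Python):

```python
import math

def rsd_from_rpd(rpd):
    """Estimate the relative standard deviation from a two-value RPD."""
    return rpd / math.sqrt(2)

print(round(rsd_from_rpd(2.7), 1))
```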
Slide 20
Okay. So let's get back to the US EPA's definition of detection limit. By definition, the minimum
detection limit is the minimum concentration of a substance that can be measured and reported
with a 99% confidence that the analyte concentration is greater than 0 in the matrix tested. So
we're always looking at signal inside a chemical matrix. So one of the things we have to do
obviously is define the chemical matrix, define what that contributes to the signal so we can look
at the part of the signal which is the concentration in the environment. This is a statistical
concept. It's not a chemical concept. The detection limit is based on a confidence. In this case, it
is 99% confidence. I would point out, this is a very conservative definition. 99% is a fairly high
threshold. In many cases we adopt a threshold of 95% confidence. So you need to think this
through and build this into your study plan, what level of confidence that you want to accept or
live with when you're actually looking at your numbers. It can make a very large difference in
interpreting your results.
Slide 21
So mathematically what does this look like? Well, the EPA procedures for methods typically specify a minimum of seven aliquots of a sample to be analyzed in a particular solution. So to determine a detection limit, you take seven analyses, seven aliquots of a particular environmental matrix, and you calculate a mean and a standard deviation of these seven determinations. With that standard deviation, you can then calculate the method detection limit as the standard deviation times the T value at 99% confidence, with N minus one degrees of freedom. So for seven aliquots of a particular analyte, you would use six degrees of freedom, and your alpha is 1%, for 99% confidence. You look at the T table for 99% confidence, but because of the way T tables are tabulated, and I'll show this by example in the next slide, you need an alpha of 0.02: 1% of the uncertainty in the upper right-hand tail and 1% in the lower left-hand tail. This returns a T value in Excel, if you look this up, of 3.143. So the T inverse function in Excel with six degrees of freedom and an alpha of 0.02 returns a value of 3.143.
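If SciPy is available, the same one-tailed t quantiles can be checked directly, without juggling Excel's two-tailed alpha convention (a sketch; SciPy is my assumption, not a tool the lecture uses):

```python
from scipy import stats

# One-tailed 99% t quantiles. Excel's TINV is two-tailed, which is why
# the lecture feeds it alpha = 0.02 to get the same numbers.
t7 = stats.t.ppf(0.99, 6)   # seven aliquots -> 6 degrees of freedom
t8 = stats.t.ppf(0.99, 7)   # eight aliquots -> 7 degrees of freedom
print(round(float(t7), 3), round(float(t8), 3))
```

The eight-aliquot value matches the 2.998 used later in the lecture.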
Slide 22
And this slide just shows sort of the way the normal distribution looks. The total area under the
curve in these tables or functions is 1. When you are looking up a T value, you're looking up the
distance, the T value from the center to the right-hand tail. And so if we want to put 1% of the
error in the right-hand tail, the table allocates 1% of the area to each tail. And so 99% confidence means the area under the curve in the center is 98%. That's the way the Excel function, like most stat functions, works.
Slide 23
In some cases, if you look at the normal distribution, the function is the cumulative normal for
minus infinity up to that point of the distribution, like this curve is shown on the top here. So you
wouldn't have to allocate the error between the two tails. It's just the way that functions are
tabulated. And we'll show you in examples how to use these functions.
Slide 24
So let's actually calculate some detection limits based on the EPA formula. The EPA says you
must have at least seven aliquots. So for my example, for chrome on these Teflon filters, we
have eight filters.
Based on these eight filters I'm able to compute a method detection limit. I do this by taking the
eight filters. These are all blanks. They all should ideally give you the exact same answer. I
compute the mean value of these eight filters. As I said before, it was 314 nanograms. But what
I'm really interested in is the standard deviation of these eight filters. And the standard deviation
of these eight replicates is 36.5. That's a measure of the spread of the reproducibility of
measuring the blank. The T value for eight samples, which is seven degrees of freedom, and 1%
error, 0.98 under the bulk of the curve, is 2.998, or 3. Just keep in the back of your mind 99%
confidence typically gets a T value of around 3. So this coming out at 2.998 probably suggests
that you've looked the value up correctly.
So method of detection limit is then the standard deviation times T, or 36.5 times 2.998, which
is 109.6. What this says, if you want to have 99% confidence that the chrome you're measuring
in the environment is real, it must be 109 nanograms above the blank corrected value, above 0.
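The whole calculation can be sketched in a few lines of Python; the blank masses below are hypothetical stand-ins, not the actual eight-filter data set (which is in the course spreadsheet):

```python
import statistics

def method_detection_limit(blank_replicates, t_crit):
    """EPA-style MDL: sample SD of the blank replicates times t(0.99, n-1)."""
    return statistics.stdev(blank_replicates) * t_crit

# Hypothetical chromium blank masses (ng) on eight filters;
# t = 2.998 for 7 degrees of freedom at 99% confidence.
blanks = [306, 280, 350, 295, 310, 340, 365, 270]
print(round(method_detection_limit(blanks, 2.998), 1))
```

A real analysis would look up the t value for the actual number of replicates rather than hard-coding 2.998.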
Slide 25
So we're going to go ahead and look at a number of examples. I'm going to use in my examples
chrome data and titanium data. These were samples collected on Teflon filters, open-faced IOM
samplers, so these have total mass of particles. And in this particular study we had eight field
blanks which were filters that we took to the field, did the sampling, and brought back. So they
represent the entire background noise of the whole process. We also had, on five consecutive Mondays at three locations, 15 samples. So we have 15 samples and eight
blanks. This is kind of a unique data set which makes an excellent example here, because each
filter, individual sample and blank, was extracted and analyzed four times by ICPMS. So we're
able to report an average mass on each filter, an average mass on each blank, as well as a
standard deviation of the mass on each filter and the standard deviation of the blank, the mass
on each blank. This gives us a way to measure noise individually as well as in a pooled sense.
Slide 26
So what do these filters look like? Here are the eight chromium blanks. You see for each filter
there's a value, an average value. So for the first filter, the mass of chromium on that filter was
306 nanograms, and the standard deviation of that mass or the noise on that filter was 24.8
nanograms. So we have sort of an average mass, 314. And a standard deviation or a spread on
these eight replicates of 36.5, but I also have a measure of the noise for each filter and a pooled
average, a root-mean-square average of the noise on those eight filters shown down here as
25.5. If you open our Excel spreadsheets you can see how those are calculated.
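The pooled, root-mean-square average noise mentioned here can be computed like this (the per-filter noise values are hypothetical; only the method is from the lecture):

```python
import math

def rms(values):
    """Root-mean-square average: average the squares, then take the root."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Pooling per-filter noise standard deviations (ng) averages the
# variances, consistent with the rule that variances, not SDs, add.
noises = [24.8, 20.0, 30.0, 26.0]
print(round(rms(noises), 1))
```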
Slide 27
And then the following slide shows the 15 samples. And these 15 samples in the field are
identified by weeks one through five and sampling locations A, B, C. These are just different
locations within this particular industrial facility. For each one of these samples, we have an average chromium mass and a noise, or error, standard deviation of that average, as well as the average of the 15, which is 429. The standard deviation of these samples, the spread in the environment, how much they vary in the environment, is 116. And the root-mean-square noise number is 37.
Slide 28
The next two slides just show the titanium results. Here are the eight blanks for titanium. You'll
notice right away that the filter blank is a much lower value. The average mass on these blanks
is only 2.7 nanograms, an extremely small amount. The standard deviation, the spread of this is
about 4.1. So standard deviation is somewhat larger than the mean value. And the analytical
noise, the root mean square noise is 0.8. This is a very small number.
Slide 29
Okay. The 15 titanium samples shown on the next slide show that even with a very, very low
blank value, the sample masses are very high. So you see the titanium average concentration in
these particular 15 samples was about 341 nanograms. The standard deviation of those 15 samples was 171. The noise is much smaller, at 31.9 nanograms.
Slide 30
So let's start by looking at signal-to-noise ratio. As I defined before, this is a dimensionless
measure of the relative strength of the analytical signal to the average strength of the
background instrument noise. We calculate this as the ratio of the mean mass to the error on
each filter. We compare this to the distributions for signal noise ratios for both chrome and
titanium. This data set, because of the way it was collected and analyzed, is kind of unique in
that I'm able to calculate a signal-to-noise ratio for each individual filter. And ideally, you know,
the noise is very much less than the signal.
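As a sketch, the per-filter signal-to-noise ratio is just the mean mass divided by its noise standard deviation (the mass/noise pairs below are hypothetical):

```python
# Hypothetical (mean mass, noise) pairs in ng for three filters.
filters = [(306.0, 24.8), (340.0, 28.0), (295.0, 22.0)]

# Signal-to-noise ratio: mean mass over the noise for each filter.
snr = [mass / noise for mass, noise in filters]
print([round(r, 1) for r in snr])
```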
Slide 31
Let’s start by looking for chrome. I've added another column to our spreadsheet which shows the
signal-to-noise ratio. You can look at the chrome mass, the chrome noise and the ratio of those
two in the column on the right. And it shows a relatively high, actually a very high signal-to-noise ratio, 12.3. So on these blank filters, the signal is clearly visible above the analytical noise,
an order of magnitude above the analytical noise. And it's very reproducible. They all have the
same signal-to-noise ratio. So this looks like no problem at all.
Slide 32
In contrast, the titanium filters show a much lower signal-to-noise ratio. They range from about
0.7 up to 6, with a mean of about 3. But you can't conclude at this point that the chrome measurement is better than the titanium measurement, because the comparison is based not just on the signal-to-noise ratio but also on the amount of blank in the sample matrix.
Slide 33
So if we look at the results of this analysis, we see that chrome has a very high signal-to-noise
ratio, which was about 12.4. But it also had a very high average blank value, 314 nanograms. In
contrast, titanium had a much lower signal-to-noise ratio, about 3, with a much lower blank value of about 2.7 nanograms. So it's possible in this analysis that this very high chromium blank may cause a problem with
the detection limit, even though the instrument clearly has no problem measuring chromium in
any of the samples.
Slide 34
So let's apply this measurement of repeated blank filters to the actual samples by using the
detection limit. So with the EPA's definition, we're going to take a 99% or 2.998 T value above
the blank as our detection limit. So we take our eight chrome values shown here, compute the
standard deviation of these eight replicates; look up the T value, 7 degrees of freedom, 0.99 or
99% confidence. This gives us our value as I showed before of 2.998. So, the detection limit is
the standard deviation times 2.998 or 109.66 nanograms.
Slide 35
Now to look at the unknown samples, we have to compare this method detection limit to the
blank corrected chromium samples. So in the following table we're going to compare the method
or blank corrected chromium values to the method detection limit. And this comparison shows a
potential for a serious problem. The chromium method detection limit was 109. The average
blank corrected value on the chromium samples was only 115. So the average value is only slightly higher than the detection limit. And in fact six of the 15 samples were below our method
detection limit.
Slide 36
In this little spreadsheet you can see what I've added to the right-hand side of our chromium
data is another column, which is the blank corrected value. So this is after we remove the
chemical matrix; that is, the blank filter, this is the amount of chromium that we believe is
contributed by that air sample, by the particles in the air sample. And you see that six of these
values are below the detection limit or less than 109 nanograms. In fact, three of them are even
negative. Three of them had less mass on the filter with the particles than the average of the
blank. So our chromium data is right near the detection limit. Some of it is below the detection
limit, and it's difficult to differentiate the chromium data from the blank.
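Screening blank-corrected values against the detection limit is a one-line comparison; the sample values below are hypothetical, and only the 109.66 ng limit comes from the lecture:

```python
# Flag blank-corrected sample masses (ng) that fall below the MDL.
mdl = 109.66
samples = [250.0, 95.0, 130.0, -12.0, 480.0, 60.0]

below = [x for x in samples if x < mdl]
print(len(below), "of", len(samples), "samples are below the detection limit")
```

Note that negative blank-corrected values, like those seen in the chromium table, are automatically flagged as below the limit.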
Slide 37
The titanium picture is completely different. Here for titanium, you see that the detection limit,
which is a standard deviation of our replicates times the T value, is only 12.6 nanograms.
Slide 38
And when we apply this to the titanium samples, we see that the blank corrected values are
orders of magnitude higher than our detection limit of 12.6 nanograms.
Slide 39
So, on our titanium table you see that all of the data clearly rises above the detection limit and
we have no difficulty identifying in all samples a valid titanium concentration with better than a
99% confidence.
Slide 40
Now I want to fast forward to the third lecture and show this data as distributions, because I
think distribution plotting is a really nice way of demonstrating what I've been trying to show in
my Excel tables. On this graph, I'm showing the distribution of the sample filters and the blanks,
the sample filters being the line made up of the diamonds. The blanks, the lines slightly below it
made up of the little squares. These are log probability plots made in Excel, and I will spend
considerable time in the next lecture teaching you how to do this yourself. You don't have to rely
on me, but these show that the distribution of the filters is very close to the distribution of the
blanks. In fact, distribution of the blanks runs into the lower tail of the distribution of the filters.
Now, the X-axis in this graph is probability. But I can't figure out how to make Excel actually put
probabilities down.
But what Excel does is it gives you the Z scores from the standard normal distribution. So just
remember from your statistical background a Z of zero is 50%, Z of 1 is 84%, Z of 2 is 97 and a
half percent, et cetera. So you can imagine, you can replace those numbers, those Z scores with
percentages if you prefer. But again, a lot more of this next week.
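If it helps, that Z-to-percentage conversion is easy to do yourself. Here is a small sketch using the standard normal CDF, written via the error function (the exact values round to 50.0%, 84.1%, and 97.7%):

```python
# Convert the Z scores on the plot axis into cumulative percentages,
# using the standard normal CDF expressed through the error function.
import math

def z_to_percent(z):
    """Cumulative probability of the standard normal at z, in percent."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (0, 1, 2):
    print(f"Z = {z}: {z_to_percent(z):.1f}%")  # 50.0%, 84.1%, 97.7%
```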
Slide 41
But what I want to show by example is the distribution of chrome compared to the distribution of
titanium. You can see for titanium (these are both log probability plots) that the distribution of the
blanks is so much lower than the distribution of the samples that there's clearly going to be no
problem in interpreting any of those concentrations. I like log probability plots. They're very
helpful in displaying and showing information, and I wanted to show you a couple of examples
here.
END OF PART 1
Slide 42
Now, I want to look at the other side of that image, that diagram we showed earlier in the
lecture. I want to look at what happens in the environment, and sampling variability in the
environment, and contrast that to what we just finished talking about, which was variability in
the analytical method which helps us determine what the detection limit is.
Now, what is the purpose of your study? The purpose of the study is to estimate some value
that's out there, some true environmental value. It may be a mean. It may be a 98th percentile,
whatever it is, we're trying to estimate it, because we don't ever know what the true value is
because we rarely, if ever, have a complete sample. We have a small sample taken of the large
population.
Slide 43
And that small sample is trying to define the uncertainty in that large population of values. So
we're trying to estimate the true population value from a sample estimate and the true
population uncertainty from a sample estimate. I've shown you how to do that for the
measurement uncertainty, by replicate samples and the T table and the standard deviation.
Slide 44
Now we're going to look at the uncertainty in the environment. Why is there uncertainty in the
environment? There's a lot of variability, or uncertainty, in the environment for many reasons.
And your study design is hopefully sufficient to characterize what this uncertainty is.
Concentrations vary for a lot of reasons.
And these next couple of slides show some of the reasons. We have variation due to the location,
distance, direction, elevation relative to a particular source. So as we move around from location
to location relative to a major air pollution source we'll have different concentrations. We may
get a non-uniform distribution because of topography, hydrogeology, meteorology, or any
other kind of biological, physical or chemical distribution mechanism which is going to disperse
our pollutants unevenly. There may be variability across species: as we look across different heavy
metals, organics, or chemicals like nitrates or sulfates, we may see differences.
Slide 45
There may be variation in just the background over time. Some of what we measure is
background in the environment, sort of long averages from distant locations. That may vary as
well. Local emission sources may vary. If you're measuring on a Sunday compared to a
Wednesday, traffic is going to be quite different. There may be a problem at a local source.
There may be a process upset or an accident. And under those kinds of conditions we sometimes
see very high levels in the environment.
Even the averaging time of your sample, that is, whether you take a one-hour sample, a one-day
sample, or a one-month sample, is going to affect the variability in the ultimate answer. And
finally, even calibration will contribute. So all of these factors together, every time you draw a
sample for one of a whole variety of reasons, you're going to get a different answer.
Slide 46
So it's our job to characterize what that uncertainty is. And the example I want to use is a data
set we collected here in Chicago on PM 10. This was a data set used for national ambient
air quality standards violation decisions, but it's a nice way for me to illustrate some of these
concepts.
Slide 47
So here we have a PM 10 sample, a distribution of PM 10 values in Chicago, for a single monitor.
And at this monitor it's kind of nice because I have a full three-year record of PM 10 data.
So I have one measurement every day for a full three years for 1096 values with no missing
values. So I'm able, a little bit artificially, to define the whole population: every single one of those
samples over a three-year period with 100 percent data capture. And I'm going to take this
population of values and I'm going to sample it three different ways.
Slide 48
I'll sample it once every 12th day, sample it once every 6th day, and sample it once every three
days.
Each one of these samples will be a way of estimating the true value; they will give us a point
estimate of the true value of all 1096 values. Now I've chosen 3, 6 and 12 because those
are the sampling frequencies we use. We all sample our PM networks on a once every six or once
every three-day basis.
Nationally, some of our very expensive monitoring programs operate on a once every 12th day
basis, because we can't afford to collect all the samples you get in a more frequent sampling
structure. I'm able to take this long series of values, sample it 12 different ways for a 12-day
sampling frequency, six different ways for a six-day sampling frequency and three different ways
for a three-day sampling frequency to estimate the underlying true value. From this I can
demonstrate the kinds of uncertainty you get as a result of sampling design. This table shows
the results, and it's a little bit busy. But across the top of the table are the statistical parameters
I need to calculate. The mean, the minimum, the maximum, the standard deviation, the
standard error of the mean and the confidence interval.
Slide 49
I want to first define what those are and then I'll come back to this table and we'll talk about the
numbers in the table. So what is the population mean? The population mean is the average sum
over all N values of the measurement divided by capital N total number of values in the
population. This is rarely known. In my somewhat restrictive, artificial situation, I'm defining the
population as 1,096 samples. But in the real world you rarely, if ever, know what this is.
Slide 50
And the population variance is defined as the sum of the squared deviations of
all the individual measurements from the population mean. Again, we usually don't know what this
is. We estimate the mean and the variance from the sample mean and the sample standard
deviation.
Slide 51
So this slide shows the sample mean, the sum of the values divided by n, where n is the number of
samples in our little group that we're looking at.
Slide 52
And the sample standard deviation that I defined before is based on the sum of the squared
deviations from the mean. The sample variance is defined here; the sample standard deviation is
the square root of this value.
Slide 53
How about the sample mean? The sample standard deviation describes the spread of the values
within our sample. But the mean itself has a standard deviation.
Slide 54
And we call that the standard error of the mean. The variance of the mean that we're estimating
is the variance of the sample divided by n, and the
standard deviation of the mean is the square root of this value, the square root of the sample
variance divided by N. Now, also note in this figure is the value F. F is the finite population
correction factor, because we're sampling without replacement. If I were drawing these samples
from my thousand ninety-six possible values and every time I took a sample out I was able to
put that one back in the population, F would be zero.
And in the real world, F is usually approximately zero because capital N is much, much greater
than little n, the size of your sample. But in this reduced example I can't assume that little n over
capital N is close to zero, so I have to compute it. Not a big deal. Normally it's not a problem.
And so as I mentioned the standard error of the mean is the square root of the variance divided
by N shown here.
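As a sketch, the standard error with the finite population correction can be computed like this. The inputs are illustrative, borrowed from the scale of the PM 10 example (standard deviation 17.0, a subsample of 91 out of 1,096):

```python
# Standard error of the mean with the finite population correction:
# SE = sqrt((s^2 / n) * (1 - n/N)). Illustrative PM10-scale numbers.
import math

def sem_fpc(s, n, N):
    """Standard error of the mean, with finite population correction."""
    f = n / N  # sampling fraction; approximately 0 when N >> n
    return math.sqrt((s ** 2 / n) * (1 - f))

s = 17.0         # sample standard deviation, micrograms per cubic meter
n, N = 91, 1096  # once-every-12-day subsample out of the full record
print(round(sem_fpc(s, n, N), 2))  # about 1.71; about 1.78 without the correction
```

With n/N around 8 percent, the correction shrinks the standard error only slightly; in the usual field situation where N is enormous, it disappears entirely.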
Slide 55
And, finally, the confidence interval around the mean is the range where I make a probabilistic
estimate that the true mean is somewhere within a lower limit and an upper limit with a
certain level of confidence. This equation shows it for the standard normal table, and it assumes
that you know what the population variance is.
Slide 56
Since we typically don't know the population variance, we end up using the T distribution. And
the T distribution shown in this slide is based on, again, n minus one degrees of freedom and
alpha divided by two, but it allows us to use our sample estimate of the standard deviation S in
computing the confidence interval. So using this equation I will not only take my estimate of the
mean, X bar, but I'll compute the upper and lower confidence, which is my way of saying I have
a certain degree of confidence, I'll use 95% that the true mean is somewhere in this range.
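A quick sketch of that calculation, using illustrative numbers on the scale of the PM 10 example (the two-sided 95% t value for 90 degrees of freedom is about 1.987):

```python
# t-based 95% confidence interval for the mean:
# x_bar +/- t(alpha/2, n-1) * s / sqrt(n). Illustrative numbers.
import math

x_bar, s, n = 30.75, 17.0, 91  # sample mean, std deviation, sample size
t = 1.987                      # two-sided 95% t for n - 1 = 90 df

half_width = t * s / math.sqrt(n)
print(f"95% CI: {x_bar - half_width:.1f} to {x_bar + half_width:.1f}")
```

Note this version omits the finite population correction, so it comes out a little wider than an interval computed from the full table's standard errors.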
Slide 57
So that's how I computed the values in this table. So you see the number of samples in each of
my subgroups.
I have once every three day sampling starting on the first, second or third day of the series,
once every six day sampling starting on either the first through the sixth day, and my once every
12 day sampling starting on any of the first through the 12th days, plus the complete series of
1,096 values with no missing data. Now the next two columns show the minimum and maximum
that you get from that sample. The true minimum is 6.2. True maximum is 115.5. But most of
these samples don't have the true minimum and true maximum. They miss it. The actual grand
average or the population average is 32.4 micrograms per cubic meter and the population
standard deviation is 17.0. Standard error of the mean is zero because I know the population
values, so the confidence interval is zero. Then each line below that shows the estimated mean,
the estimated standard error, and the confidence intervals of the mean for the different samples.
As you look down the table, you see that as sampling frequency decreases, that is, if I sample less
frequently, my estimated standard error goes up, and my estimate of the mean is less precise. The
confidence interval then increases accordingly. So this table shows, for
example, for my last sample, the last line on this table, my once every 12-day sampling starting
on the 12th day of the series, those 91 samples had a mean of 30.75. So I was about two
micrograms below by chance. The standard error of the mean was 1.6, and my 95% confidence
interval was between 27.6 and 33.9. I know where the mean was. I'm pretty sure the mean was
in that range, but I don't know exactly where it was from this once every 12-day sample.
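You can replicate this kind of experiment yourself. Here is a sketch that draws every-12th-day subsamples from a synthetic daily series; the random lognormal values are a stand-in for the real PM 10 record, not the actual data:

```python
# Sketch of the every-kth-day subsampling experiment on a synthetic
# 1096-day series (lognormal values standing in for the PM10 record).
import random

random.seed(1)
population = [random.lognormvariate(3.3, 0.5) for _ in range(1096)]
pop_mean = sum(population) / len(population)

k = 12
for start in range(k):
    sub = population[start::k]  # one of the 12 possible subsamples
    sub_mean = sum(sub) / len(sub)
    print(f"start day {start + 1:2d}: n = {len(sub)}, "
          f"mean = {sub_mean:5.1f} (population mean {pop_mean:.1f})")
```

Each of the twelve subsample means lands near, but not exactly on, the population mean, which is the same behavior the table shows for the real data.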
Slide 58
To make it a little bit easier to see the result, I've graphed them. I first graph the minimum,
maximum, and the average PM 10s from these different samples.
What you see here is that the central tendency measure, the mean, moves a little bit as my
sampling rate changes. But if you want to estimate something near the tail of the
distribution, like the maximum or something close to the maximum, there's a lot more uncertainty
in that estimate as you decrease the sampling frequency. So
my once every 12th day sampling frequencies have a wide range for estimates of the maximum,
whereas estimates of the mean are not too bad. If your ultimate goal is estimating the mean,
you might be able to do a pretty decent job with a once every 12-day sample. If you want to
estimate the tail of this distribution, you'll probably miss it with the once every 12-day sample.
Slide 59
The next slide shows the 95% confidence interval around the mean. It shows that as sampling
frequency decreases, as I go to once every three, six, or
12-day samples, these confidence intervals get larger. And so if your stated goal is to get the
mean value within a very narrow range of uncertainty, then you clearly need to have a very
frequent or almost everyday sample to do that. If you're willing to accept up to five, ten, 15,
20% error in the mean, then a less frequent sample is sufficient. We'll get back to this at the end
of the lecture when I talk about estimating the number of samples required for a particular study
design, because it reproduces this same result quite nicely.
Slide 60
Finally, I'm showing you the results as box plots. So this first box plot is the full data set, the full
population of values.
Slide 61
And then the first set of box plots shows the once every three-day sampling, and it shows
you that the mean value in the center of the notch is pretty reproducible. The size of the notch is
the 95% confidence interval of the mean, and the box is the interquartile range.
I will define this in much more detail in subsequent lectures. But it shows that the distributions
look about the same. And they look very similar to the everyday distribution.
Slide 62
When I go to the every sixth day samples, the distributions are beginning to show a little more
variability. Interquartile ranges or the boxes themselves are moving around; the notches are
growing, which means my uncertainty in the mean estimates is growing. The extreme values
are being bounced around a little bit more.
Slide 63
And the distributions from the once every 12-day sampling show a great deal of
variability; especially in the extreme values, some of these samples are very different from the others.
And if your end point is an extreme value, this would be a very inefficient way to get a good
estimate looking at once every 12-day sampling.
Slide 64
So I just wanted to show that sampling frequency leads to error in our ability to estimate a
parameter, in this case a sample mean. The more frequently we sample, the less error; the less
frequently we sample, the more error. You want to select your sampling frequency specifically to
meet the desired precision of your sampling protocol.
Slide 65
I need to speak a little bit about sampling designs, then we can wrap this lecture up by showing
you how to actually specify this uncertainty in your study design. So what are the basics of
sampling designs that are available to us? We can look at a number of study designs. And
they're characterized either as haphazard sampling, judgment sampling or probability sampling.
Haphazard sampling, or completely random sampling, requires a homogeneous population over
space and time, if you want to get an unbiased assessment. If we're looking at how people in the
United States are being exposed to air pollutants, this is a very inefficient way to sample,
because we don't live homogenously distributed across the United States. We live in cities. If we
randomly sampled locations in the country we would miss the population exposure. We'd have to
take many, many samples. Judgment sampling is what we tend to use, or we certainly include
judgment sampling because we know where people live so we specifically go there to do our
sampling. Once we get to where we're going to sample, we'll use some kind of probability
sampling method, some kind of systematic or random sampling method which allows us to have
an efficient sampling design.
Slide 66
So what are these designs? Simple random sampling, the most basic, doesn't work well if the
population contains patterns. Stratified random sampling, which is really useful where there's a
pattern, so we can divide into different strata. Within a stratum, we think it's much more
homogeneous. We'll divide our city into urban, suburban, and rural background areas. Within each of these
strata, we'll be able to make the assumption that the concentrations are much more
homogeneous. So we'll separate off the industrial areas of the city from the suburban areas of
the city. Multi-stage sampling is when we're not exactly sure, so we go in there, take some
samples, look at our results and go back and adjust for subsequent sampling. And it's potentially
useful if you really don't have a particularly good view of what you're going to find.
Slide 67
So ultimately, and typically, we're going to use some kind of systematic sampling; it's the method
of choice when there's a pattern or trend.
As I mentioned before, we typically sample particles once every sixth day across the United
States. And everybody does it on the same day. So it's quite uniform. If there's a strong linear
pattern we may want to do double sampling, or we may need to do search sampling for other particular
kinds of issues, but we're normally looking at sort of a systematic sample over time or space.
Slide 68
This can be demonstrated with these simple charts. Random sampling along the line: not very
efficient if we have non-homogeneous distributions. Stratified random sampling: much more
efficient if we can define strata. That's not too hard with air pollution. We know an urban area is
different from a rural area, so we can define strata pretty straightforwardly. Within each stratum we
may do cluster sampling; that is, I may do a lot of sampling in my heavily contaminated urban
areas and much less sampling in less contaminated rural areas, just because the relative
areas are so different. I clearly have to oversample where people live.
Slide 69
But ultimately what we do within a stratum, within a particular sampling time, is some kind of a
systematic sample as demonstrated in the middle slide where we look at samples collected that
are evenly spaced at a particular location over time. We could also do this at a particular location
over space: if we have to sample a deposit of something in soil, we may draw a line and step along
that line at equally spaced sampling intervals. The same idea: space or time.
Slide 70
And in two dimensions these graphs translate directly: simple random sampling in a two-dimensional space versus stratified random sampling, which is much more like what we would
end up doing. We would pick our sampling sites within each strata somewhat randomly.
Within each sampling site, we would sample systematically, with equally spaced samples over time.
Slide 71
If we're looking at an area that we have to sample like a harbor or something like that, we may
lay a grid down over it and then sample systematically over that grid. Cluster sampling is shown
here, where each of those locations may be different: one may be an industrial area, one may be a
suburban area, and one may be a rural area, for example.
Slide 72
This is a nice little example that my students and colleagues are working on, where we're
sampling sand at a beach for a particular toxic material. We laid down these lines and then
sampled along them at equally spaced intervals. This is a systematic random
sampling way to get at the concentration of this toxic material in the beach sand.
Slide 73
This is one of my colleagues sampling. It's a tough job but somebody has got to get out there
and do it.
Slide 74
Now let's get to how we translate this into number of measurements required. So we're going to
define this simple equation here. It's going to help me explain this concept. Say you've
picked a method and you know what its noise is; you apply this method to an ambient
environment and you want to estimate something like the mean. How close do you want to get
to the true value? You can use the uncertainty in the analytical method and in the environment to
calculate the number of samples you require for a specific precision. If the absolute margin of
error that can be tolerated in the measurement of the mean of X is D,
and we're going to accept a probability alpha of exceeding that error, we write this
statement: the probability that the mean we estimate deviates from the true value by more than
D is less than alpha. So we specify D, the distance we want to be away
from the mean. We specify alpha, our confidence; and if we know something about the
distribution of samples in the environment, we can compute the number of samples required.
Slide 75
Now, if we know the population variance, and we have a very, very large sample, then we use
the standard normal curve, the Z table. Typically, we have a situation where the size of our
sample, little n, is very much smaller than the total population, capital N, available.
Slide 76
So this equation reduces to what is shown on this slide. But remember, we usually don't know
what sigma is; we're estimating sigma using a standard deviation.
Slide 77
So we end up with the T distribution shown here, where we use S as our estimate of sigma, with
alpha divided by two probability and n minus one degrees of freedom.
Slide 78
And again, if our sample n is small compared to capital N, the total number of possible
samples out there in the environment, we can reduce the equation to what's shown on this
slide.
Now, this is a bit of a nuisance to calculate, because T is a function of n. So what you have
to do is guess a value for n, look up the T, compute a new value of n, see how close
you are to your guess, make a judgment, and go around this loop a couple of times until you get
the final answer. So it is a trial-and-error solution, and Excel doesn't like doing this for you, so
you have to do it yourself.
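Here is a sketch of that trial-and-error loop. The t lookup below is a small hard-coded approximation (an assumption for illustration), not a full t table:

```python
# Iterate n = (t(alpha/2, n-1) * s / D)^2 until the answer stops changing.
import math

def t_975(df):
    """Approximate two-sided 95% t values from a coarse hard-coded table."""
    table = {5: 2.571, 10: 2.228, 20: 2.086, 30: 2.042, 60: 2.000, 120: 1.980}
    for k in sorted(table):
        if df <= k:
            return table[k]
    return 1.960  # large-sample (normal) limit

def samples_required(s, D, n_guess=30):
    """Guess n, look up t, recompute n, and repeat until it converges."""
    n = n_guess
    for _ in range(50):
        n_new = math.ceil((t_975(n - 1) * s / D) ** 2)
        if n_new == n:
            return n
        n = n_new
    return n

# Illustrative: s = 17.0 and a target absolute error D = 3.4 (about 10%
# of a PM10-scale mean).
print(samples_required(17.0, 3.4))
```

The loop settles after a couple of passes, just like the by-hand procedure described above.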
Slide 79
Now one useful concept that helps in understanding this calculation is the coefficient of variation.
If we define the coefficient of variation as the population standard deviation divided by the population
mean, sigma divided by mu, which we estimate as a standard deviation divided by the sample
mean, then we're able to specify a relative error, D sub R, in terms of this coefficient of
variation. Our relative error now is the difference between our sample estimate and the true
value divided by the true value.
Slide 80
Using this concept of coefficient of variation and relative error, or mean-normalized error, we're
able to specify the number of samples required. It's a function of the coefficient of variation
divided by this mean relative error.
Slide 81
Here is the case where n is large compared to capital N. But typically we're here where the
number of samples is quite small compared to the population of possible samples out there. And
so from this equation we're able to compute the number of samples we need, given a variation in
the environment, coefficient of variation, and a mean relative error we're willing to accept, D sub
R, and a value off the standard normal table. We're again specifying the standard deviation divided
by the mean which we estimate from our sample.
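For the common case where little n is small compared to capital N, the formula is a one-liner. Here is a sketch using the 95% z value of 1.96; a CV of 0.5 with a 10% relative error is the PM 10-like case:

```python
# Sample-size formula n = (z * CV / D_r)^2, for the case where
# little n is small compared to capital N.
import math

def samples_required_cv(cv, rel_error, z=1.96):
    """Samples needed for a given CV and relative error at ~95% confidence."""
    return math.ceil((z * cv / rel_error) ** 2)

print(samples_required_cv(cv=0.5, rel_error=0.10))  # 97
print(samples_required_cv(cv=2.0, rel_error=0.25))  # 246, a noisier environment
```

The first case returns 97 samples, which is the same order as the every-three-day PM 10 subsamples.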
Slide 82
This table displays the results of this calculation. And there's three things happening in this table.
I want to draw your attention to these three things to help you understand what's in the table.
Along the top, in the right-hand five columns is the coefficient of variation. This is a variation in
the environment. This is the standard deviation divided by the mean in the environment. The
first of these five columns, 0.1, represents a very tightly varying environment with very little
variation: the standard deviation is very low compared to the mean. The next columns have
coefficients of variation of 0.5, then 1, 1.5 and 2. Just to draw your attention back to our PM 10 example,
the coefficient of variation of the one thousand ninety-six PM 10 samples was about 0.5. So in
the environment 0.5 for PM 10 in an area is probably a good definition for variation. If something
is more variable, you could go to the right a little bit. But just to sort of anchor this discussion,
the PM 10 values were around 0.5. Now, the second column in this table is the relative error. It's
the deviation between your estimate and the true value that you're willing to accept. And so this
is solved for several different relative errors: 10%, 25%, 50%, 100%, and 200%. A 10% deviation is
a very stringent relative error; you want to be within 10% of the correct value. At 200%, it doesn't
really matter that your estimate be that close.
And then, the third parameter in this table is the confidence with which you want to make this
judgment. And I've solved this for two different confidences, a confidence of 80% and a confidence
of 95%. Do you want to be 80% certain that you're within a certain amount relative to the mean,
or do you want to be 95% certain? So what does this imply? It's kind of nice. It implies that if you want
to be 95% certain that you're within 10% of the true value and the natural variation of the
environment is 0.5, then you need 97 samples, which is exactly what the previous PM 10
sampling table shows us. It showed us that if we took 91 samples out of our 1,096, and we drew
a 95% confidence interval, we got to within about 10% of the mean every time. So whether you
look at my PM 10 example, repeatedly sampling the same distribution, or compute it based on
the standard normal curve, it's reassuring to know you get the same
answer, as you should. So this table is just there to give you a glimpse of the kinds of sampling
designs you need, in terms of number of samples, to achieve a specified accuracy in your ultimate
estimate of a sample mean, with a certain probability that you're going to get there. It's a bit of a
complicated table. Look at it a little and send us your questions if you still can't figure it out.
Slide 83
So to summarize this last concept: Uncertainty in an analytical method can be
quantified through specific statistical procedures, and we've shown you a number of examples.
We can use these procedures to actually compute the number of samples that we should collect
with a specified level of uncertainty and a specified probability that we're going to get there. We
can specify uncertainty in environmental sampling. We can characterize uncertainty in a
detection limit. And we can use these to understand the quality and quantity of the data that we
need to collect.
Slide 84
So from this lecture, we've talked about statistical definitions of the detection limit. You should
have a pretty good idea of what a detection limit is and how it's defined and how we actually
apply this to real environmental samples. In the context of the detection limit, you should also
have a pretty good understanding of uncertainty in the measurement compared to uncertainty in
the environment itself and how we handle uncertainty in the environment, by sampling, how we
characterize uncertainty in the environment by estimating confidence intervals and standard
errors.
And finally, to wrap things up, we took these concepts to give you an idea of how to estimate the
number of samples that are required to actually meet the measurement objective in your
sampling program. This concludes the second lecture. In the third lecture we're going to come
back and explore in much greater detail issues of quality assurance and look at distributions and
ways of looking at your data to help you understand the quality of what you've done. At the
website where this lecture is posted will be all of our spreadsheets that we've used in these
examples and we encourage you to look at those, because you can see into the spreadsheet the
formulas and how they're defined. And hopefully you'll generate feedback for us and questions.
We're here, ready to go to answer your questions, look at your data and help you with your
specific environmental problems. Thank you. And I'll be back with lecture three whenever you're
ready.
END OF MODULE 2