University of Illinois at Chicago, School of Public Health
Environmental and Occupational Health Sciences Division
Introduction to Environmental Statistics
Module 2: Sampling and Analytical Limitations & Sample Detection Limits

Slide 1

DR. PETER SCHEFF: Hello and welcome to the second lecture in our series on environmental statistics. This lecture is titled Sampling and Analytical Limitations and Sample Detection Limits, and we're going to focus on detection limits and sampling design, and primarily the concept of environmental uncertainty: Why do things vary from sample to sample, and how do we characterize the variation? The data sets we'll be referring to in this lecture are online at the website, and we encourage you to download them and see how we actually did the analysis. And as we said in the first lecture, please feel free to e-mail us your questions and your data sets. We're looking for new examples and feedback on these lectures.

Slide 2

The objectives of this lecture will be to characterize uncertainty in both sampling and analytical statistics. We're going to describe sampling and analytical limitations with many examples. In the environment there is natural variability. When we go out and grab a sample of something to try to characterize the level of chromium, for example, each time we grab a sample we get a different answer. We get a different answer for many reasons. One reason is strictly related to the analytical method: there's noise in the method, and every time we sample the same environment we get a slightly different answer. That's sampling variability, statistical noise in the method. There's also variability in the environment; that is, concentrations actually changing with time, so when we sample it tomorrow, it's different than what it actually was today. So we have to develop ways to differentiate these two sources of variability and understand them both to truly understand what we're measuring. In this lecture, we'll be talking about concepts such as noise, precision, bias, random error, and systematic error, and use those ideas to develop the notion of the detection limit.

Slide 3

As a brief one-slide review of the first lecture in this series: remember, before you begin to actually look at numbers, hopefully you've done a proper study design. You've defined what it is you're interested in studying. You've defined what the specific objectives of your study design are. And you've then specified and followed the sampling and analytical procedures as planned.

Slide 4

At this point, we're going to begin to look at accuracy, precision, noise, variability, and error. I want to start with this classic picture that defines accuracy, precision, and bias. Many of you have seen this picture, which shows a little target, and the correct answer is the middle of the target. What you'd like is for your sample analyses, which are the dots on these four charts, to be right in the middle of the target. The lower left of the four images shows a sampling result which has high precision (all of the answers are the same) and is very accurate: they're all centered around the center of the target. But accuracy and precision are sort of independent. So, for example, the upper left-hand image shows the same level of precision, but accuracy is fairly poor because the sampling dots are not in the center of the target.
So by definition, accuracy is the ability to measure the true value. We are always striving for accurate methods, but sometimes we have to sacrifice some accuracy for a variety of reasons. Precision, in contrast, is the ability to measure repeatedly and obtain the same values within a close range. Precision is based on the amount of noise in the method relative to the level of the signal. Error is the difference between the true value and the measured value. And bias is systematic error; for example, the analytical method shown in the upper left-hand corner of these four targets would be biased, in that it always produces an answer which is systematically different from correct.

Slide 5

So I want to follow this up with a number of definitions that we're going to use throughout this lecture in defining the detection limit and sampling uncertainty. Two very important concepts are false positive and false negative. A false positive is when you conclude that a substance is present when it truly is not. This is sometimes called Type I error. So if, after looking at the result of a method, we say, yes, we think chromium is actually here in the sample, it may be that it actually isn't there; we may just be looking at an artifact in the method. That's a false positive result. In contrast, a false negative result is concluding the substance is not present when it truly is. This is sometimes called Type II error. Now, when we make decisions based on measurements, based on a sample from the environment, there's always the possibility that we're going to draw the incorrect conclusion. We may decide that the concentration of a particular chemical is above a certain level. We may be wrong. We may have come to the conclusion that it's below a certain level. We may be wrong. We need to quantify and understand the uncertainty in the measurement process and the uncertainty in the environmental variability to be able to make a probabilistic statement that gives us some level of confidence that we're drawing the right conclusion. So as we move forward, we're going to be making statements like: we have 95% confidence, or 99% confidence, that this chemical is actually present. We're going to be able to specify a level of certainty and qualify our statements. Now, these kinds of determinations are most commonly done for risk-based programs. Typically, for a national ambient air quality standard like PM, we're looking at a threshold: Is our number above or below? We're not particularly concerned about making a probabilistic statement. So you'll see this more commonly in toxics or risk-based programs, where we're making a statement about the probability of being above or below a certain threshold.

Slide 6

Another important definition is the instrument detection limit. This is the smallest concentration that can be distinguished from the background by a particular instrument. This is a laboratory concept, and it applies only to the part of the process that the instrument is responsible for, that is, the final analytical determination from a sample extraction to a final number. The limit of detection is the lowest concentration that can be determined to be statistically different from a blank. The limit of detection is linked to a confidence limit. For example, EPA uses 99%; that is, you're 99% sure that what you're seeing is actually real, or there's a 1% chance that it's not.
The limit of quantification is the level above which quantitative results may be obtained. The limit of quantification is usually a slightly higher number, and it's a little bit more stringent, because, in order to be quantitatively certain of our answer, we need a slightly larger signal than when we are only qualitatively certain of the presence of a chemical. So the limit of quantification is usually a little bit higher.

Slide 7

The method detection limit is the minimum concentration of a substance in a given matrix that can be measured with 99% confidence that the analyte concentration is greater than 0. So the method detection limit is referenced to the material in the environment and its chemical matrix. If we're talking about air pollution, the matrix is typically air: ambient air, urban air. However, the chemical matrix for many methods may be water or organic solvent or soil or rock or whatever media you're sampling. The reporting limit is a number below which the data is not reported. This is kind of an old concept. Laboratories used to have a stated reporting limit. They'd say: the concentration is below value X; we're just not going to bother to report it. It's not a modern concept, and it's not as well defined as the detection limit concept, but it may be present in old reports. So if you run into it, know that there really isn't a good scientific basis for it; it's sort of an arbitrary number, and hopefully it is not used in modern methods or modern laboratory procedures.

Slide 8

The sample matrix is the general physical and chemical makeup of a particular sample. As I mentioned, it could be air, water, soil, rock, sand, whatever. The signal-to-noise ratio is a dimensionless measure of the relative strength of an analytical signal to the average strength of the background instrument noise. If you look at the signal coming out of an instrument, there is noise with somewhat random variability. The signal-to-noise ratio is how high the unknown signal rises above that background noise. We'll show a number of examples of this with some real data. A statistical outlier is an observation that appears to deviate markedly from the other members of the group of samples, the population of samples from which it occurs. Sometimes when we're looking at data, for example blanks, one value looks a lot higher than the other values. And it's tempting to say it's an outlier, we'll push that one aside and not worry too much about it. I want to caution you that outlier is sort of a dangerous concept. We'll talk a lot more about outliers in future lectures. But I want you to always approach your data set with the idea that all data is good data; you can't just push a point aside because it doesn't look like the other ones. You need a good reason to do that. We will have, as I mentioned, future lectures on these kinds of concepts, outliers and censored data, problems in data sets, to give you a little bit more help and guidance.

Slide 9

So let me give a rigorous definition of detection limit. The detection limit is the lowest concentration of an analyte within an environmental matrix that a method or equipment can detect. It applies to the sampling and the analytical method. The sample detection limit is related to the amount of sample collected and the analytical detection limit. So the detection limit will vary depending upon the volume of air you collect, for example. In contrast, the instrument detection limit is just the analytical finish.
It's the instrument signal-to-noise ratio. If you're looking at gas chromatography output, for example, you may have to ask yourself: that little blip, is that little signal real or just noise in the system? Sometimes it's a very difficult determination to make. Hopefully after this lesson you'll have a little bit better idea of how to make that determination.

Slide 10

So why do we have a problem with any of this? We have a problem, and the reason for this lecture is that there's lots of variability in the environment. Every time you look at the level of something, it's going to be different from the last time you looked at it. And there are two major reasons why. On the left-hand side of this chart I have summarized all the reasons why the actual concentrations in the environment vary. We'll call that statistical sampling variability. And on the right-hand side, the boxes show measurement variability, the reasons why things vary because of the analytical method. We'll start with the analytical method side of this chart and finish the lecture on the other side. There are many reasons why there's measurement variability. We can take different samples, different sample volumes, different sample extraction procedures. We have different preservation techniques. There are storage and transport issues. There are different ways of preparing a sample, extracting the sample, and analyzing samples. So all of these steps along the sampling process can lead to noise or variability.

Slide 11

Summarizing this mathematically, the total uncertainty is the sum of two variances: the variance due to variability in the environment, which we call the population variance, and the variance due to the method, or the uncertainty in the analytical determination, which is the variance of the analytical method. The square root of the variance is our standard deviation. So S sub P is the uncertainty in the population, or the standard deviation of the population, and S sub M is the uncertainty due to measurement variability. Variances add; standard deviations do not. So if you wanted a pooled standard deviation, you would have to add the variances and take the square root. When I do this in the examples in the Excel spreadsheet you'll be able to download, you'll see that we're taking root-mean-square averages of standard deviations, which is really just averaging the variances. How do we measure noise? How do we measure and characterize bias and precision in an analytical method? We do this as part of the quality assurance project plan, or the QAPP.

Slide 12

So these kinds of samples should be built into your study design. You've already collected these, hopefully, before you get to the analysis. Field blanks, laboratory blanks, split samples in the field, split samples in the laboratory, replicate samples in the field, and spiked samples are all the kinds of things you need to fully understand measurement noise and characterize what the true detection limit is.

Slide 13

So I'm going to next define a number of statistical terms to help us get to this concept of precision, bias, and error. The first, and probably the most important one for this lesson, is the standard deviation. The standard deviation measures the dispersion of a sample distribution, and it has units that are the same as the units of the measurement. So if you're measuring PM 2.5 in micrograms per cubic meter, the standard deviation of those measurements also has the same units, micrograms per cubic meter. It's a scaled number.
It's a measure of the dispersion of a series of measurements or a series of values. It's defined as the deviation of each individual value from its mean, squared and summed over all values, divided by N minus 1, and then taking the square root of that value. So it's the root-mean-square average deviation from the mean, a scaled measure of the spread. It also turns out to be very easily calculated in Excel or on scientific calculators. If you have a scientific calculator, it has a standard deviation function built into it, as does Excel.

Slide 14

In this very simple example, we have eight blank filters, labeled IOM; that's the particular kind of sampler we use. These filters happen to be Teflon. And we had an aluminum determination made on each of these blanks. We have eight answers, and the standard deviation of these eight blank filters is calculated in Excel simply with the STDEV function, where you just point to the beginning and end of the column of numbers, and it returns the value 43.5. Similarly, you can get the mean. These are relatively easy values to calculate, and they describe the spread of the data.

Slide 15

For looking at precision, if you have only two numbers, not a series of numbers, the thing to calculate is the relative percent difference. The relative percent difference is simply the difference divided by the mean, where the mean is the sum of the two values divided by two. The relative percent difference is usually taken as the absolute value, so we don't ever look at negative values. It's just the largest of the two values minus the smallest of the two values, divided by the mean of the two values, times 100. What is a good value for two measures from a split sample? Well, it depends on the method. I'm asked this question frequently. The reference method for PM 2.5 specifies a precision of 10%, so the relative percent difference hopefully is much less than 10%. For other things, 10% is very stringent. If we're looking at organics, aldehydes in air, a relative percent difference of 20% is excellent. And we have an example in the next lecture which shows the relative percent differences or relative standard deviations you should expect to see for different real data sets.

Slide 16

And calculating this is quite simple. In the case of two of these filters, you just take the largest value minus the smallest value, divided by the mean, times 100. The Excel formula is shown here. So for filters 2507 and 2508, the relative percent difference is 2.7%. This is a very, very small relative percent difference.

Slide 17

Ideally, though, you have more than two values. With three or more values, you are able to calculate a relative standard deviation. The relative standard deviation is simply the ratio of the standard deviation to the mean. It's a mean-normalized measure of spread, and by multiplying by 100, you turn this fractional value into a percent value. So ideally you have at least three replicates, and you're able to calculate a relative standard deviation of the method or of the variation in the environment. This applies not only to an analytical method; it can also be used to describe variation in the environment. It's just a convenient statistical definition.

Slide 18

So here's a nice example. This is an example of chromium determined on eight blank filters. These are Teflon filters. You can see the chromium values are in the neighborhood of 280 up to about 380 nanograms in this case. You take those eight values and you calculate the mean value of 314. The standard deviation of these eight values is 36.5. And the ratio, standard deviation divided by the mean, in the Excel formula is quite simple: it gives you a relative standard deviation of 11.6%. That's a relatively low number. It says that the spread around the mean is relatively tight.

Slide 19

Now, you can estimate this. This is a little trick which is in the PM 2.5 method. If you only have two values, you can actually estimate the relative standard deviation by taking the relative percent difference divided by the square root of two. So it's better to have more than two replicates, but you can still estimate the relative standard deviation by simply dividing the relative percent difference by the square root of two.
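To make these formulas concrete, here is a minimal sketch of the precision statistics from Slides 15 through 19 written in Python rather than the lecture's Excel formulas. The duplicate pair and the triplicate values below are illustrative placeholders, not the actual spreadsheet data.

```python
# Sketch of the precision statistics from Slides 15-19. Values are placeholders.
import math

# Relative percent difference (RPD) for a duplicate (split-sample) pair:
a, b = 1450.0, 1412.0                       # hypothetical duplicate results
rpd = abs(a - b) / ((a + b) / 2.0) * 100.0  # |difference| / mean * 100
print(f"RPD = {rpd:.1f}%")

# Relative standard deviation (RSD) for three or more replicates:
reps = [21.4, 19.8, 22.6]                   # hypothetical triplicate results
n = len(reps)
mean = sum(reps) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in reps) / (n - 1))  # same as Excel STDEV
print(f"RSD = {sd / mean * 100.0:.1f}%")

# Slide 19 trick: with only two values, estimate the RSD as RPD / sqrt(2).
print(f"RSD estimated from the duplicate pair = {rpd / math.sqrt(2):.1f}%")
```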
Slide 20

Okay. So let's get back to the US EPA's definition of the detection limit. By definition, the method detection limit is the minimum concentration of a substance that can be measured and reported with 99% confidence that the analyte concentration is greater than 0 in the matrix tested. So we're always looking at signal inside a chemical matrix. One of the things we have to do, obviously, is define the chemical matrix and what it contributes to the signal, so we can look at the part of the signal which is the concentration in the environment. This is a statistical concept, not a chemical concept. The detection limit is based on a confidence; in this case, it is 99% confidence. I would point out that this is a very conservative definition. 99% is a fairly high threshold. In many cases we adopt a threshold of 95% confidence. So you need to think this through and build into your study plan what level of confidence you want to accept or live with when you're actually looking at your numbers. It can make a very large difference in interpreting your results.

Slide 21

So mathematically, what does this look like? Well, the EPA procedures for methods typically specify a minimum of seven aliquots of a sample to be analyzed in a particular solution. So to determine a detection limit, you have to take seven analyses, seven samples, seven aliquots of a particular environmental matrix, and you calculate a mean and a standard deviation of these seven determinations. With that standard deviation, you can then calculate the method detection limit as the standard deviation times the T value at 99% confidence with N minus one degrees of freedom. So for seven aliquots of a particular analyte, you would use six degrees of freedom, and your alpha is 1%, or a confidence of 0.99. You look at the T table for 99% confidence, but because of the way T tables are tabulated (I'll show this by example in the next slide), you need an alpha of 0.02: 1% of the uncertainty in the upper right-hand tail and 1% of the uncertainty in the lower left-hand tail. So the T inverse function in Excel, with a probability of 0.02 and six degrees of freedom, returns a T value of 3.143.

Slide 22

And this slide just shows the way the normal distribution looks. The total area under the curve in these tables or functions is 1. When you look up a T value, you're looking up the distance, the T value, from the center to the right-hand tail. And so if we want to put 1% of the error in the right-hand tail, the area under the curve is calculated based on 1% of the area in both tails.
And so 1% error, or 99% confidence, means the area under the curve in the center is 98%. That's the way the Excel function, like most stat functions, works.

Slide 23

In some cases, if you look at the normal distribution, the function is the cumulative normal from minus infinity up to that point of the distribution, like the curve shown on the top here. So you wouldn't have to allocate the error between the two tails. It's just the way the functions are tabulated, and we'll show you in examples how to use them.

Slide 24

So let's actually calculate some detection limits based on the EPA formula. The EPA says you must have at least seven aliquots. For my example, for chrome on these Teflon filters, we have eight filters, and based on these eight filters I'm able to compute a method detection limit. I do this by taking the eight filters (these are all blanks; they all should ideally give you the exact same answer) and computing the mean value of these eight filters. As I said before, it was 314 nanograms. But what I'm really interested in is the standard deviation of these eight filters. The standard deviation of these eight replicates is 36.5. That's a measure of the spread, the reproducibility, of measuring the blank. The T value for eight samples, which is seven degrees of freedom, and 1% error (0.98 under the bulk of the curve) is 2.998, or about 3. Just keep in the back of your mind that 99% confidence typically gives a T value of around 3, so this coming out at 2.998 suggests that you've probably looked the value up correctly. The method detection limit is then the standard deviation times T, or 36.5 times 2.998, which is 109.6. What this says is that if you want to have 99% confidence that the chrome you're measuring in the environment is real, it must be 109 nanograms above zero after blank correction.

Slide 25

So we're going to go ahead and look at a number of examples. I'm going to use chrome data and titanium data. These were samples collected on Teflon filters in open-faced IOM samplers, so they capture the total mass of particles. In this particular study we had eight field blanks, which were filters that we took to the field, did the sampling, and brought back, so they represent the entire background noise of the whole process. We also sampled on five consecutive Mondays at three locations, giving 15 samples. So we have 15 samples and eight blanks. This is kind of a unique data set, which makes an excellent example here, because each filter, each individual sample and blank, was extracted and analyzed four times by ICP-MS. So we're able to report an average mass on each filter and an average mass on each blank, as well as a standard deviation of the mass on each filter and on each blank. This gives us a way to measure noise individually as well as in a pooled sense.

Slide 26

So what do these filters look like? Here are the eight chromium blanks. You see for each filter there's an average value. For the first filter, the mass of chromium on that filter was 306 nanograms, and the standard deviation of that mass, or the noise on that filter, was 24.8 nanograms. So we have an average mass of 314 and a standard deviation, or spread, across these eight replicates of 36.5, but I also have a measure of the noise for each filter, and a pooled average, a root-mean-square average, of the noise on those eight filters, shown down here as 25.5. If you open our Excel spreadsheets you can see how those are calculated.
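Here is a minimal sketch, in Python rather than the Excel shown on the slides, of the EPA-style method detection limit calculation from Slides 20 through 26. The eight blank masses are hypothetical placeholders (only the first value, 306 ng, and the summary statistics of 314 and 36.5 ng are quoted in the lecture); with the real spreadsheet data the printed mean, standard deviation, and MDL would match the values above.

```python
# Sketch of the method detection limit (MDL) from Slides 20-26:
# MDL = s * t(99%, n - 1 df), computed from blank replicates.
import numpy as np
from scipy import stats

blanks_ng = np.array([306.0, 288.0, 341.0, 322.0, 295.0, 330.0, 274.0, 356.0])  # placeholders

n = len(blanks_ng)
mean_blank = blanks_ng.mean()
s_blank = blanks_ng.std(ddof=1)          # sample standard deviation (Excel STDEV)
t_99 = stats.t.ppf(0.99, df=n - 1)       # one-tailed 99% t; same as Excel TINV(0.02, n-1)
mdl = s_blank * t_99                     # for 8 blanks (7 df), t is 2.998

print(f"mean blank = {mean_blank:.1f} ng, s = {s_blank:.1f} ng")
print(f"t(99%, {n - 1} df) = {t_99:.3f}, MDL = {mdl:.1f} ng")
```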
Slide 27

The following slide shows the 15 samples. These 15 field samples are identified by weeks one through five and sampling locations A, B, and C; these are just different locations within this particular industrial facility. For each one of these samples, we have an average chromium mass and a noise, or error, standard deviation of that average, as well as the average of the 15, which is 429. The standard deviation of these samples, the spread of the variability in the environment (how much they vary in the environment), is 116, and the root-mean-square noise number is about 37.

Slide 28

The next two slides show the titanium results. Here are the eight blanks for titanium. You'll notice right away that the filter blank is a much lower value. The average mass on these blanks is only 2.7 nanograms, an extremely small amount. The standard deviation, the spread of this, is about 4.1, so the standard deviation is somewhat larger than the mean value. And the analytical noise, the root-mean-square noise, is 0.8. This is a very small number.

Slide 29

The 15 titanium samples shown on the next slide show that even with a very, very low blank value, the sample masses are very high. You see the titanium average mass in these particular 15 samples was about 341 nanograms. The standard deviation of those 15 samples was 171. The noise is much smaller, 31.9 nanograms.

Slide 30

So let's start by looking at the signal-to-noise ratio. As I defined before, this is a dimensionless measure of the relative strength of the analytical signal to the average strength of the background instrument noise. We calculate this as the ratio of the mean mass to the error on each filter, and we compare the distributions of signal-to-noise ratios for both chrome and titanium. This data set, because of the way it was collected and analyzed, is kind of unique in that I'm able to calculate a signal-to-noise ratio for each individual filter. And ideally, the noise is very much less than the signal.

Slide 31

Let's start by looking at chrome. I've added another column to our spreadsheet which shows the signal-to-noise ratio. You can look at the chrome mass, the chrome noise, and the ratio of those two in the column on the right. It shows a relatively high, actually a very high, signal-to-noise ratio, 12.3. So on these blank filters, the signal is clearly visible above the analytical noise, an order of magnitude above it. And it's very reproducible; they all have about the same signal-to-noise ratio. So this looks like no problem at all.

Slide 32

In contrast, the titanium filters show a much lower signal-to-noise ratio. They range from about 0.7 up to 6, with a mean of about 3. But you can't conclude at this point that the chrome measurement is better than the titanium measurement, because the measurement is not based on just the signal-to-noise ratio, but also on the amount of blank in the sample matrix.

Slide 33

So if we look at the results of this analysis, we see that chrome has a very high signal-to-noise ratio, which was about 12.4, but it also had a very high average blank value, 314 nanograms. In contrast, titanium had a much lower signal-to-noise ratio, about 3, with a much lower blank value of about 2.7. So it's possible in this analysis that this very high blank may cause a problem with the detection limit, even though the instrument clearly has no problem measuring chromium in any of the samples.
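The per-filter signal-to-noise ratios and the pooled root-mean-square noise quoted above can be computed with a few lines of code. This is a rough sketch assuming the spreadsheet layout described in the lecture: one array of per-filter mean masses and one array of per-filter standard deviations, both filled with placeholder numbers. The pooled-noise line is the variance-averaging idea from Slide 11: average the variances, then take the square root.

```python
# Sketch of per-filter signal-to-noise ratios (Slides 30-32) and the pooled
# root-mean-square noise (Slides 11 and 26). Values are placeholders.
import numpy as np

mass_ng = np.array([306.0, 288.0, 341.0, 322.0, 295.0, 330.0, 274.0, 356.0])   # mean mass per filter
noise_ng = np.array([24.8, 22.0, 30.1, 26.5, 23.3, 28.0, 21.5, 27.9])          # per-filter std. deviation

snr = mass_ng / noise_ng                        # signal-to-noise ratio, filter by filter
pooled_noise = np.sqrt(np.mean(noise_ng ** 2))  # variances add: average them, then take the root

print("per-filter S/N:", np.round(snr, 1))
print(f"pooled (RMS) noise = {pooled_noise:.1f} ng")
```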
Slide 34

So let's apply this measurement of repeated blank filters to the actual samples by using the detection limit. With the EPA's definition, we're going to take a T value of 2.998, for 99% confidence, above the blank as our detection limit. So we take our eight chrome values shown here and compute the standard deviation of these eight replicates; we look up the T value for 7 degrees of freedom and 0.99, or 99%, confidence, which gives us our value, as I showed before, of 2.998. So the detection limit is the standard deviation times 2.998, or 109.66 nanograms.

Slide 35

Now, to look at the unknown samples, we have to compare this method detection limit to the blank-corrected chromium samples. So in the following table we're going to compare the blank-corrected chromium values to the method detection limit. And this comparison shows the potential for a serious problem. The chromium method detection limit was 109. The average blank-corrected value of the chromium samples was only 115. So the average value is only slightly higher than the detection limit, and in fact, six of the 15 samples were below our method detection limit.

Slide 36

In this little spreadsheet you can see what I've added to the right-hand side of our chromium data: another column, which is the blank-corrected value. This is what's left after we remove the chemical matrix, that is, the blank filter; it's the amount of chromium that we believe is contributed by the air sample, by the particles in the air sample. And you see that six of these values are below the detection limit, less than 109 nanograms. In fact, three of them are even negative: three of them had less mass on the filter with the particles than the average of the blank. So our chromium data is right near the detection limit. Some of it is below the detection limit, and it's difficult to differentiate the chromium data from the blank.

Slide 37

The titanium picture is completely different. Here, for titanium, you see that the detection limit, which is the standard deviation of our replicates times the T value, is only 12.6 nanograms.

Slide 38

And when we apply this to the titanium samples, we see that the blank-corrected values are orders of magnitude higher than our detection limit of 12.6 nanograms.

Slide 39

So, in our titanium table, you see that all of the data clearly rises above the detection limit, and we have no difficulty identifying a valid titanium concentration in all samples with better than 99% confidence.

Slide 40

Now I want to fast-forward to the third lecture and show this data as distributions, because I think distribution plotting is a really nice way of demonstrating what I've been trying to show in my Excel tables. On this graph, I'm showing the distribution of the sample filters and the blanks: the sample filters are the line made up of the diamonds, and the blanks are the line slightly below it made up of the little squares. These are log probability plots made in Excel, and I will spend considerable time in the next lecture teaching you how to do this yourself, so you don't have to rely on me. They show that the distribution of the filters is very close to the distribution of the blanks. In fact, the distribution of the blanks runs into the lower tail of the distribution of the filters. Now, the X-axis in this graph is probability, but I can't figure out how to make Excel actually put probabilities down; what Excel gives you is the Z scores from the standard normal distribution. So just remember from your statistical background that a Z of zero is 50%, a Z of 1 is 84%, a Z of 2 is 97.5%, et cetera. So you can imagine replacing those Z scores with percentages if you prefer. But again, a lot more of this next week.
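As a rough illustration of the bookkeeping behind Slides 34 through 40, the sketch below blank-corrects a set of sample masses, flags values below the method detection limit, and computes the normal-score (Z) plotting positions that a log probability plot like the one described here could use. The sample masses, blank mean, and MDL are placeholders, and the plotting-position convention (i over n plus 1) is a common choice, not necessarily the one used in the lecture's spreadsheet.

```python
# Sketch: blank correction, below-MDL flagging, and log probability plot
# positions (Slides 34-40). All numbers are illustrative placeholders.
import numpy as np
from scipy import stats

mean_blank = 314.0                  # average blank mass, ng
mdl = 109.7                         # method detection limit, ng
sample_ng = np.array([350.0, 410.0, 305.0, 520.0, 640.0, 298.0, 455.0, 380.0])

corrected = sample_ng - mean_blank                  # blank-corrected masses
below_mdl = corrected < mdl                         # True where detection can't be claimed
print("blank-corrected:", np.round(corrected, 1))
print("below MDL:      ", below_mdl)

# Plotting positions for a log probability plot: sort the data, assign each
# point a cumulative probability i/(n+1), and convert it to a Z score.
positive = np.sort(corrected[corrected > 0])        # a log scale needs positive values
prob = np.arange(1, len(positive) + 1) / (len(positive) + 1)
z = stats.norm.ppf(prob)                            # x-axis values (Z scores)
for zi, ci in zip(z, positive):
    print(f"Z = {zi:+.2f}  mass = {ci:.1f} ng")
```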
Slide 41

But what I want to show by example is the distribution of chrome compared to the distribution of titanium. These are both log probability plots, and you can see that for titanium the distribution of the blanks is so much lower than the distribution of the samples that there's clearly going to be no problem in interpreting any of those concentrations. I like log probability plots. They're very helpful in displaying information, and I wanted to show you a couple of examples here.

END OF PART 1

Slide 42

Now, I want to look at the other side of that diagram we showed earlier in the lecture. I want to look at what happens in the environment, at sampling variability in the environment, and contrast that to what we just finished talking about, which was variability in the analytical method, which helps us determine what the detection limit is. Now, what is the purpose of your study? The purpose of the study is to estimate some value that's out there, some true environmental value. It may be a mean. It may be a 98th percentile. Whatever it is, we're trying to estimate it, because we don't ever know what the true value is; we rarely, if ever, have a complete sample. We have a small sample taken from a large population.

Slide 43

And that small sample is trying to define the uncertainty in that large population of values. So we're trying to estimate the true population value from a sample estimate, and the true population uncertainty from a sample estimate. I've shown you how to do that for the measurement uncertainty, with replicate samples, the T table, and the standard deviation.

Slide 44

Now we're going to look at the uncertainty in the environment. Why is there uncertainty in the environment? There's a lot of variability, or uncertainty, in the environment for many reasons, and your study design is hopefully sufficient to characterize what this uncertainty is. Concentrations vary for a lot of reasons, and these next couple of slides show some of them. We have variation due to the location: distance, direction, and elevation relative to a particular source. So as we move around from location to location relative to a major air pollution source, we'll have different concentrations. We may get a non-uniform distribution because of topography or hydrogeology or meteorology, or any other kind of biological, physical, or chemical distribution mechanism which is going to disperse our pollutants unevenly. There may be variability across species: as we look across different heavy metals or organics, or chemicals like nitrates or sulfates, we may see differences.

Slide 45

There may be variation in just the background over time. Some of what we measure is background in the environment, sort of long averages from distant locations. That may vary as well. Local emission sources may vary: if you're measuring on a Sunday compared to a Wednesday, traffic is going to be quite different. There may be a problem at a local source. There may be a process upset or an accident, and under those kinds of conditions we sometimes see very high levels in the environment.
Even the averaging time of your sample, that is, whether you take a one-hour sample, a one-day sample, or a one-month sample, is going to determine the variability in the ultimate answer. And finally, even calibration will contribute. So with all of these factors together, every time you draw a sample, for one of a whole variety of reasons, you're going to get a different answer.

Slide 46

So it's our job to characterize what that uncertainty is. And the example I want to use is a data set we collected here in Chicago on PM 10. This was a data set that's used for national ambient air quality standards violation decisions, but it's a nice way for me to illustrate some of these concepts.

Slide 47

So here we have a distribution of PM 10 values in Chicago for a single monitor. This monitor is kind of nice because I have a full three-year record of PM 10 data: one measurement every day for a full three years, 1,096 values with no missing values. So I'm able, a little bit artificially, to define the whole population: every single one of those samples over a three-year period with 100 percent data capture. And I'm going to take this population of values and sample it three different ways.

Slide 48

I'll sample it once every 12th day, once every 6th day, and once every three days. Each one of these samples will be a way of estimating the true value; they will give us a point estimate of the true value of all 1,096 values. Now, I've chosen 3, 6, and 12 because those are the sampling frequencies we use. We all sample our PM networks on a once-every-six-day or once-every-three-day basis. Nationally, some of our very expensive monitoring programs operate on a once-every-12th-day basis, because we can't afford to collect all the samples you get with a more frequent sampling structure. I'm able to take this long series of values and sample it 12 different ways for a 12-day sampling frequency, six different ways for a six-day sampling frequency, and three different ways for a three-day sampling frequency to estimate the underlying true value. From this I can demonstrate the kinds of uncertainty you get as a result of sampling design. This table shows the results, and it's a little bit busy. But across the top of the table are the statistical parameters I need to calculate: the mean, the minimum, the maximum, the standard deviation, the standard error of the mean, and the confidence interval.

Slide 49

I want to first define what those are, and then I'll come back to this table and we'll talk about the numbers in it. So what is the population mean? The population mean is the sum of the measurement over all capital-N values, divided by capital N, the total number of values in the population. This is rarely known. In my somewhat restrictive, artificial situation, I'm defining the population as 1,096 samples. But in the real world you really don't know what this is, or you rarely know what this is.

Slide 50

And the population variance is defined as, essentially, the sum of the squared deviations of all the individual measurements from the population mean. Again, we usually don't know what this is. We estimate the mean and the variance from the sample mean and the sample standard deviation.

Slide 51

So this slide shows the sample mean: the sum of the values over n, where n is the number of samples in the little group that we're looking at.
Slide 52

And the sample standard deviation that I defined before is the square root of the sum of the squared deviations from the mean. The sample variance is defined here; the sample standard deviation is the square root of this value.

Slide 53

How about the sample mean? The sample standard deviation describes the spread of the values within our sample. But the mean itself also has a standard deviation.

Slide 54

We call that the standard error of the mean. The variance of the mean that we're estimating is the variance of the sample divided by n, and the standard deviation of the mean is the square root of this value, the square root of the sample variance divided by n. Now, also note in this figure the value F. F is the finite population correction factor, because we're sampling without replacement. If I were drawing these samples from my 1,096 possible values and every time I took a sample out I was able to put it back in the population, F would be zero. And in the real world, F is usually approximately zero, because capital N is much, much greater than little n, the size of your sample. But in this reduced example I can't assume that little n over capital N is close to zero, so I have to compute it. It's not a big deal; normally it's not a problem. And so, as I mentioned, the standard error of the mean is the square root of the variance divided by n, shown here.

Slide 55

And, finally, the confidence interval around the mean is the range where I make a probabilistic statement that the true mean is somewhere between a lower limit and an upper limit with a certain level of confidence. This equation shows it for the standard normal table, and it assumes that you know what the population variance is.

Slide 56

Since we typically don't know the population variance, we end up using the T distribution. The T distribution shown in this slide is based on, again, N minus one degrees of freedom and alpha divided by two, but it allows us to use our sample estimate of the standard deviation, S, in computing the confidence interval. So using this equation I will not only take my estimate of the mean, X bar, but I'll also compute the upper and lower confidence limits, which is my way of saying I have a certain degree of confidence (I'll use 95%) that the true mean is somewhere in this range.

Slide 57

So that's how I computed the values in this table. You see the number of samples in each of my subgroups: once-every-three-day sampling starting on the first, second, or third day of the series; once-every-six-day sampling starting on the first through the sixth day; once-every-12-day sampling starting on the first through the 12th day; plus the complete record of 1,096 samples with no missing data. The next two columns show the minimum and maximum that you get from each sample. The true minimum is 6.2 and the true maximum is 115.5, but most of these samples don't contain the true minimum and true maximum; they miss them. The actual grand average, the population average, is 32.4 micrograms per cubic meter, and the population standard deviation is 17.0. The standard error of the mean is zero because I know the population values, so the confidence interval is zero. Then each line below that shows the estimated mean, the estimated standard error, and the confidence interval of the mean for the different samples.
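Before reading down the table, here is a minimal sketch of how one of those rows could be computed. It assumes a full daily record in an array called pm10 (filled here with random placeholder values rather than the actual Chicago data) and takes every 12th day as the subsample; the standard error includes the finite population correction, and the confidence interval uses the T distribution as on Slide 56.

```python
# Sketch of one row of the Slide 57 table: subsample a daily PM10 record every
# 12th day, then compute the mean, standard error (with the finite population
# correction), and a 95% t-based confidence interval. Placeholder data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pm10 = rng.lognormal(mean=3.3, sigma=0.5, size=1096)   # stand-in for the 3-year record

N = len(pm10)                       # population size (1,096 daily values)
subsample = pm10[::12]              # once-every-12th-day sampling, starting on day 1
n = len(subsample)

xbar = subsample.mean()
s = subsample.std(ddof=1)
fpc = 1.0 - n / N                                  # finite population correction
se = np.sqrt((s ** 2 / n) * fpc)                   # standard error of the mean
t = stats.t.ppf(0.975, df=n - 1)                   # two-sided 95% confidence
lo, hi = xbar - t * se, xbar + t * se

print(f"n = {n}, mean = {xbar:.1f}, SE = {se:.2f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```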
As you look down the table, you see that as the sampling frequency decreases, as I sample less frequently, my estimated standard error goes up, or my ability to estimate the mean becomes less precise, and the confidence interval increases accordingly. This table shows, for example, for my last sample, the last line on the table, my once-every-12-day sampling starting on the 12th day of the series: those 91 samples had a mean of 30.75, so I was about two micrograms below by chance. The standard error of the mean was 1.6, and my 95% confidence interval was between 27.6 and 33.9. I know where the mean was. I'm pretty sure the mean was in that range, but I don't know exactly where it was from this once-every-12-day sample.

Slide 58

To make the result a little bit easier to see, I've graphed it. I first graph the minimum, maximum, and average PM 10 from these different samples. What you see here is that the central tendency measure, the mean, moves a little bit as the sampling frequency changes. But in general, if you want to estimate something near the tail of the distribution, like the maximum or something close to the maximum, there's a lot more uncertainty in that estimate as you decrease the sampling frequency. So my once-every-12th-day sampling frequencies have a wide range of estimates of the maximum, whereas estimates of the mean are not too bad. If your ultimate goal is estimating the mean, you might be able to do a pretty decent job with a once-every-12-day sample. If you want to estimate the tail of this distribution, you'll probably miss it with the once-every-12-day sample.

Slide 59

The next slide shows the confidence intervals. It shows what the 95% confidence interval around the mean is, and it shows that as sampling frequency decreases, as I go to once-every-three-, six-, and 12-day samples, these confidence intervals get larger. So if your stated goal is to get the mean value within a very narrow range of uncertainty, then you clearly need a very frequent, almost everyday, sample to do that. If you're willing to accept up to five, ten, 15, or 20% error in the mean, then a less frequent sample is sufficient. We'll get back to this at the end of the lecture when I talk about estimating the number of samples required for a particular study design, because it reproduces the same result quite nicely.

Slide 60

Finally, I'm showing you the results as box plots. This first box plot is the full data set, the full population of values.

Slide 61

And the next box plots show the once-every-three-day sampling. They show that the mean value in the center of the notch is pretty reproducible. The size of the notch is the 95% confidence interval of the mean, and the box is the interquartile range. I will define this in much more detail in subsequent lectures. But it shows that the distributions look about the same, and they look very similar to the everyday distribution.

Slide 62

When I go to the every-sixth-day samples, the distributions are beginning to show a little more variability. The interquartile ranges, the boxes themselves, are moving around; the notches are growing, which means my uncertainty in the mean estimates is growing. The extreme values are being bounced around a little bit more.

Slide 63

And the distributions from the once-every-12-day sampling show a great deal of variability. Especially in the extreme values, some of these samples are very different from others.
And if your end point is an extreme value, once-every-12-day sampling would be a very inefficient way to get a good estimate.

Slide 64

So I just wanted to show that sampling frequency leads to error in our ability to estimate a parameter, in this case a sample mean. The more frequently we sample, the less error; the less frequently we sample, the more error. You want to select your sampling frequency specifically to meet the desired precision of your sampling protocol.

Slide 65

I need to speak a little bit about sampling designs; then we can wrap this lecture up by showing you how to actually specify this uncertainty in your study design. So what are the basic sampling designs available to us? We can look at a number of study designs, and they're characterized as either haphazard sampling, judgment sampling, or probability sampling. Haphazard sampling, or completely random sampling, requires a homogeneous population over space and time if you want to get an unbiased assessment. If we're looking at how people in the United States are being exposed to air pollutants, this is a very inefficient way to sample, because we don't live homogeneously distributed across the United States; we live in cities. If we randomly sampled locations in the country we would miss the population exposure. We'd have to take many, many samples. Judgment sampling is what we tend to use, or we certainly include judgment sampling, because we know where people live, so we specifically go there to do our sampling. Once we get to where we're going to sample, we'll use some kind of probability sampling method, some kind of systematic or random sampling method which allows us to have an efficient sampling design.

Slide 66

So what are these designs? Simple random sampling, the most basic, doesn't work well if the population contains patterns. Stratified random sampling is really useful where there's a pattern, so we can divide into different strata; within a stratum, we think the population is much more homogeneous. We'll divide our city into urban, suburban, and rural background. Within each of these strata, we'll be able to make the assumption that the concentrations are much more homogeneous. So we'll separate off the industrial areas of the city from the suburban areas of the city. Multi-stage sampling is when we're not exactly sure, so we go in, take some samples, look at our results, and go back and adjust for subsequent sampling. It's potentially useful if you really don't have a particularly good view of what you're going to find.

Slide 67

So ultimately we're typically going to use some kind of systematic sampling; it's the method of choice when there's a pattern or trend. As I mentioned before, we typically sample particles once every sixth day across the United States, and everybody does it on the same day, so it's quite uniform. If there's a strong linear pattern we may want to double sample, or we may need to do search sampling for other particular kinds of issues, but we're normally looking at a systematic sample over time or space.

Slide 68

This can be demonstrated with these simple charts. Random sampling along a line is not very efficient if we have non-homogeneous distributions. Stratified random sampling is much more efficient if we can define strata. That's not too hard with air pollution: we know an urban area is different from a rural area, so we can define strata pretty straightforwardly.
Within each stratum we may do cluster sampling; that is, I may do a lot of sampling in my heavily contaminated urban areas and much less sampling in less contaminated rural areas, just because the relative areas are so different. I clearly have to oversample where people live.

Slide 69

But ultimately what we do within a stratum, within a particular sampling time, is some kind of systematic sample, as demonstrated in the middle figure, where we look at samples collected at evenly spaced times at a particular location. We could also do this at a particular location over space: if we have to characterize a deposit of something in soil, we may draw a line and step along that line at equally spaced sampling intervals. It's the same idea in space or time.

Slide 70

And in two dimensions, these graphs translate up to simple random sampling in a two-dimensional space versus stratified random sampling, which is much more like what we would end up doing. We would pick our sampling sites within each stratum somewhat randomly, and within each sampling site, we would sample systematically, with equally spaced samples over time.

Slide 71

If we're looking at an area that we have to sample, like a harbor or something like that, we may lay a grid down over it and then sample systematically over that grid. Cluster sampling is shown here, where each of those locations may be, for example, an industrial area, a suburban area, or a rural area.

Slide 72

This is a nice little example that my students and colleagues are working on: we're sampling sand at a beach for a particular toxic material, and we laid down these lines and then sampled along them at equally spaced intervals. This is a systematic random sampling way to get at the concentration of this toxic material in the beach sand.

Slide 73

This is one of my colleagues sampling. It's a tough job, but somebody has got to get out there and do it.

Slide 74

Now let's get to how we translate this into the number of measurements required. We're going to define a simple equation here; it's going to help me explain this concept. If you've picked a method, you know what its noise is; you apply this method to an ambient environment and you want to estimate something like the mean. How close do you want to get to the true value? You can use the uncertainty in the analytical method and the uncertainty in the environment to calculate the number of samples you require for a specific precision. The absolute margin of error that can be tolerated in the measurement of the mean of X is D, and we're going to accept a probability alpha of exceeding that error. So we write this statement: the probability that the mean we estimate deviates from the true value by more than D is less than alpha. So we specify D, the distance we're willing to be away from the mean; we specify alpha, our confidence; and if we know something about the distribution of samples in the environment, we can compute the number of samples required.

Slide 75

Now, if we know the population variance and we have a very, very large sample, then we use the standard normal curve, the Z table. Typically, we have a situation where the size of our sample, little n, is very much smaller than the total population, capital N, available.

Slide 76

So the equation reduces to what is shown on this slide. But remember, we usually don't know what sigma is; we're estimating sigma using a standard deviation.
Slide 77

So we end up with the T distribution shown here, where we use S as our estimate of sigma, and the T distribution with alpha-divided-by-two probability and n minus one degrees of freedom.

Slide 78

And again, if our sample n is small compared to capital N, the total number of possible samples out there in the environment, we can reduce the equation to what's shown on this slide. Now, this is a bit of a nuisance to calculate, because T is a function of n. So you have to guess a value for n, compute T, compute the value of n, see how close you are to your guess, make a judgment, and go around this loop a couple of times until you get the final answer. It is a trial-and-error solution, and Excel doesn't like doing this for you, so you have to do it yourself.

Slide 79

Now, one useful concept that helps in understanding this calculation is the coefficient of variation. If we define the coefficient of variation as the population uncertainty divided by the population mean, sigma divided by mu, which we estimate as the standard deviation divided by the sample mean, then we're able to specify a relative error, D sub R, in terms of this coefficient of variation. Our relative error is the difference between our sample estimate and the true value, divided by the true value.

Slide 80

Using this concept of the coefficient of variation and relative error, or mean-normalized error, we're able to specify the number of samples required. It's a function of the coefficient of variation divided by this mean relative error.

Slide 81

Here is the case where n is large compared to capital N. But typically we're in the case where the number of samples is quite small compared to the population of possible samples out there. And so from this equation we're able to compute the number of samples we need, given the variation in the environment, the coefficient of variation; a mean relative error we're willing to accept, D sub R; and a value off the standard normal table. We're again specifying the standard deviation divided by the mean, which we estimate from our sample.

Slide 82

This table displays the results of this calculation. There are three things happening in this table, and I want to draw your attention to them to help you understand what's in it. Along the top, in the right-hand five columns, is the coefficient of variation. This is the variation in the environment, the standard deviation divided by the mean in the environment. The first of these five columns, 0.1, is a very tightly varying environment with very little variation: the standard deviation is very low compared to the mean. The next columns have coefficients of variation of 0.5, then 1, 1.5, and 2. Just to draw your attention back to our PM 10 example, the coefficient of variation of the 1,096 PM 10 samples was about 0.5. So in the environment, 0.5 is probably a good value for the variation of PM 10 in an area. If something is more variable, you could go to the right a little bit. But just to anchor this discussion, the PM 10 values were around 0.5. Now, the second column in this table is the relative error. It's the deviation between your estimate and the true value that you're willing to accept, and the table is solved for several different relative errors. A 10% deviation is a very stringent relative error: you want to be within 10% of the correct value. Then there are 25%, 50%, 100%, and 200%, where it doesn't really matter that your estimate be that close. And the third parameter in this table is the confidence with which you want to make this judgment. I've solved this for two different confidences, 80% and 95%: do you want to be 80% certain that you're within a certain amount relative to the mean, or do you want to be 95% certain? So what does this imply? It's kind of nice. It implies that if you want to be 95% certain that you're within 10% of the true value, and the natural variation of the environment is 0.5, then you need 97 samples, which is exactly what the previous PM 10 sampling table showed us. It showed that if we took 91 samples out of our 1,096 and drew a 95% confidence interval, we got to within about 10% of the mean every time. So whether you look at the PM 10 sampling example, repeatedly sampling the same distribution, or compute it based on the standard normal curve, it's reassuring to know you get the same answer, as you should. This table is just there to give you a sense of the kind of sampling design you need, in terms of number of samples, to achieve a specified accuracy in your ultimate estimate of a sample mean, with a given probability that you're going to get there. It's a bit of a complicated table. Look at it a little and send us your questions if you still can't figure it out.
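Here is a minimal sketch of the sample-size calculation behind that table, assuming the simple z-based form, n equals (z times CV divided by D sub R) squared, for the case where little n is small relative to capital N; the scipy call and the rounding up are my additions. For a coefficient of variation of 0.5, a 10% relative error, and 95% confidence it returns 97, matching the number quoted above. The last few lines sketch the trial-and-error refinement with the T distribution that is mentioned on Slide 78.

```python
# Sketch of the number-of-samples calculation from Slides 74-82.
# n = (z * CV / D_R)^2, rounded up; then an optional t-based iteration.
import math
from scipy import stats

cv = 0.5          # coefficient of variation (sigma / mu) in the environment
d_r = 0.10        # relative error we are willing to accept (10%)
conf = 0.95       # confidence level

z = stats.norm.ppf(1 - (1 - conf) / 2)          # two-sided z value (1.96 for 95%)
n = math.ceil((z * cv / d_r) ** 2)
print(f"z-based estimate: n = {n}")             # 97 for these inputs

# Trial-and-error refinement using the t distribution (t depends on n):
for _ in range(10):
    t = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    n_new = math.ceil((t * cv / d_r) ** 2)
    if n_new == n:
        break
    n = n_new
print(f"t-based estimate:  n = {n}")
```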
Slide 83

So to summarize this last concept: uncertainty in an analytical method can be quantified through specific statistical procedures, and we've shown you a number of examples. We can use these procedures to compute the number of samples we should collect, with a specified level of uncertainty and a specified probability that we're going to get there. We can specify uncertainty in environmental sampling. We can characterize uncertainty in a detection limit. And we can use these to understand the quality and quantity of the data that we need to collect.

Slide 84

So from this lecture, we've talked about statistical definitions of the detection limit. You should have a pretty good idea of what a detection limit is, how it's defined, and how we actually apply it to real environmental samples. In the context of the detection limit, you also should have a pretty good understanding of uncertainty in the measurement compared to uncertainty in the environment itself, and of how we characterize uncertainty in the environment by sampling and by estimating confidence intervals and standard errors. And finally, to wrap things up, we took these concepts and gave you an idea of how to estimate the number of samples required to actually meet the measurement objective in your sampling program. This concludes the second lecture. In the third lecture we're going to come back and explore in much greater detail issues of quality assurance, and look at distributions and ways of looking at your data to help you understand the quality of what you've done. At the website where this lecture is posted will be all of the spreadsheets that we've used in these examples, and we encourage you to look at those, because you can see in the spreadsheets the formulas and how they're defined. And hopefully you'll generate feedback for us and questions. We're here, ready to answer your questions, look at your data, and help you with your specific environmental problems. Thank you. And I'll be back with lecture three whenever you're ready.

END OF MODULE 2