Chemistry 153, Clark College
Statistics Review (Rev. Spring 2009, NF)

Pertinent Definitions

ERROR in a measurement is the difference between the observed (measured) value and the "true" value of the quantity measured.

ACCURACY expresses the correctness (absence of error) of a measurement, i.e. the closeness of an experimental determination to the "true" value.

PRECISION describes the reproducibility of a measurement, i.e. the agreement between the numerical values of two or more experimental determinations done in identical fashion.

Strictly speaking, precision and accuracy do not have to be related. A data set whose experimental values vary quite a bit is termed imprecise, yet the average of those values could still be very close to the "true" value, so the accuracy of the mean would be high. (Dartboard analogy)

DETERMINATE ERRORS are those that have a definite value which in principle can be defined. Examples include use of uncalibrated equipment, incomplete chemical reaction, improper use of a balance, or colorblindness.

INDETERMINATE ERRORS are those that do not have a definite measurable value, but rather fluctuate in a random manner. All physical measurements are subject to a degree of uncertainty, or indeterminate error.

The RANGE, or spread, is the difference between the highest and lowest values in a series of measurements.

ABSOLUTE ERROR in a measurement is the difference between the observed value and the accepted value.

$$\text{Absolute Error} = \text{experimental} - \text{accepted ("true")}$$

RELATIVE ERROR defines error in comparative terms (the percent difference). It is usually given in percent (%) or in parts per thousand (ppth).

$$\text{Percent Relative Error} = \frac{\text{experimental} - \text{accepted}}{\text{accepted}} \times 100\% \quad (\text{or} \times 1000 \text{ for ppth})$$

Example: In a particular analysis the accepted value for the percentage of chlorine in a sample was 24.34%. An experimental result of 24.29% would have an absolute error of 24.29 - 24.34, or -0.05%. The relative error would be:

$$\frac{24.29 - 24.34}{24.34} \times 100\% = -0.2\% \qquad \text{or} \qquad \frac{24.29 - 24.34}{24.34} \times 1000 = -2 \text{ ppth}$$

MEAN VALUE, $\bar{x}$ (x-"bar"), is the arithmetic average of a set of measurements. It is represented by the equation below, where $x_i$ represents a single experimental value and $N$ is the number of determinations.

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

AVERAGE DEVIATION is a measure of precision. It is represented by the equation below, where $x_i$ represents a single experimental value, $\bar{x}$ is the mean and $N$ is the number of determinations.

$$\bar{d} = \frac{\sum_{i=1}^{N} \left| x_i - \bar{x} \right|}{N}$$

STANDARD DEVIATION is probably a more reliable measure of precision. It is a statistical function representing in absolute terms the interval about the mean within which a majority of experimental values should lie. The second expression below is often easier to use with a calculator. You may also be able to use your calculator's built-in functions to perform this calculation.

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N-1}} \qquad \text{or} \qquad s = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{N}}{N-1}}$$

RELATIVE STANDARD DEVIATION, $S_R$ or RSD, is the standard deviation divided by the mean value. It may be expressed in percent or in ppth.

$$S_R = \frac{s}{\bar{x}} \times 100 \ \text{(in percent)} \qquad S_R = \frac{s}{\bar{x}} \times 1000 \ \text{(in ppth)}$$

Understanding Standard Deviation (A Measure of the "Spread" in Data)

Data often tend to "pile up" near their average, so they can frequently be fit by a normal, or Gaussian, curve. The Gaussian curve function is approximated by

$$f(x) = A\, e^{-(x-m)^2 / (2 s^2)}$$

where $x$ is a particular measurement, $m$ is the arithmetic mean of the sample, $s$ is the standard deviation of the sample and $A$ is the height of the curve at $m$.

[Figure: Gaussian curve with peak height A at the mean m. X = average, S = standard deviation. 68% of data in range X ± 1S; 95% of data in range X ± 2S; 99.7% of data in range X ± 3S.]

The empirical rule states that 68.27% of all data lies within ±1s of the mean, 95.45% lies within ±2s, and 99.73% lies within ±3s.
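The definitions above translate fairly directly into code. The following is a minimal Python sketch, not part of the original handout: the function names are hypothetical, and the five-value data set is used purely as an illustration. It reproduces the chlorine relative-error example and checks that the two standard deviation expressions agree.

```python
import math

def absolute_error(experimental, accepted):
    """Absolute error: experimental minus accepted ("true") value."""
    return experimental - accepted

def relative_error(experimental, accepted, scale=100):
    """Relative error; scale=100 gives percent, scale=1000 gives ppth."""
    return (experimental - accepted) / accepted * scale

def mean(values):
    """Arithmetic mean of the measurements."""
    return sum(values) / len(values)

def average_deviation(values):
    """Mean of the absolute deviations from the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)

def standard_deviation(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

def standard_deviation_shortcut(values):
    """Calculator-friendly form using the sum of x and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

def rsd(values, scale=1000):
    """Relative standard deviation; scale=1000 gives ppth, scale=100 percent."""
    return standard_deviation(values) / mean(values) * scale

# Chlorine example from the text: accepted 24.34%, experimental 24.29%.
chlorine_abs = absolute_error(24.29, 24.34)
chlorine_pct = relative_error(24.29, 24.34, scale=100)
chlorine_ppth = relative_error(24.29, 24.34, scale=1000)

# Illustrative data set (an assumption for demonstration only).
sample = [2, 3, 3, 6, 1]
sample_s = standard_deviation(sample)
sample_s_shortcut = standard_deviation_shortcut(sample)

print(round(chlorine_abs, 2), round(chlorine_pct, 1), round(chlorine_ppth))
print(mean(sample), average_deviation(sample), round(sample_s, 2))
```

Both standard deviation functions divide by N - 1, matching the formulas above; many calculators label this mode "s" or "sn-1".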
If the data are very precise, then the standard deviation will be small, meaning that the "spread" of the data is narrow and our values are very close to the mean. However, if the standard deviation is large, then our data are not very precise, and we must repeat the experiment. Note that this does not determine the accuracy of the data, or how close the data are to an "accepted value". We may assume (which may not be a good assumption) that our experiment will give a result that is accurate; we must then obtain data that are precise.

For any given set of data, we must determine the mean, the standard deviation and the relative standard deviation. The following table shows an example set of data and the determination of its mean and standard deviation.

  x_i     x_i - mean     (x_i - mean)^2
   2         -1                1
   3          0                0
   3          0                0
   6          3                9
   1         -2                4

$$\bar{x} = \frac{\sum x_i}{N} = \frac{15}{5} = 3$$

$$N = 5; \qquad \sum (x_i - \bar{x})^2 = 14; \qquad s = \sqrt{\frac{14}{5-1}} = 1.87$$

Evaluation and Rejection of Data

All experimental procedures contain sources of error. Some of these, called "determinate errors", can be treated in a straightforward manner. For example, the sensitivity of balances or the calibration of burets contributes to the number of places to which the data are significant, and hence to the number of significant places in the reported results. Other errors, called "indeterminate errors", show up as results which deviate from the set in a random manner and for no apparent cause. These errors are recognized by using statistical methods.

Except where specifically instructed otherwise, all analytical procedures in this course are conducted with quadruplicate samples. One of these four samples could be rejected for cause. Naturally, if a procedural error is made (e.g. adding the wrong reagent, titrating past the end point, or dropping the flask), the sample is rejected immediately and a fifth sample should be weighed out to replace the lost sample.
Occasionally one result (an outlier) seems quite different from the remaining three results for no known cause. In this case one should suspect a random, or indeterminate, error and apply statistical methods in order to confirm the natural desire to reject the outlying value. A simple and statistically valid test is the Q-test, in which the difference between the outlying result and its nearest neighbor is compared with the total sample range. If the ratio of the difference to the sample range exceeds a statistically derived value (see the table below), the outlying result may be rejected. For this course, we will reject data with a 90% certainty that the rejection is valid. It is essential that all Q-test calculations be shown on the back of your report.

However, exceeding the Q-test ratio is no sure guarantee that the outlying value should be rejected. If the outlying value falls within the expected limits of error due to method and instrument limitations, then the suspect result is actually a valid result.

Q-TEST REJECTION QUOTIENTS

  Number of Observations    90% Confidence    95% Confidence
            3                   0.941             0.970
            4                   0.765             0.829
            5                   0.642             0.710
            6                   0.560             0.625
            7                   0.507             0.568
            8                   0.468             0.526
            9                   0.437             0.493
           10                   0.412             0.466

To perform the Q-test, list all of your data points in order of increasing value. Check your values, and if only one of the values seems different from the others, perform the Q-test on that value in the following manner (the examples use a confidence level of 90%):

Example 1: Values: 86.20, 86.21, 86.27, 86.44

$$Q = \frac{\text{difference}}{\text{range}} = \frac{86.44 - 86.27}{86.44 - 86.20} = \frac{0.17}{0.24} = 0.71 < 0.765$$

Comment: The ratio is less than the tabulated value of Q, and the outlying result must be retained.

Example 2: Values: 87.80, 87.00, 86.98, 86.84

$$Q = \frac{\text{difference}}{\text{range}} = \frac{0.80}{0.96} = 0.83 > 0.765$$

Comment: The ratio is greater than the tabulated value of Q, and the outlying result may be rejected with 90% confidence.
Example 3: Values: 86.45, 86.25, 86.24, 86.21

$$Q = \frac{\text{difference}}{\text{range}} = \frac{0.20}{0.24} = 0.83$$

Comment: The ratio is the same as in Example 2. However, the difference between the outlying result and the sample mean, 86.45 - 86.29 = 0.16, is only 1.9 parts per thousand of the sample mean ($\frac{0.16}{86.29} \times 1000$) and is therefore within acceptable limits of error due to indeterminate causes. The outlying result should not be rejected, since it falls within the range that is expected for the given analytical procedure.

Example 4: Values: 87.80, 87.64, 86.96, 86.64

$$Q = \frac{\text{difference}}{\text{range}} = \frac{0.16}{1.16} = 0.14$$

Comment: This wide sample range and small difference is a frustrating situation. All four results must be retained according to the Q-test, but the range is 13 parts per thousand. If this is beyond acceptable limits, one must admit poor technique and repeat the entire analysis.

IMPORTANT: The Q-test is only one test for the evaluation and rejection of data. In this class, a value must fail both the Q-test and the 5-ppth test to be rejected. The 5-ppth test is discussed in the next section.

Precision and Parts-per-thousand (ppth) Analysis

Like a percentage (parts per hundred), a parts-per-thousand (ppth) value is a relative comparison, but on a finer scale than a percent error. The same approach can be used at other scales: in Environmental Chemistry, detection levels and precision are often measured in the parts-per-million or parts-per-billion range. There are three ways we will use ppth analysis:

1. Precision: using ppth to define the number of significant figures to report in a measurement.

For data collection, always record the number of digits given by the measuring device. Examples: lab counter balances, ±0.01 g; analytical balances, ±0.0001 g; burets, ±0.01 mL. However, you will report a piece of data using the number of significant figures that gives you a precision of 1 ppth, or very close to it.
For example, you make a measurement on the analytical balance that is 11.01576 g. To determine the number of digits to report (and therefore to which decimal place to round the mass value), you want to find an uncertainty that, when divided by the measurement rounded to the same decimal place, roughly equals 1/1000. The ratios below show three possible "rounding places":

$$\frac{\pm 0.1}{11.0} \qquad \frac{\pm 0.01}{11.02} \qquad \frac{\pm 0.001}{11.016}$$

By inspection, the middle ratio is nearest to 1/1000, so you report your values to the hundredths place.

Reported value = 11.02 ± 0.01 g

If the situation is more complicated, for example a calculation near 40%, you will need to go through the same process and decide whether it is acceptable to use 1/400, ±0.4 or 1/4000, ±0.04. There is sometimes no easy answer; it depends upon the error in your actual measurements and the least precise decimal place for that piece of equipment. As we go through the quarter, we will discuss various situations in the introduction to each experiment.

2. Average error in a set of data, and the relative standard deviation (RSD).

You will often be taking 4 readings or measurements on a sample or unknown. After you do all calculations and find the standard deviation (described previously), you will use it to find the relative standard deviation in ppth. To find the RSD:

$$\text{RSD (in ppth)} = \frac{s}{\bar{x}} \times 1000 \qquad \text{RSD (in \%, or pph)} = \frac{s}{\bar{x}} \times 100$$

where $\bar{x}$ is the mean.

3. The 5-ppth Rule, also known as Relative Error.

You will use two different criteria to determine whether you may toss out an errant piece of data: the Q-test (described previously) and the 5-ppth rule. This rule is similar to a percent difference. For a piece of data to fail the 5-ppth rule, its ppth difference from the mean must be greater than 5 ppth:

$$\frac{\left| \text{value} - \text{mean} \right|}{\text{mean}} \times 1000 > 5 \text{ ppth} \ \Rightarrow \ \text{Fails!}$$

Remember! This test is used in conjunction with the Q-test. You may not discard a value unless it fails both the Q-test and the 5-ppth rule.
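The two rejection criteria can be combined in a short Python sketch. This is an illustration under stated assumptions rather than an official course tool: the function names (`q_ratio`, `ppth_difference`, `may_reject`) are hypothetical, the suspect value is chosen by the analyst and passed in explicitly, and the critical values are copied from the Q-test table in this handout.

```python
# Q-test rejection quotients from the handout's table, keyed by confidence
# level (percent) and then by number of observations.
Q_TABLE = {
    90: {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507,
         8: 0.468, 9: 0.437, 10: 0.412},
    95: {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568,
         8: 0.526, 9: 0.493, 10: 0.466},
}

def q_ratio(values, suspect):
    """Gap between the suspect and its nearest neighbor, divided by the range.

    The suspect must be the lowest or highest value in the set.
    """
    data = sorted(values)
    spread = data[-1] - data[0]
    if spread == 0:
        raise ValueError("all values are identical; Q is undefined")
    if suspect == data[0]:
        gap = data[1] - data[0]
    elif suspect == data[-1]:
        gap = data[-1] - data[-2]
    else:
        raise ValueError("suspect must be the lowest or highest value")
    return gap / spread

def ppth_difference(value, values):
    """Difference between a value and the mean of all values, in ppth."""
    mean = sum(values) / len(values)
    return abs(value - mean) / mean * 1000

def may_reject(values, suspect, confidence=90, limit_ppth=5.0):
    """True only if the suspect fails BOTH the Q-test and the 5-ppth rule."""
    fails_q = q_ratio(values, suspect) > Q_TABLE[confidence][len(values)]
    fails_ppth = ppth_difference(suspect, values) > limit_ppth
    return fails_q and fails_ppth

# Examples 1-3 from the Q-test section.
example_1 = [86.20, 86.21, 86.27, 86.44]
example_2 = [87.80, 87.00, 86.98, 86.84]
example_3 = [86.45, 86.25, 86.24, 86.21]

print(round(q_ratio(example_1, 86.44), 2), may_reject(example_1, 86.44))
print(round(q_ratio(example_2, 87.80), 2), may_reject(example_2, 87.80))
print(round(q_ratio(example_3, 86.45), 2), may_reject(example_3, 86.45))
```

Example 3 shows why both tests are needed: its Q ratio exceeds the tabulated value, but the suspect lies less than 5 ppth from the mean, so the sketch keeps it. Note that this sketch uses the unrounded mean, so its ppth differences can differ slightly from hand calculations that round the mean first.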
For this class, you must always calculate the mean, standard deviation and relative standard deviation for all data that you collect. The examples below go through the complete process. Pay attention to the significant digits and decimal places in the values.

Example: Four different samples of iron ore are weighed on an analytical balance, and the following masses are obtained: 18.6989 g, 18.6357 g, 18.6473 g, 18.6110 g. Tabulate these data, rounding to ppth precision. Determine the mean, the standard deviation, and the RSD for this set of data.

  Trial    Mass, in g    Mass, in g, to ppth precision
    1       18.6989              18.70
    2       18.6357              18.64
    3       18.6473              18.65
    4       18.6110              18.61

$$\text{mean} = \bar{x} = \frac{\sum x_i}{N} = \frac{18.70 + 18.64 + 18.65 + 18.61}{4} = 18.65 \text{ g}$$

  Trial     x_i      x_i - mean    (x_i - mean)^2
    1      18.70        0.05          0.0025
    2      18.64       -0.01          0.0001
    3      18.65        0             0
    4      18.61       -0.04          0.0016
                                  Σ = 0.0042

$$\text{Std. Deviation} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N-1}} = \sqrt{\frac{0.0042}{4-1}} = 0.0374 = 0.04 \text{ g}$$

Note: The standard deviation should be rounded to the same decimal place as the mean, and has the same units.

$$\text{RSD} = \frac{s}{\bar{x}} \times 1000 = \frac{0.04}{18.65} \times 1000 = 2.145 = 2 \text{ ppth}$$

Note: Because the standard deviation has only one significant digit, the RSD is rounded to only one significant digit. It has no units!

Summary:
Mean = 18.65 g (rounded to reflect ppth precision)
Standard Deviation = 0.04 g (rounded to the same decimal place as the mean)
RSD = 2 ppth (rounded to the same number of significant digits as the standard deviation)

Example: Five determinations of percent iron in iron ore yielded the results 67.48, 67.37, 67.47, 67.43 and 67.40%. Calculate the average deviation, the standard deviation, and the relative standard deviation.

  Trial    Percentage, %    |x - mean|    (x - mean)^2
    1         67.48            0.05          0.0025
    2         67.37            0.06          0.0036
    3         67.47            0.04          0.0016
    4         67.43            0.00          0.0000
    5         67.40            0.03          0.0009
                           Σ = 0.18      Σ = 0.0086

mean = 67.43%

$$\text{Average Deviation} = \bar{d} = \frac{0.18}{5} = 0.04\%$$

(Note: the same units and decimal place as in the data.)

$$\text{Standard Deviation} = s = \sqrt{\frac{0.0086}{4}} = 0.05\%$$

(Note: the same units and decimal place as in the data.)

$$\text{Relative Standard Deviation in ppth} = \frac{0.05}{67.43} \times 1000 = 0.7 \text{ ppth, less than 1 ppth}$$

Practice Problem 1: Use your calculator or another method to find the mean, the standard deviation, and the relative standard deviation (RSD) for the following set of mass measurements: 2.13 kg, 2.15 kg, 2.15 kg, 2.17 kg, 2.09 kg, 2.12 kg, 2.17 kg, 2.09 kg, 2.11 kg, 2.12 kg. (Assume all values are to be retained.)

Practice Problem 2: For the numbers 116.0, 97.9, 114.2, 106.8 and 108.3, find the mean, the standard deviation, the range, and the RSD. Using the Q-test, determine whether the value 97.9 should be discarded at the 95% confidence level.

Answers:
Practice Problem 1: n = 10, mean = 2.13 kg, s(n-1) = 0.03 kg, RSD = 13.8 ppth, or 10 ppth to 1 s.f.
Practice Problem 2: n = 5, mean = 108.6, s(n-1) = 7.1, range = 18.1, RSD = 65 ppth, Q(obs) = 0.49 < Q(table), so the value must be kept.

A COMPLETE example: You have just performed an analysis of chromium in steel samples by spectrophotometry, and you have calculated the following 5 values:

  Trial    % Cr
    1      16.237
    2      16.251
    3      16.233
    4      16.361
    5      16.239

You will be reporting percent chromium on your data report sheet, so you will need to round these values to ppth precision. To determine the correct decimal place for rounding, we consider the following ratios:

$$\frac{\pm 0.1}{16.2} \qquad \frac{\pm 0.01}{16.24} \qquad \frac{\pm 0.001}{16.237}$$

(This can also be done purely by inspection: since the first digit of the measurement is a one, we need four total digits to be close to "1000".)

  Trial    % Cr
    1      16.24
    2      16.25
    3      16.23
    4      16.36
    5      16.24

Looking over the rounded data, the fourth trial seems "off".
We can use the Q-test and the 5-ppth test to determine whether we can reject the data point. Note that the data point must fail both tests to be rejected.

Q-test (90% confidence)

$$Q = \frac{\text{difference}}{\text{range}} = \frac{16.36 - 16.25}{16.36 - 16.23} = \frac{0.11}{0.13} = 0.846 > 0.642$$

The value of Q90 for 5 data points is 0.642 (from the table in this packet). Because the determined Q value is greater than Q90, this data point fails the Q-test. However, we cannot reject it outright; we must also perform the 5-ppth test to see whether it fails that as well.

5-ppth test

$$\frac{\left| \text{value} - \text{mean} \right|}{\text{mean}} \times 1000 > 5 \text{ ppth} \ \Rightarrow \ \text{Fails!}$$

For the 5-ppth test, we need to determine the mean of the data, using all 5 data points. The mean is 16.26% Cr.

$$\frac{16.36 - 16.26}{16.26} \times 1000 = 6.2 > 5 \text{ ppth} \ \Rightarrow \ \text{Fails!}$$

The fourth trial will be rejected, as it fails both the Q-test and the 5-ppth test. The data now look like:

  Trial    % Cr
    1      16.24
    2      16.25
    3      16.23
    4      16.36  (rejected)
    5      16.24

With the remaining four pieces of data, a new mean, standard deviation and RSD will be calculated.

The mean:

$$\bar{x} = \frac{16.24 + 16.25 + 16.23 + 16.24}{4} = 16.24\%$$

The standard deviation (s):

  Trial     x_i      x_i - mean    (x_i - mean)^2
    1      16.24        0             0
    2      16.25        0.01          0.0001
    3      16.23       -0.01          0.0001
    4      16.24        0             0
                                  Σ = 0.0002

$$\text{Std. Deviation} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N-1}} = \sqrt{\frac{0.0002}{4-1}} = 0.008165 = 0.01\%$$

The standard deviation is the amount of uncertainty in the last digit of the data points, or of the mean. Since those data are reported to the hundredths place, the standard deviation must be rounded to the same decimal place to reflect the uncertainty in that decimal place. Also, the standard deviation carries the same units as the data. So, the standard deviation is 0.01%.

The relative standard deviation (RSD):

$$\text{RSD} = \frac{s}{\bar{x}} \times 1000 = \frac{0.01}{16.24} \times 1000 = 0.6158 = 0.6 \text{ ppth}$$

The RSD is a measure of how large the standard deviation is compared to the actual data.
Most calculation-based experiments in CHEM 153 will require an RSD no greater than a certain threshold (5 or 10 ppth). Since the RSD is based on the standard deviation, it should have the same number of significant digits as the standard deviation. In this example, the standard deviation has one sig fig, so the RSD is rounded to one sig fig as well.

The final step is to report your data using the data report sheet provided, and to input your %Cr values into the class spreadsheet. For this "experiment", the data report sheet would appear as follows:

  Trial:     1        2        3        4        5
  % Cr     16.24    16.25    16.23    16.36    16.24

  Mean                   16.24%     ← The mean is reported with units!
  Standard Deviation     0.01%      ← s has the same units and decimal place as the mean.
  RSD                    0.6 ppth   ← The RSD has the same number of sig figs as s.

All trials are reported, rounded to ppth precision. Trials that are rejected are circled, and the calculations for the Q-test and 5-ppth test are shown on the back of the data report sheet.
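The reporting steps of the complete example (round to ppth precision, drop the rejected trial, then compute the mean, s and RSD with the rounding rules above) can be sketched in Python. This is only an illustration: the helper names are hypothetical, and `ppth_decimals` encodes one possible reading of "nearest to 1/1000" (closest on a logarithmic scale), which is an assumption, since the handout makes this choice by inspection and notes that some cases have no easy answer.

```python
import math

def ppth_decimals(value, max_decimals=6):
    """Decimal places d for which 10**-d / |value| is closest to 1/1000.

    Assumption: "closest" is judged on a log scale. The handout chooses
    the rounding place by inspection instead.
    """
    return min(range(max_decimals + 1),
               key=lambda d: abs(math.log10(10.0 ** -d / abs(value)) + 3))

def round_sig(x, sig):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))

def report(values, decimals):
    """Mean, standard deviation and RSD (ppth), rounded as in the handout."""
    n = len(values)
    mean = round(sum(values) / n, decimals)
    s = round(math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1)),
              decimals)
    # Count the significant figures left in s after rounding (at least one).
    s_sig_figs = max(1, len(str(int(round(s * 10 ** decimals)))))
    rsd = round_sig(s / mean * 1000, s_sig_figs)
    return mean, s, rsd

# Raw % Cr values from the complete example.
raw = [16.237, 16.251, 16.233, 16.361, 16.239]
decimals = ppth_decimals(raw[0])
rounded = [round(x, decimals) for x in raw]

# Trial 4 (index 3) is the one rejected in the handout's worked example.
kept = [x for i, x in enumerate(rounded) if i != 3]
cr_mean, cr_s, cr_rsd = report(kept, decimals)

print(rounded)
print(cr_mean, cr_s, cr_rsd)
```

As in the hand calculation, the RSD here is computed from the already-rounded standard deviation, so the figures match the report sheet rather than a full-precision calculation.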