CHE322_F06 CHAPTER4_EAD

CHAPTER 4 EVALUATING ANALYTICAL DATA

A. Characterizing a Measurement and Results
B. Characterizing Experimental Errors
C. Propagation of Uncertainty
D. Distribution of Measurements and Results
E. Statistical Analysis of Data
F. Detection Limits

A. MEASURES OF CENTRAL TENDENCY AND SPREAD

A.1 ESTIMATORS OF CENTRAL TENDENCY

The mean
Estimator of the central tendency or the true value
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

The median
$X_{med}$ is the middle value.
For an odd data set ____________________________
For an even data set ____________________________

A.2 ESTIMATORS OF VARIABILITY (scatter)

The range
The range (w) is the difference between the largest and the smallest values in a data set.

The standard deviation
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$   Absolute standard deviation
n - 1 = degrees of freedom = number of independent pieces of information on which a parameter estimate is based.
$s_r = \frac{s}{\bar{X}}$   Relative standard deviation
$\%s_r = \frac{s}{\bar{X}} \times 100$   Percent relative standard deviation

The variance
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

B. CHARACTERIZING EXPERIMENTAL ERRORS

B.1 ACCURACY

$E = \bar{X} - \mu$   Absolute error
$E_r = \frac{\bar{X} - \mu}{\mu} \times 100$   Percent relative error

Accuracy is determined by determinate errors (systematic errors):
1. Sampling errors
   Non-representative sample
2. Method errors
   $S_{meas} = k C_A + S_{reag}$
   Incorrect k used
   Incorrect reagent blank measurements
   Other interferents
3. Measurement errors
   Instruments (loss of calibration)
   Equipment (page 59)
4. Personal errors

Example: Dust in molecular absorption spectrophotometry

Constant determinate errors
Can be detected by using different size samples for making a determination.

Proportional determinate errors

B.2 PRECISION

Measure of the spread of data about a central value.

Repeatability
Spread of data obtained by one analyst using the same solutions and equipment during one period of laboratory work.
Reproducibility
Reproducibility involves variations in analysts, laboratories, equipment, instruments, work periods, etc.

Indeterminate errors
Indeterminate errors are random errors, which affect the precision. These errors cannot be eliminated.

Sources
Sampling process
Sample treatment
Measurement (reading errors, electronic noise, stray light, etc.)

Evaluating/Identifying Sources of Indeterminate Errors
Examples
Make several determinations of a single sample/item.
Obtain measurements of several samples of the same 'composition'.

C. PROPAGATION OF UNCERTAINTY

Error = difference between a single measurement or result and the true value.
The uncertainty is the range of possible values that a measurement or result may have. It includes all errors, determinate and indeterminate.

C.1 Uncertainty on the Result of Additions and/or Subtractions

$R = A \pm B \pm C$
$s_R = \sqrt{s_A^2 + s_B^2 + s_C^2}$

C.2 Uncertainty on the Result of Multiplications and Divisions

$R = \frac{A \times B}{C}$
$\frac{s_R}{R} = \sqrt{\left(\frac{s_A}{A}\right)^2 + \left(\frac{s_B}{B}\right)^2 + \left(\frac{s_C}{C}\right)^2}$

C.3 Uncertainty of Mixed Operations

Examples 4.7
Quiz 4.8

C.4 Uncertainty for Other Mathematical Functions

FUNCTION          UNCERTAINTY
$R = \ln(A)$      $s_R = \frac{s_A}{A}$
$R = \log(A)$     $s_R = 0.4343 \times \frac{s_A}{A}$
$R = e^A$         $\frac{s_R}{R} = s_A$
$R = 10^A$        $\frac{s_R}{R} = 2.303 \times s_A$
$R = A^k$         $\frac{s_R}{R} = k \times \frac{s_A}{A}$

Calculations of the propagation of uncertainty are used for the following purposes:
1) to compare the expected uncertainty of an analysis with the actual uncertainty obtained
2) to determine major and minor contributions to the overall uncertainty
3) to compare two or more methods
4) to develop the best procedure for preparing a sample
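The rules in C.1 and C.2 can be chained for a mixed operation (C.3). The following is a minimal sketch, not an example from the course text: the function names and all numerical values (mass, buret readings, reading uncertainty) are hypothetical and chosen only to illustrate the arithmetic.

```python
import math

def sum_uncertainty(*abs_uncertainties):
    """Absolute uncertainty s_R for a result built from additions/subtractions (C.1)."""
    return math.sqrt(sum(s ** 2 for s in abs_uncertainties))

def product_relative_uncertainty(*terms):
    """Relative uncertainty s_R/R for a result built from products/quotients (C.2).

    Each term is a (value, absolute_uncertainty) pair.
    """
    return math.sqrt(sum((s / value) ** 2 for value, s in terms))

# Hypothetical mixed operation: concentration = mass / (V_final - V_initial)
mass, s_mass = 0.5000, 0.0002        # g (assumed values)
v_final, v_initial = 25.62, 0.50     # mL (assumed buret readings)
s_reading = 0.02                     # mL, assumed uncertainty of each reading

# Step 1: subtraction, so absolute uncertainties add in quadrature
volume = v_final - v_initial
s_volume = sum_uncertainty(s_reading, s_reading)

# Step 2: division, so relative uncertainties add in quadrature
conc = mass / volume
rel_s_conc = product_relative_uncertainty((mass, s_mass), (volume, s_volume))
s_conc = conc * rel_s_conc

print(f"V = {volume:.2f} mL, s_V = {s_volume:.3f} mL")
print(f"C = {conc:.5f} g/mL, s_C = {s_conc:.5f} g/mL")
```

Comparing the two relative terms inside `product_relative_uncertainty` shows which step contributes most to the overall uncertainty, which is purpose 2) in the list above.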
D. DISTRIBUTION OF MEASUREMENTS AND RESULTS

Replicate 1    measurement 1    measurement 2    measurement 3    mean 1
Replicate 2    measurement 1    measurement 2    measurement 3    mean 2
Replicate 3    measurement 1    measurement 2    measurement 3    mean 3
Replicate 4    measurement 1    measurement 2    measurement 3    mean 4
                                                                  Mean

Presentation of results

Two students determine the concentration of a solution of NaOH by titrating several aliquots of a single stock solution.

Student 1 / Sample 1        Student 2 / Sample 2
Aliquot    NaOH (M)         Aliquot    NaOH (M)
1          0.1007           1          0.1005
2          0.1010           2          0.1010
3          0.1011           3          0.1002
4          0.1013           4          0.1004
5          0.1005           5          0.1009
6          0.1009           6          0.1003
7          0.1008           7          0.1010
$\bar{X}$                   $\bar{X}$
$s$                         $s$

What can you say about the 'true concentration'?
You need to predict
1) the true spread of the population
2) the true central value

Population
Population refers to all members of a system being investigated. It is an infinite number of data or a universe of data.

Sample
A sample is a finite number of experimental observations/measurements. It is a tiny fraction of the population. A sample is that part of the population that is collected and analyzed. It is a subset of the population.

Analysis of the entire population provides the population's true central value ($\mu$) and spread ($\sigma$):
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

$P(V) = \frac{M}{N}$
P(V): the probability of occurrence of V
V: value of interest
M: frequency of occurrence
N: size of population

In experimental sciences, we seldom sample the whole population. Rather, a sample of the population is analyzed.

From properties of the sample to properties of the population
(How do we extend what we know about the sample to the population?)

Requirement
- Need to make assumptions about the distribution of the population.
Distributions of samples of chemical systems display trends of well-defined population distributions. What are they?
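The blank $\bar{X}$ and $s$ rows for the two students' NaOH titrations can be filled in with the estimators from section A. A minimal sketch using Python's standard `statistics` module follows; the helper name `summarize` is an illustrative choice, and the data are the aliquot values listed in the table.

```python
import statistics

# Aliquot results (M) as listed for the two students
student_1 = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]
student_2 = [0.1005, 0.1010, 0.1002, 0.1004, 0.1009, 0.1003, 0.1010]

def summarize(values):
    """Return (mean, sample standard deviation, range) for one data set."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)      # sample standard deviation, divides by n - 1
    w = max(values) - min(values)     # range
    return mean, s, w

for label, data in (("Student 1", student_1), ("Student 2", student_2)):
    mean, s, w = summarize(data)
    print(f"{label}: mean = {mean:.4f} M, s = {s:.5f} M, range = {w:.4f} M")
```

These are sample statistics only; on their own they do not give the 'true concentration', which is the point of the population/sample distinction.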
D.3 PROBABILITY DISTRIBUTION

Distribution of a population: frequency of occurrence versus individual values.
Distribution of data where the members of the population can take any value, i.e. a continuous distribution.

Example: Use data obtained for the calibration of a 10-mL pipet
Generate a histogram of the data.
Calculate the mean and the standard deviation.
What is the shape/trend of the distribution around the 'central value'?
Can you predict the distribution of the population from this sample's distribution?

D.3.1 NORMAL/GAUSSIAN DISTRIBUTION

Members of the population may take any value.
We will first discuss Gaussian statistics of populations; then we will show how these relationships can be modified and applied to small samples of data.

Gaussian distribution equation
f(X) versus X
$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(X - \mu)^2}{2\sigma^2}\right]$
f(X): frequency of occurrence for a value X

Defined by two parameters only:
$\mu = \frac{\sum_{i=1}^{n} X_i}{n}$   true mean
$\sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}$   true population's variance

Properties of a normal distribution
1) The mean occurs at the point of maximum frequency.
2) There is a symmetrical distribution of positive and negative deviations about the maximum.
3) There is an exponential decrease in frequency as the magnitude of the deviations increases.

Universal Gaussian curve
Frequency of deviations from the mean versus deviation from the mean in units of standard deviation:
$z = \frac{X - \mu}{\sigma}$
When $X = \mu$, $z = 0$.

Appendix 1A: z deviations versus fraction of population to the right of z.
The area under the curve between two limits gives the probability of occurrence between the two limits.

Limits             % population
$\pm 1\sigma$      68.26
$\pm 2\sigma$      95.44
$\pm 3\sigma$      99.73
$\pm 4\sigma$      99.99

$\int f(X)\,dX = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(X - \mu)^2}{2\sigma^2}\right] dX$

Let us set $\mu = 0$ and $\sigma = 1$:
$\int_{-1}^{1} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{X^2}{2}\right) dX$

Confidence interval
For $X_i$ taken from the population, we can state that:
$X_i = \mu \pm z\sigma$

Confidence intervals for various values of z
The probability of finding $X_i$ within $\mu \pm z\sigma$:

z       Confidence interval (%)    $X_i$
0.50    38                         $\mu \pm 0.5\sigma$
1.00    68.26                      $\mu \pm 1.0\sigma$
1.50    86.64                      $\mu \pm 1.50\sigma$
1.96    95.00                      $\mu \pm 1.96\sigma$
2.50    98.76                      $\mu \pm 2.50\sigma$
3.00    99.73                      $\mu \pm 3.00\sigma$
3.50    99.95                      $\mu \pm 3.50\sigma$

D.3.2 What if a mean is obtained from a sample of the population of known standard deviation?

Confidence intervals in the case of a sample of n measurements and a known population $\sigma$:
$\mu = \bar{X} \pm \frac{z\sigma}{\sqrt{n}}$

Examples

D.3.3 PROBABILITY DISTRIBUTIONS FOR SAMPLES

In experimental sciences, we seldom know the parameters of the population. Therefore we must make assumptions about the population distribution or predict the distribution. Measurements on a large sample can be used to verify the distribution trend. Let us do that using data on the calibration of a pipet.

Replicate data on the calibration of a 10-mL pipet
a) Construct a histogram.
b) Calculate the mean and the standard deviation.

Central limit theorem
The distribution of measurements is normal when all errors are random, independent of each other and of similar magnitude. Then the sample mean is a good estimate of the population mean, and the sample variance is a good estimate of the population variance. Therefore, we can
c) Generate a Gaussian curve using the calculated mean and standard deviation.

Estimating the true mean ($\mu$) and the true standard deviation ($\sigma$)
Analysis of a large number of samples will yield the true mean and standard deviation. When the sample size is large (for example 50, that is, > 20), the sample mean and the sample standard deviation approach $\mu$ and $\sigma$, respectively.

Confidence intervals
As we have assumed a Gaussian distribution of the population, we can determine a range within which the true mean is expected at a given confidence level.
Can we use z to define intervals?
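As a reference point for that question, the z-based relationships of D.3.1 and D.3.2 can be sketched numerically. This is an illustrative sketch only: it relies on `statistics.NormalDist` from the Python standard library (3.8+), the function names are assumptions made here, and the sample mean, $\sigma$ and n in the usage lines are hypothetical. Note that it requires the population $\sigma$ to be known.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # mu = 0, sigma = 1

def fraction_within(z):
    """Fraction of a normal population lying within mu +/- z*sigma."""
    return STANDARD_NORMAL.cdf(z) - STANDARD_NORMAL.cdf(-z)

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval for mu from a sample mean when the population sigma is known."""
    z = STANDARD_NORMAL.inv_cdf(0.5 + confidence / 2)   # two-tailed z value
    half_width = z * sigma / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width

# Percent of the population inside +/- z sigma for the z values tabulated above
for z in (0.5, 1.0, 1.5, 1.96, 2.5, 3.0, 3.5):
    print(f"z = {z:4.2f}: {100 * fraction_within(z):.2f} %")

# Hypothetical example: mean of 4 measurements, known sigma
low, high = z_confidence_interval(sample_mean=10.00, sigma=0.20, n=4)
print(f"95 % interval for mu: {low:.3f} to {high:.3f}")
```

The printed percentages may differ from the tabulated values in the last digit because of rounding in the printed table.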
Recall that z was calculated using population parameters:
$z = \frac{X - \mu}{\sigma}$
With a sample we have s instead of $\sigma$, so we use t and $\frac{s}{\sqrt{n}}$ instead of z and $\frac{\sigma}{\sqrt{n}}$:
$\mu = \bar{X} \pm \frac{t s}{\sqrt{n}}$
$t \geq z$ at all confidence levels

$s_m = \frac{s}{\sqrt{n}}$: standard error of the mean
s: sample standard deviation
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
$\nu = n - 1$ degrees of freedom (df, $\nu$): the number of independent results used to compute the standard deviation (when n - 1 deviations have been computed, the final one is known).

Equivalent computational form:
$s = \sqrt{\frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1}}$

Appendix 1B lists values of t for various confidence levels and degrees of freedom.
How should t vary with sample size?
When n = 50, $t_{95}$ = 2.01
For the population (n = $\infty$), $t_{95}$ = 1.96

Example
Use the pipet data. What is the 95 % confidence interval for the pipet data?
Mean volume = 9.982 mL
Standard deviation = 0.0056 mL
Number of trials = 50
$\nu$ = 49
$\mu = 9.982 \pm \frac{2.01 \times 0.0056}{\sqrt{50}} = 9.982 \pm 0.0016$ mL
There is a 95 % probability that the pipet's mean volume is between 9.980 and 9.984 mL.

E. STATISTICAL ANALYSIS OF DATA

We can make definite statements only about the probability that the true value lies within a given range.

Q: How do we compare two or more samples of results, two or more analysts' results, or results obtained from two or more methods, made during a long period of time, from different sources/subjects?
R: Use statistical tests to determine whether or not the results are significantly different at a desired confidence level.
Note that there still remains a probability that the response may be wrong, because our hypothesis is tested statistically.

E.1 SIGNIFICANCE TESTING / HYPOTHESIS TESTING

Construct probability distribution curves for each sample of measurements.
Use the figure on page 82.
Q: Can the difference between the samples be explained by indeterminate error?
R: One can only determine the probability that the difference is significant.

Null hypothesis: assumes that the numerical quantities being compared are equal.

E.2 TEST OF SIGNIFICANCE FOR MEANS

Sample mean and population mean
Null hypothesis (H0): the mean of the sample is equal to the mean of the population.
Alternative hypothesis (HA): the mean of the sample is not equal to the mean of the population.

Choose a significance level:
95 %: the probability that H0 will be correctly retained.
The probability that H0 will be incorrectly rejected is $\alpha$ = 0.05.
Confidence level = $(1 - \alpha) \times 100$

Example
A new procedure for the rapid determination of sulfur in kerosenes was tested on a sample known from its method of preparation to contain 0.123 % S ($\mu$). The results were % S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate there is bias in the method?

$\sum X_i = 0.464$
$\bar{X} = 0.116$ % S
$\sum X_i^2 = 0.053854$
$s = 0.0032$
$|\bar{X} - \mu| = 0.007$ %

Compute $t_{exp}$ and compare it to the critical t at the desired confidence level:
$t_{exp} = \frac{|\mu - \bar{X}|}{s / \sqrt{n}} = \frac{|\mu - \bar{X}| \sqrt{n}}{s}$
$t_{exp} = \frac{(0.123 - 0.116) \sqrt{4}}{0.0032} = 4.375$
$t(\alpha, \nu) = t(0.05, 3) = 3.18$
$t_{exp} > t(0.05, 3)$: the null hypothesis must be rejected.
The probability of rejecting the null hypothesis incorrectly is 0.05.

Type 1 error: the null hypothesis is incorrectly rejected.
Type 2 error: the null hypothesis is incorrectly retained.

E.3 TEST OF SIGNIFICANCE FOR STANDARD DEVIATIONS

A) Are analysis results within statistical control?
Can the difference between the standard deviation of the sample and the population standard deviation be explained by random error?
Null hypothesis: $s^2 = \sigma^2$
F-test
$F_{exp} = \frac{s^2}{\sigma^2}$ (for $s^2 > \sigma^2$; the larger variance goes in the numerator)
$F_{crit} = F(\alpha, \nu_{num}, \nu_{den})$
If $F_{exp} > F_{crit}$, reject the null hypothesis.

B) Are two variances of two samples significantly different?
$F_{exp} = \frac{s_A^2}{s_B^2}$

E.4 COMPARING TWO EXPERIMENTAL MEANS

A) Unpaired data: samples are from the same source
Compare the means of two sets of identical analyses.

$t_{exp} = \frac{|\bar{X}_A - \bar{X}_B|}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$   (1)

If the standard deviations are not significantly different, use the pooled standard deviation and equation (2) to calculate t.

$t_{exp} = \frac{|\bar{X}_A - \bar{X}_B|}{s_{pool} \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}$   (2)

$s_{pool} = \sqrt{\frac{(n_A - 1) s_A^2 + (n_B - 1) s_B^2}{n_A + n_B - 2}}$   (3)

$\nu = n_A + n_B - 2$
Is $t_{exp} > t(\alpha, \nu)$?

If the standard deviations are significantly different, use equation (1) to calculate $t_{exp}$. Calculate the degrees of freedom using equation (4) and round to the nearest integer.

$\nu = \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2}{\frac{\left(s_A^2 / n_A\right)^2}{n_A + 1} + \frac{\left(s_B^2 / n_B\right)^2}{n_B + 1}} - 2$   (4)

B) Paired data: samples are from different sources

$t_{exp} = \frac{|\bar{d}| \sqrt{n}}{s_d}$
$d_i$: difference between paired data
$s_d$: standard deviation of the differences
$\bar{d}$: average difference

E.5 DETECTING GROSS ERRORS: OUTLIERS TEST / Q-TEST

Should a measurement be rejected?

A) Outlier is the smallest value ($X_1$)
$Q_{exp} = \frac{X_2 - X_1}{X_n - X_1}$

B) Outlier is the largest value ($X_n$)
$Q_{exp} = \frac{X_n - X_{n-1}}{X_n - X_1}$

Appendix 1D: $Q(\alpha, n)$

Use caution when the sample is small, such as the three to five determinations you make in the CHE 322 L laboratory course.

"Those who believe that they can discard observations with statistical sanction by using statistical rules for rejection of outliers are simply deluding themselves." J. Mandel

F. DETECTION LIMITS

F.1 IUPAC Definition

The detection limit is the smallest concentration or absolute amount of analyte that has a signal significantly larger than the signal arising from a reagent blank. (Detectable signal)
This limit is determined by the blank signal / 'background noise' of the method and the sensitivity of the method.
H0: no analyte in blank

$(S_A)_{DL} = S_{reag} + z\sigma_{reag}$
$(S_A)_{DL} = S_{reag} + t s_{reag}$
$\sigma_{reag}$: known standard deviation for the reagent blank's signal
$s_{reag}$: standard deviation determined for a reagent blank's signal
t: for one-tailed analysis

$(C_A)_{DL} = \frac{(S_A)_{DL}}{k}$
$z = 3$ ($\alpha$ = 0.00135)

The probability of a type 1 error is 0.135 %, but the probability of a type 2 error is higher.

F.2 Limit of Identification (LOI)

$(S_A)_{LOI} = S_{reag} + z\sigma_{reag} + z\sigma_{samp}$
LOI: the smallest concentration or absolute amount of analyte such that the probabilities of type 1 and type 2 errors are equal.

F.3 Limit of Quantitation (LOQ)

Committee on Environmental Chemistry: LOQ is the smallest concentration or absolute amount of analyte that can be reliably determined. (Quantifiable signal)
$(S_A)_{LOQ} = S_{reag} + 10\sigma_{reag}$
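The three signal limits in F.1 to F.3 differ only in how many standard deviations are added to the blank signal. A minimal sketch follows, assuming z = 3 for both the detection limit and the limit of identification (the notes give z = 3 only for the detection limit); the function names and the blank and sample values are hypothetical.

```python
def detection_limit_signal(s_reag, sigma_reag, z=3.0):
    """(S_A)_DL = S_reag + z * sigma_reag."""
    return s_reag + z * sigma_reag

def identification_limit_signal(s_reag, sigma_reag, sigma_samp, z=3.0):
    """(S_A)_LOI = S_reag + z * sigma_reag + z * sigma_samp (z = 3 assumed here)."""
    return s_reag + z * sigma_reag + z * sigma_samp

def quantitation_limit_signal(s_reag, sigma_reag):
    """(S_A)_LOQ = S_reag + 10 * sigma_reag."""
    return s_reag + 10 * sigma_reag

# Hypothetical reagent blank and sample noise, in arbitrary signal units
s_reag = 0.010
sigma_reag = 0.002
sigma_samp = 0.003

s_dl = detection_limit_signal(s_reag, sigma_reag)
s_loi = identification_limit_signal(s_reag, sigma_reag, sigma_samp)
s_loq = quantitation_limit_signal(s_reag, sigma_reag)

print(f"S_DL  = {s_dl:.3f}")
print(f"S_LOI = {s_loi:.3f}")
print(f"S_LOQ = {s_loq:.3f}")
```

When $\sigma_{reag}$ is not known and only $s_{reag}$ is available, the notes replace z with a one-tailed t value; that variant is not shown here because it needs a t table such as Appendix 1B.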