ANALYTICAL PROPERTIES PART II
ERT 207 ANALYTICAL CHEMISTRY
SEMESTER 1, ACADEMIC SESSION 2015/16

Overview
CONFIDENCE INTERVALS
STUDENT'S T / T STATISTICS
STATISTICAL AIDS TO HYPOTHESIS TESTING
COMPARISON OF TWO EXPERIMENTAL MEANS
ERRORS IN HYPOTHESIS TESTING
COMPARISON OF VARIANCES
ANALYSIS OF VARIANCE

CONFIDENCE INTERVALS

The confidence interval for the mean is the range of values within which the population mean (μ) is expected to lie with a certain probability. The limits of the interval are sometimes called confidence limits. The size of the confidence interval, which is computed from the sample standard deviation, depends on how well the sample standard deviation (s) estimates the population standard deviation (σ).

Figure 1 shows a series of five normal error curves. In each, the relative frequency is plotted as a function of the quantity z, the deviation from the mean divided by the population standard deviation. The numbers within the shaded areas are the percentages of the total area under the curve that lie within those values of z.

Figure 1: Areas under a Gaussian curve for various values of ±z (panels a-e).

From Figure 1(a), 50% of the area under any Gaussian curve lies between -0.67σ and +0.67σ. We can therefore expect that, 50 times out of 100, the true mean μ will fall in the interval x ± 0.67σ.

Confidence level: the probability that the true mean lies within a certain interval. It is often expressed as a percentage. In Figure 1(a) the confidence level is 50% and the confidence interval runs from -0.67σ to +0.67σ.

Significance level: the probability that a result lies outside the confidence interval.

A general expression for the confidence interval (CI) of the true mean based on a single measured value x is

  CI for μ = x ± zσ

For the experimental mean of N measurements,

  CI for μ = x̄ ± zσ/√N

Table 1 gives the values of z at various confidence levels. The relative size of the confidence interval as a function of N is shown in Table 2.

Table 1: Values of z for various confidence levels.
Table 2: Relative size of the confidence interval as a function of N.

EXAMPLE 1:
Determine the 80% and 95% confidence intervals for
(a) a single data entry of 1108 mg/L glucose, and
(b) the mean value, 1100.3 mg/L, of one week of data (one result recorded per day).
Assume that in each part s = 19 is a good estimate of σ.

STUDENT'S T / T STATISTICS

The t statistic is often called Student's t. To account for the variability of s, we use the statistical parameter t, which is defined in exactly the same way as z except that s is substituted for σ.

For a single measurement with result x,

  t = (x - μ)/s

For the mean of N measurements,

  t = (x̄ - μ)/(s/√N)

The confidence interval for the mean of N replicate measurements can be calculated from t:

  CI for μ = x̄ ± ts/√N

Table 3: Values of t for various confidence levels and degrees of freedom.

EXAMPLE 2:
A clinical chemist obtained the following data for the alcohol content of a sample of blood: % C2H5OH: 0.084, 0.089, and 0.079. Calculate the 95% confidence interval for the mean assuming that
(a) the three results obtained are the only indication of the precision of the method, and
(b) from previous experience with hundreds of samples, we know that the standard deviation of the method, s = 0.005% C2H5OH, is a good estimate of σ.
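To make these formulas concrete, here is a minimal Python sketch (SciPy and the standard library are assumed) that reproduces the z-based intervals of Example 1(b) and the t-based interval of Example 2(a); the critical values come from scipy.stats rather than Tables 1 and 3, so the last digits may differ slightly from the tabulated ones.

```python
# Sketch: z- and t-based confidence intervals for a mean (scipy assumed).
import math
import statistics
from scipy import stats

# Example 1(b): mean of 7 glucose results; s = 19 mg/L is treated as sigma.
mean, sigma, n = 1100.3, 19.0, 7
for level in (0.80, 0.95):
    z = stats.norm.ppf(1 - (1 - level) / 2)     # two-sided critical z value
    half = z * sigma / math.sqrt(n)             # CI = mean +/- z*sigma/sqrt(N)
    print(f"{level:.0%} CI: {mean:.1f} +/- {half:.1f} mg/L")

# Example 2(a): three blood-alcohol results; sigma unknown, so Student's t is used.
data = [0.084, 0.089, 0.079]                    # % C2H5OH
xbar = statistics.mean(data)
s = statistics.stdev(data)                      # sample standard deviation
t = stats.t.ppf(0.975, df=len(data) - 1)        # 95% two-sided, 2 degrees of freedom
print(f"95% CI: {xbar:.3f} +/- {t * s / math.sqrt(len(data)):.3f} % C2H5OH")
```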
STATISTICAL AIDS TO HYPOTHESIS TESTING

Hypothesis testing is the basis for many decisions made in science and engineering. The hypothesis tests described here are used to determine whether the results of experiments support a model. If agreement is found, the hypothetical model serves as the basis for further experiments. When the hypothesis is supported by sufficient experimental data, it becomes recognized as a useful theory until such time as data are obtained that disprove it.

A null hypothesis postulates that two or more observed quantities are the same. Specific examples of hypothesis tests that scientists often use include comparisons of
(1) the mean of an experimental data set with what is believed to be the true value,
(2) the mean with a predicted or cutoff (threshold) value, and
(3) the means or the standard deviations of two or more sets of data.

Comparing an experimental mean with a known value:
A statistical hypothesis test is used to draw conclusions about the population mean (μ) and its nearness to the known value (μ0). There are two contradictory outcomes considered in any hypothesis test:
(1) the null hypothesis H0, which states that μ = μ0, and
(2) the alternative hypothesis Ha.
We might reject the null hypothesis in favor of Ha if μ is different from μ0 (μ ≠ μ0). Other alternative hypotheses are μ > μ0 and μ < μ0.

Suppose we are interested in determining whether the concentration of lead in an industrial wastewater discharge exceeds the maximum permissible amount of 0.05 ppm. The hypothesis test would be summarized as:
  H0: μ = 0.05 ppm
  Ha: μ > 0.05 ppm

Large-sample z test:
If a large number of results are available so that s is a good estimate of σ, the z test is appropriate.
1. State the null hypothesis: H0: μ = μ0.
2. Form the test statistic: z = (x̄ - μ0)/(σ/√N).
3. State the alternative hypothesis Ha and determine the rejection region:
   For Ha: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ -zcrit (two-tailed test).
   For Ha: μ > μ0, reject H0 if z ≥ zcrit (one-tailed test).
   For Ha: μ < μ0, reject H0 if z ≤ -zcrit (one-tailed test).

Figure 2: Rejection regions for the 95% confidence level. (a) Two-tailed test for Ha: μ ≠ μ0. (b) One-tailed test for Ha: μ > μ0. (c) One-tailed test for Ha: μ < μ0.

In Figure 2(a), there is only a 5% probability that random error will lead to a value of z ≥ zcrit or z ≤ -zcrit. The overall significance level is α = 0.05, and from Table 1 the critical value of z is 1.96.

In Figure 2(b), we require the probability that z exceeds zcrit to be 5%, so the total probability in both tails would be 10%. The overall significance level is α = 0.10, and the critical value from Table 1 is 1.64.

EXAMPLE 3:
A class of 30 students determined the activation energy of a chemical reaction to be 116 kJ/mol (mean value) with a standard deviation of 22 kJ/mol. Are the data in agreement with the literature value of 129 kJ/mol at
(a) the 95% confidence level, and
(b) the 99% confidence level?
Estimate the probability of obtaining a mean equal to the student value.
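As a sketch of the large-sample z test procedure, the code below applies it to the numbers of Example 3 (SciPy assumed); it computes the test statistic, compares it with the two-tailed critical values, and estimates the tail probability that the worked solution later reads off Table 1.

```python
# Sketch: large-sample z test of H0: mu = mu0 vs Ha: mu != mu0 (Example 3 numbers).
import math
from scipy import stats

xbar, mu0, s, n = 116.0, 129.0, 22.0, 30     # kJ/mol; with N = 30, s is taken as sigma
z = (xbar - mu0) / (s / math.sqrt(n))        # test statistic, about -3.24

for level in (0.95, 0.99):
    z_crit = stats.norm.ppf(1 - (1 - level) / 2)     # two-tailed critical value
    print(f"{level:.0%}: z = {z:.2f}, z_crit = {z_crit:.2f}, reject H0: {abs(z) >= z_crit}")

p_two_tailed = 2 * stats.norm.sf(abs(z))     # chance of so large a |z| from random error
print(f"two-tailed probability: {p_two_tailed:.4f}")   # about 0.001, i.e. roughly 0.1%
```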
STATISTICAL AIDS TO HYPOTHESIS TESTING

Small-sample t test:
For a small number of results, we use a procedure similar to the z test except that the test statistic is the t statistic. The null hypothesis is H0: μ = μ0, where μ0 is a specific value of μ such as an accepted value, a theoretical value, or a threshold value.
1. State the null hypothesis: H0: μ = μ0.
2. Form the test statistic: t = (x̄ - μ0)/(s/√N).
3. State the alternative hypothesis Ha and determine the rejection region:
   For Ha: μ ≠ μ0, reject H0 if t ≥ tcrit or if t ≤ -tcrit (two-tailed test).
   For Ha: μ > μ0, reject H0 if t ≥ tcrit (one-tailed test).
   For Ha: μ < μ0, reject H0 if t ≤ -tcrit (one-tailed test).

Figure 3: Illustration of systematic error in an analytical method. Curve A shows the frequency distribution that random errors alone would give if the method had no systematic error, or bias. Curve B shows the frequency distribution of results for a method that could have a significant bias due to a systematic error.

EXAMPLE 4:
A new procedure for the rapid determination of sulfur in kerosenes was tested on a sample known from its method of preparation to contain 0.123% S (μ0 = 0.123% S). The results for %S were 0.112, 0.118, 0.115, and 0.119. Do the data indicate that there is a bias in the method at the 95% confidence level?

COMPARISON OF TWO EXPERIMENTAL MEANS

Frequently scientists must judge whether a difference in the means of two sets of data is real or the result of random error.

The t test for differences in means: the test statistic t is found from

  t = (x̄1 - x̄2)/(s_pooled √(1/N1 + 1/N2))

If there is good reason to believe that the standard deviations of the two data sets differ, the two-sample t test that does not pool the standard deviations must be used.

Paired data: scientists and engineers often make use of pairs of measurements on the same sample in order to minimize sources of variability that are not of interest. The null hypothesis is H0: μd = Δ0, where μd is the true mean difference and Δ0 is a specific difference (often 0). The test statistic is

  t = (d̄ - Δ0)/(s_d/√N)

where d̄ = Σdi/N is the average difference.

EXAMPLE 5:
A new automated procedure for determining glucose in serum (Method A) is to be compared to the established method (Method B). Both methods are performed on serum from the same six patients in order to eliminate patient-to-patient variability. Do the following results confirm a difference in the two methods at the 95% confidence level?

ERRORS IN HYPOTHESIS TESTING

Type I error: a type I error occurs when H0 is rejected although it is actually true. In some sciences, a type I error is called a false negative.
Type II error: a type II error occurs when H0 is accepted although it is actually false. It is sometimes termed a false positive.

The consequences of making errors in hypothesis testing are often compared to the errors made in judicial procedures. Convicting an innocent person is usually considered a more serious error than setting a guilty person free. If we make it less likely that an innocent person gets convicted, we make it more likely that a guilty person goes free. It is therefore important, when thinking about errors in hypothesis testing, to determine the consequences of making a type I or a type II error.
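Returning to the comparison of two experimental means above, both the pooled t test and the paired t test are available in SciPy. The sketch below uses small made-up data sets purely to illustrate the calls (they are not the data of Example 5): ttest_ind with equal_var=True pools the standard deviations, and ttest_rel handles paired measurements.

```python
# Sketch: pooled two-sample t test and paired t test (hypothetical data for illustration).
from scipy import stats

# Two independent data sets assumed to come from populations with equal sigma.
set_a = [14.2, 14.5, 14.1, 14.4]
set_b = [14.8, 14.9, 14.6, 15.0]
t_pooled, p_pooled = stats.ttest_ind(set_a, set_b, equal_var=True)   # pooled standard deviation
print(f"pooled t = {t_pooled:.2f}, p = {p_pooled:.4f}")

# Paired measurements made on the same samples by two methods.
method_a = [10.2, 11.5, 9.8, 10.9]
method_b = [10.6, 11.9, 10.1, 11.4]
t_paired, p_paired = stats.ttest_rel(method_a, method_b)   # tests H0: mean difference = 0
print(f"paired t = {t_paired:.2f}, p = {p_paired:.4f}")
```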
ERRORS IN HYPOTHESIS TESTING

As a general rule of thumb, the largest α that is tolerable for the situation should be used. This ensures the smallest type II error while keeping the type I error within acceptable limits. For many cases in analytical chemistry, an α value of 0.05 (95% confidence level) provides an acceptable compromise.

COMPARISON OF VARIANCES

At times there is a need to compare the variances (or standard deviations) of two data sets. The normal t test requires that the standard deviations of the data sets being compared are equal.

F test: a simple statistical test can be used to check this assumption, provided that the populations follow the normal (Gaussian) distribution.

The F test is based on the null hypothesis that the two population variances under consideration are equal:

  H0: σ1² = σ2²

The test statistic F is defined as the ratio of the two sample variances,

  F = s1²/s2²

It is calculated and compared with the critical value of F at the desired significance level. The null hypothesis is rejected if the test statistic differs too much from unity.

The F test is also used in comparing more than two means and in linear regression analysis. Critical values of F at the 0.05 significance level are shown in Table 4.

Table 4: Critical values of F at the 0.05 significance level.

Two degrees of freedom are given, one associated with the numerator and the other with the denominator. The F test can be used in either a one-tailed or a two-tailed mode.

EXAMPLE 6:
A standard method for the determination of the carbon monoxide (CO) level in gaseous mixtures is known from many hundreds of measurements to have a standard deviation of 0.21 ppm CO. A modification of the method yields a value of s of 0.15 ppm CO for a pooled data set with 12 degrees of freedom. A second modification, also based on 12 degrees of freedom, has a standard deviation of 0.12 ppm CO. Is either modification significantly more precise than the original?

ANALYSIS OF VARIANCE

ANOVA: the methods used for multiple comparisons fall under the general category of analysis of variance. When ANOVA indicates a potential difference, multiple-comparison procedures can be used to identify which specific population means differ from the others. Experimental design methods take advantage of ANOVA in planning and performing experiments.

ANOVA detects differences in several population means by comparing the variances. The following are typical applications of ANOVA:
1. Is there a difference in the results of five analysts determining calcium by a volumetric method?
2. Will four different solvent compositions have differing influences on the yield of a chemical synthesis?
3. Do the results of manganese determinations by three different analytical methods differ?
4. Is there any difference in the fluorescence of a complex ion at six different values of pH?

Figure 4 illustrates a single-factor, or one-way, ANOVA. The basic principle of ANOVA is to compare the variation between the different factor levels (groups) to the variation within factor levels. When the groups are the different analysts, this is a comparison of the between-analyst variation to the within-analyst variation (Figure 5).
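Before turning to the ANOVA figures, here is a short sketch of the F test applied to the numbers quoted in Example 6 (SciPy assumed). The 12-versus-12 critical value is taken from scipy.stats.f; the value 2.30 for the infinite-numerator comparison is the Table 4 entry used later in the worked solution.

```python
# Sketch: F test for comparing variances, using the Example 6 figures (scipy assumed).
from scipy import stats

s_std, s1, s2 = 0.21, 0.15, 0.12     # ppm CO: standard method and two modifications

# An improvement is claimed, so the standard-method variance goes in the numerator.
F1 = s_std**2 / s1**2                # about 1.96
F2 = s_std**2 / s2**2                # about 3.06
F_crit_inf_12 = 2.30                 # tabulated F(0.05; inf, 12), quoted in the worked example
print(f"F1 = {F1:.2f} (> {F_crit_inf_12}? {F1 > F_crit_inf_12})")
print(f"F2 = {F2:.2f} (> {F_crit_inf_12}? {F2 > F_crit_inf_12})")

# Comparing the two modifications with 12 and 12 degrees of freedom:
F12 = s1**2 / s2**2                                 # about 1.56
F_crit_12_12 = stats.f.ppf(0.95, dfn=12, dfd=12)    # about 2.69
print(f"F = {F12:.2f}, F_crit = {F_crit_12_12:.2f}, reject H0: {F12 > F_crit_12_12}")
```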
ANALYSIS OF VARIANCE

Figure 4: Single-factor (one-way) ANOVA.
Figure 5: Comparison of the between-analyst variation to the within-analyst variation.
ANOVA table.

EXAMPLE 7:
Five analysts determined calcium by a volumetric method and obtained the amounts (in mmol Ca) shown in the table below. Do the means differ significantly at the 95% confidence level?

EXAMPLE 1:
(a) From Table 1, z = 1.28 and 1.96 for the 80% and 95% confidence levels.
  80% CI = 1108 ± 1.28 × 19 = 1108 ± 24.3 mg/L
  95% CI = 1108 ± 1.96 × 19 = 1108 ± 37.2 mg/L
It can be concluded that it is 80% probable that the population mean (μ) lies in the interval 1083.7 to 1132.3 mg/L glucose, and the probability is 95% that μ lies between 1070.8 and 1145.2 mg/L.
(b) For the seven measurements,
  80% CI = 1100.3 ± 1.28 × 19/√7 = 1100.3 ± 9.2 mg/L
  95% CI = 1100.3 ± 1.96 × 19/√7 = 1100.3 ± 14.1 mg/L
From the experimental mean (x̄ = 1100.3 mg/L), it can be concluded that there is an 80% chance that μ lies between 1091.1 and 1109.5 mg/L glucose and a 95% chance that it lies between 1086.2 and 1114.4 mg/L glucose. Note that the intervals are considerably smaller when the experimental mean is used instead of a single value.

EXAMPLE 2:
  Σxi = 0.084 + 0.089 + 0.079 = 0.252
  Σxi² = 0.007056 + 0.007921 + 0.006241 = 0.021218
  s = √[(0.021218 - (0.252)²/3)/(3 - 1)] = 0.0050% C2H5OH
  x̄ = 0.252/3 = 0.084
(a) From Table 3, t = 4.30 for two degrees of freedom and the 95% confidence level.
  95% CI = x̄ ± ts/√N = 0.084 ± 4.30 × 0.0050/√3 = 0.084 ± 0.012% C2H5OH
(b) Because s = 0.0050% is a good estimate of σ, we can use z:
  95% CI = x̄ ± zσ/√N = 0.084 ± 1.96 × 0.0050/√3 = 0.084 ± 0.006% C2H5OH

EXAMPLE 3:
μ0 is the literature value of 129 kJ/mol, so the null hypothesis is μ = 129 kJ/mol and the alternative hypothesis is μ ≠ 129 kJ/mol. This is a two-tailed test. From Table 1, zcrit = 1.96 for the 95% confidence level and zcrit = 2.58 for the 99% confidence level. The test statistic is
  z = (x̄ - μ0)/(σ/√N) = (116 - 129)/(22/√30) = -3.24
Since z ≤ -1.96, we reject the null hypothesis at the 95% confidence level. Since z ≤ -2.58, we also reject H0 at the 99% confidence level.
To estimate the probability of obtaining a mean value of 116 kJ/mol, we find the probability of obtaining a z value of this magnitude. From Table 1, the probability of obtaining a z value this large because of random error is only about 0.1%. All of these results lead us to conclude that the student mean is actually different from the literature value and not just the result of random error.

EXAMPLE 4:
The null hypothesis is H0: μ = 0.123% S, and the alternative hypothesis is Ha: μ ≠ 0.123% S.
  Σxi = 0.112 + 0.118 + 0.115 + 0.119 = 0.464
  x̄ = 0.464/4 = 0.116% S
  Σxi² = 0.012544 + 0.013924 + 0.013225 + 0.014161 = 0.053854
  s = √[(0.053854 - (0.464)²/4)/(4 - 1)] = √(0.000030/3) = 0.0032% S
The test statistic is
  t = (x̄ - μ0)/(s/√N) = (0.116 - 0.123)/(0.0032/√4) = -4.375
From Table 3, the critical value of t for 3 degrees of freedom and the 95% confidence level is 3.18. Since t ≤ -3.18, we conclude that there is a significant difference at the 95% confidence level and thus a bias in the method.
If we were to do this test at the 99% confidence level, tcrit = 5.84.
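Example 4 can be reproduced directly with scipy.stats.ttest_1samp, which carries full precision rather than the rounded s = 0.0032 used in the hand calculation, so the statistic comes out near -4.43 instead of -4.375; the conclusions at both confidence levels are the same.

```python
# Sketch: one-sample t test for bias, using the Example 4 data (scipy assumed).
from scipy import stats

results = [0.112, 0.118, 0.115, 0.119]   # % S
mu0 = 0.123                              # value known from the method of preparation

t_stat, p_value = stats.ttest_1samp(results, mu0)   # two-tailed test of H0: mu = mu0
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")        # t about -4.43 (slides: -4.375 after rounding)

for level in (0.95, 0.99):
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=len(results) - 1)
    print(f"{level:.0%}: t_crit = {t_crit:.2f}, reject H0: {abs(t_stat) >= t_crit}")
```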
Since t = -4.375 is greater than -5.84, we would accept the null hypothesis at the 99% confidence level and conclude that there is no difference between the experimental and the accepted values.

EXAMPLE 5:
If μd is the true average difference between the methods, we want to test the null hypothesis H0: μd = 0 against the alternative hypothesis Ha: μd ≠ 0. The test statistic is
  t = (d̄ - 0)/(s_d/√N)
Here N = 6, Σdi = 16 + 9 + 25 + 5 + 22 + 11 = 88, Σdi² = 1592, and d̄ = 88/6 = 14.67.
The standard deviation of the differences is
  s_d = √[(1592 - (88)²/6)/(6 - 1)] = 7.76
and the t statistic is
  t = 14.67/(7.76/√6) = 4.63
The critical value of t is 2.57 for the 95% confidence level and 5 degrees of freedom. Since t > tcrit, we reject the null hypothesis and conclude that the two methods give different results.

EXAMPLE 6:
The null hypothesis is H0: σstd² = σ1², where σstd² is the variance of the standard method and σ1² is the variance of the modified method. The alternative hypothesis is Ha: σ1² < σstd². Because an improvement is claimed, the variances of the modifications are placed in the denominator.
For the first modification,
  F1 = sstd²/s1² = (0.21)²/(0.15)² = 1.96
For the second modification,
  F2 = sstd²/s2² = (0.21)²/(0.12)² = 3.06
For the standard procedure, sstd is a good estimate of σ, and the number of degrees of freedom for the numerator can be taken as infinite, so Fcrit = 2.30.
Since F1 < 2.30, we cannot reject the null hypothesis and conclude that there is no improvement in precision with the first modification. Since F2 > 2.30, we reject the null hypothesis and conclude that the second modification does appear to give better precision at the 95% confidence level.
Is the precision of the second modification significantly better than that of the first?
  F = s1²/s2² = (0.15)²/(0.12)² = 1.56
In this case, Fcrit = 2.69. Since F < 2.69, we must accept H0 and conclude that the two modifications give equivalent precision.

EXAMPLE 7:
We first obtain the mean and standard deviation for each analyst. The mean for analyst 1 is
  x̄1 = (10.3 + 9.8 + 11.4)/3 = 10.5 mmol Ca
The remaining means are obtained in the same manner, the results are summarized, and the grand mean is found. From the F table, the critical value of F at the 95% confidence level for 4 and 10 degrees of freedom is 3.48. Since the calculated F exceeds 3.48, we reject H0 at the 95% confidence level and conclude that there is a significant difference among the analysts.
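The paired comparison of Example 5 can be checked by testing whether the mean of the six differences is zero; a one-sample t test on the differences (SciPy assumed) is equivalent to the paired t test and reproduces t ≈ 4.63.

```python
# Sketch: paired t test of Example 5, run as a one-sample test on the differences.
from scipy import stats

differences = [16, 9, 25, 5, 22, 11]     # per-patient differences between Methods A and B
t_stat, p_value = stats.ttest_1samp(differences, 0.0)    # H0: mean difference = 0
t_crit = stats.t.ppf(0.975, df=len(differences) - 1)     # 2.57 for 95% and 5 degrees of freedom
print(f"t = {t_stat:.2f}, t_crit = {t_crit:.2f}, reject H0: {abs(t_stat) >= t_crit}")
```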
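Finally, a one-way ANOVA of the kind used in Example 7 can be run with scipy.stats.f_oneway. The slides reproduce only analyst 1's three results, so the other four analysts' lists below are hypothetical placeholders included just to make the call runnable; with the real data table the comparison against Fcrit = 3.48 would proceed in exactly the same way.

```python
# Sketch: single-factor (one-way) ANOVA across five analysts (scipy assumed).
# Only analyst1 holds data from Example 7; the other lists are hypothetical placeholders.
from scipy import stats

analyst1 = [10.3, 9.8, 11.4]    # mmol Ca, from the example
analyst2 = [9.5, 8.6, 8.9]      # hypothetical
analyst3 = [12.1, 13.0, 12.4]   # hypothetical
analyst4 = [9.6, 8.3, 8.2]      # hypothetical
analyst5 = [11.6, 12.5, 11.4]   # hypothetical

F, p = stats.f_oneway(analyst1, analyst2, analyst3, analyst4, analyst5)
F_crit = stats.f.ppf(0.95, dfn=4, dfd=10)   # 4 and 10 degrees of freedom, about 3.48
print(f"F = {F:.2f}, F_crit = {F_crit:.2f}, reject H0: {F >= F_crit}")
```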