Tests in the normal distribution

This text provides an overview of the most important statistical tests:

1. Test no. 1 – test of mean, variance known.
2. Test no. 2 – test of mean, variance unknown.
3. Test no. 3 – paired observations.
4. Test no. 4 – test of equality of two means, same variance in the groups.
5. Test no. 5 – test of equality of two means, different variance in the groups.
6. Test no. 6 – test of a given variance (standard deviation).
7. Test no. 7 – test of equality of two variances (standard deviations).

The tools are general statistical tests, which can easily be performed by means of Microsoft Excel. The purpose of these techniques is to perform different statistical tests concerning the mean and variance in one or two groups of observations for normally distributed data.

It is assumed that the reader is capable of calculating the mean and standard deviation of a group of observations. This can conveniently be done in Excel by means of the functions AVERAGE and STDEV respectively. As a matter of form the equations are stated here.

Sometimes the notion "mean" (often denoted μ) is employed for the "true" value, while "average" is employed for the estimated value. Equally, there is sometimes a distinction between the concepts "population standard deviation" (often denoted σ) for the true value and "sample standard deviation" (s) for the estimated value. Often, though, the notions are employed interchangeably.

We have a group of n independent and normally distributed observations x1 to xn. The average (the "mean") is given by the equation

x̄ = (1/n) Σ xi  (summing over i = 1, …, n)

The variance is given by the equation

s² = (1/(n−1)) Σ (xi − x̄)²  (summing over i = 1, …, n)

and the sample standard deviation (the "standard deviation") is s, i.e. the square root of the variance.

In the remainder of this note detailed equations will not be given everywhere. Instead you are referred to the technical appendix "Technical summary" at the end of this note.

Test no. 1 – test of mean, variance known
Introduction

This test is used in the following situation: We have a group of n independent and normally distributed observations x1 to xn, cf. above. Their standard deviation σ is supposed known, and thus so is the variance. The mean μ, however, is unknown. We will test whether the mean can be assumed equal to a value μ0 given in advance.

We will therefore not undertake any calculation of the standard deviation, but we estimate the mean by the average of all the observations, cf. above. If it is doubtful whether the given standard deviation can still be used, test no. 6 might be employed.

Example

x1 = 4.6, x2 = 8.5, x3 = 4.9, x4 = 5.3
n = 4, x̄ = 5.825, σ = 1.5, μ0 = 5.0, x̄ − μ0 = 0.825, z0 = 1.100

In this example there are 4 observations, i.e. n = 4. We know from many earlier analyses that the standard deviation can be assumed to be constant, σ = 1.5. The mean (μ) is unknown, but is estimated by the average x̄ = 5.825. Our purpose is to test whether this differs from a target value of μ0 = 5.0.

Calculation of the test statistic

Our average x̄ = 5.825 has to be compared to the target value 5.0. Numerically large deviations (positive or negative) will imply that we reject the assumption of a mean μ = 5.0. The deviation between the average and the target should of course be compared to the standard deviation, i.e. σ = 1.5. If there is a large standard deviation, it takes (numerically) larger deviations to make us reject the assumption of a mean μ = 5.0.

It is a fact that the more observations available, the more precisely the average is determined. If e.g. 4 times as many observations become available, the variance must be divided by 4, i.e. the standard deviation must be divided by 2. Generally the following rule applies: The standard deviation of an average is obtained by dividing the original standard deviation by the square root of the number of observations.

This results in the use of the following "test statistic":

z0 = (x̄ − μ0) / (σ/√n)

In the example above z0 = 1.100.
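The document performs these calculations in Excel; as a cross-check, the same arithmetic can be sketched in a few lines of Python (the average corresponds to Excel's AVERAGE; the numbers are those of the example):

```python
import math

# Example data from test no. 1
x = [4.6, 8.5, 4.9, 5.3]
sigma = 1.5   # standard deviation, known from many earlier analyses
mu0 = 5.0     # target value

n = len(x)
xbar = sum(x) / n                            # average (Excel: AVERAGE) = 5.825
z0 = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic

print(round(xbar, 3), round(z0, 3))  # 5.825 1.1
```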
The question is then how large a (numerical) value of z0 will imply that we reject the assumption of a mean μ = 5.0.

Distribution of the test statistic

This question is related to the distribution of the test statistic. In the current case it can be shown that the test statistic follows a "standard" normal distribution (i.e. mean 0 and standard deviation 1). This distribution has the advantage that it has been tabulated. It is only necessary to look up the (standard) normal distribution in a table to find out how large (numerical) values of z0 will imply rejection of the assumption of a mean μ = 5.0.

The answer to this question is related to the choice of level of significance. Most often we choose the 5% level of significance, but sometimes we choose the 1% level of significance. These two levels of significance are those recommended in different ISO standards.

If we choose a level of significance of 5%, (numerical) values above 1.96 are critical. This is because 95% of all observations in a standard normal distribution lie between −1.96 and 1.96. Technically, −1.96 is called the 2.5% fractile and 1.96 the 97.5% fractile, and these values are found in the table.

If we choose a level of significance of 1%, (numerical) values above 2.576 are critical, because 99% of all observations in a standard normal distribution lie between −2.576 (which is the 0.5% fractile) and 2.576 (which is the 99.5% fractile).

Conclusion

We have in the example found a test statistic of 1.100. This is well below 1.96, for which reason there is no statistical evidence for rejecting the assumption that the mean equals the target value, and this assumption must thus be accepted (on the 5% level, and of course also on the 1% level). Plainly speaking: the observed average 5.825 does not differ markedly from the accepted target value 5.0.

One-sided test

On rare occasions you know in advance that it is impossible to reach an average below target.
In these cases a so-called one-sided test is employed. In the example it is done like this: We have an average of 5.825, which is above 5.0 as expected (if it were not, our prior knowledge would be wrong!). In a one-sided test you must (for tests on the 5% level) compare the test statistic 1.100 with the 95% fractile instead of the 97.5% fractile. In a table this is read to be 1.645. Our assumption that the mean equals target is still accepted. For a test on the 1% level the comparison is made to the 99% fractile, which is 2.326.

If our prior knowledge on the contrary is that it is impossible to reach an average above target, then values below −1.645 (and −2.326 respectively) are critical.

Test no. 2 – test of mean, variance unknown

Introduction

This test is employed in the following situation: We have a group of n independent and normally distributed observations x1 to xn, cf. above. Their standard deviation is supposed unknown, and thus so is the variance. The mean is equally unknown. We will test whether the mean can be assumed equal to a value μ0 given in advance. In this test we will estimate both the standard deviation (with the sample standard deviation) and the mean (with the average).

Example

Same data as in test no. 1.

x1 = 4.6, x2 = 8.5, x3 = 4.9, x4 = 5.3
n = 4, x̄ = 5.825, s = 1.806, μ0 = 5.0, x̄ − μ0 = 0.825, t0 = 0.914

In this example there are 4 observations, i.e. n = 4. The standard deviation is now estimated by the sample standard deviation s = 1.806. The mean is estimated by the average x̄ = 5.825. Our purpose is to test whether this differs from a target value of μ0 = 5.0.

Calculation of the test statistic

Compared to test no. 1 the difference is that we employ the sample standard deviation s as a measure of the standard deviation, because we do not have a value known in advance. This implies the use of the following test statistic, which we now call t0:

t0 = (x̄ − μ0) / (s/√n)

In the example above t0 = 0.914.
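As a sketch, the same calculation can be done by hand in Python, or with scipy's built-in one-sample t-test (assuming scipy is installed; `ttest_1samp` returns the statistic together with a two-sided p-value):

```python
import math
from scipy import stats

x = [4.6, 8.5, 4.9, 5.3]
mu0 = 5.0

n = len(x)
xbar = sum(x) / n
# Sample standard deviation with n - 1 in the denominator (Excel: STDEV)
s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
t0 = (xbar - mu0) / (s / math.sqrt(n))
print(round(s, 3), round(t0, 3))  # 1.806 0.914

# scipy reproduces the same statistic
res = stats.ttest_1samp(x, mu0)
print(round(res.statistic, 3))    # 0.914

# 97.5% fractile of the t-distribution with n - 1 = 3 degrees of freedom
print(round(stats.t.ppf(0.975, n - 1), 3))  # 3.182
```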
The question is how large a (numerical) value of t0 will imply that we reject the assumption of a mean μ = 5.0.

Distribution of the test statistic

Now the test statistic is no longer characterized by a (standard) normal distribution. When there is a large number of observations (e.g. more than 40) the normal distribution will be a good approximation, but when there are few observations this is not the case. Instead of the normal distribution a "Student's" t-distribution must be employed. This is not one distribution but a whole family of distributions. If there are n observations (at least 2), a (Student's) t-distribution with n−1 "degrees of freedom" is employed.

In the example there are 4 observations, i.e. the number of degrees of freedom is 3. If we choose a level of significance of 5%, the critical limit is found in the table to be 3.182 (the 97.5% fractile). If we choose a level of significance of 1%, the critical limit is found to be 5.841 (the 99.5% fractile).

Conclusion

In the example we have found a test statistic of 0.914. This is well below 3.182, for which reason there is no statistical evidence for rejecting the assumption that the mean equals the target value, and thus this assumption must be accepted (on the 5% level, and of course also on the 1% level).

One-sided test

If we know in advance that it is impossible to reach an average below target, a one-sided test is employed: In the example we have an average of 5.825, which is above 5.0 as expected. In a one-sided test you must (for tests on the 5% level) compare the test statistic 0.914 with the 95% fractile instead of the 97.5% fractile. In a table of the t-distribution (with 3 degrees of freedom) this is read to be 2.353. Our assumption that the mean equals target is still accepted. For a test on the 1% level the comparison is made to the 99% fractile, which is 4.541.

Test no. 3 – paired observations
Introduction

The situation could be the following: We have n samples, each of which has been analysed by two laboratories. We are interested in examining whether there are differences between the two laboratories, whereas differences between the samples are not of interest on this occasion. This problem can be handled by calculating the difference between the results of the two laboratories, because it is only the difference between the laboratories that is of interest. The individual results of each laboratory are not relevant.

Example

Laboratory 1: 4.6, 8.5, 4.9, 5.3
Laboratory 2: 5.0, 9.0, 5.0, 6.0
Difference:   0.4, 0.5, 0.1, 0.7
n = 4, d̄ = 0.425, sd = 0.25, t0 = 3.4

We have 2 laboratories, and analyses of 4 samples have been performed. In the first row the results of laboratory 1 are shown and in the second row the results of laboratory 2. The difference between the results (Laboratory 2 − Laboratory 1) is shown in the third row, together with the relevant calculations. The average of the differences is d̄ = 0.425. We will test whether the laboratories are identical, i.e. whether the mean of the differences can be assumed to be 0. In comparison to test no. 2, 0 plays the role of the target mean μ0.

Calculation of the test statistic

The test is similar to test no. 2, as the standard deviation of the differences will be unknown in practice. For this reason we can set up the test statistic

t0 = d̄ / (sd/√n)

In the example we obtain t0 = 3.4.

Distribution of the test statistic

As in test no. 2 this test statistic follows a (Student's) t-distribution. The number of degrees of freedom is again n−1, because we have n differences. In the example there are 4 differences, i.e. the number of degrees of freedom is 3. The critical values are (as in the example in test no. 2) 3.182 (the 97.5% fractile, level of significance 5%) and 5.841 (the 99.5% fractile, level of significance 1%) respectively.

Conclusion

In the example we have obtained a test statistic of 3.4.
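The paired-difference calculation sketched above corresponds to scipy's paired t-test (a sketch, assuming scipy is installed; it is equivalent to a one-sample t-test on the differences against 0):

```python
import math
from scipy import stats

lab1 = [4.6, 8.5, 4.9, 5.3]
lab2 = [5.0, 9.0, 5.0, 6.0]

# Differences (Laboratory 2 - Laboratory 1)
d = [b - a for a, b in zip(lab1, lab2)]
n = len(d)
dbar = sum(d) / n
sd = math.sqrt(sum((di - dbar) ** 2 for di in d) / (n - 1))
t0 = dbar / (sd / math.sqrt(n))
print(round(dbar, 3), round(sd, 2), round(t0, 2))  # 0.425 0.25 3.4

# Built-in paired t-test gives the same statistic
res = stats.ttest_rel(lab2, lab1)
print(round(res.statistic, 2))  # 3.4
```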
This is above 3.182, for which reason we must reject the assumption that the laboratories are identical (on the 5% level). On the other hand the test statistic 3.4 is below 5.841, so on the 1% level of significance we will accept the assumption that the laboratories are identical.

One-sided test

If we know in advance (e.g. for professional reasons) that laboratory 2 always provides results larger than those of laboratory 1, a one-sided test is employed: Critical values are obtained (as in the example for test no. 2) to be 2.353 (the 5% level, the 95% fractile) and 4.541 (the 1% level, the 99% fractile) respectively. The conclusion is here the same as for the two-tailed test: on the 5% level we reject, while on the 1% level we accept, the assumption that the laboratories are identical.

Test no. 4 – test of equality of two means, same variance in the groups

Introduction

This test is employed in the following situation: We have two groups of independent and normally distributed observations. Their standard deviation is assumed unknown, but common to both groups. The means of both groups are unknown. We will test whether the means of the two groups can be assumed equal. In this test we will estimate the mean of each group separately and also the common standard deviation. If doubt arises whether it is correct to assume that the two groups have the same standard deviation, test no. 7 might be employed. Notice: The number of observations in the two groups does not have to be identical!

Example

We have e.g. 4 repeated measurements of one sample and 3 repeated measurements of another sample. We will test whether the two samples can be assumed to be identical.

Sample 1: 3.6, 4.5, 3.9, 4.3  (n1 = 4, x̄1 = 4.075, s1 = 0.403)
Sample 2: 4.6, 5.2, 4.9       (n2 = 3, x̄2 = 4.900, s2 = 0.300)

Here the average, the number of measurements and the sample standard deviation are stated for each sample. Notice: Two measurements in the same position have nothing to do with each other!
The measurements of the same sample can be interchanged, as they are stated in random order.

Calculation of the test statistic

First a common ("pooled") sample standard deviation of the two samples is calculated, as we assume that the two samples have identical standard deviations. The equation is:

sp = √[ ((n1−1)s1² + (n2−1)s2²) / (n1 + n2 − 2) ]

Here s1 and n1 are the sample standard deviation and the number of measurements of the first sample respectively, and correspondingly we have s2 and n2 for sample no. 2. In the example sp = 0.365.

Subsequently the test statistic is calculated:

t0 = (x̄1 − x̄2) / (sp √(1/n1 + 1/n2))

Here x̄1 is the average of sample no. 1 and x̄2 is the average of sample no. 2. In the example t0 = −2.959.

Distribution of the test statistic

Again the test statistic follows a t-distribution. The number of degrees of freedom is n1 + n2 − 2, i.e. in the example there are 5 degrees of freedom. Critical values are 2.571 (level of significance 5%) and 4.032 (level of significance 1%) respectively.

Conclusion

We have in the example found a test statistic of −2.959. This is not within the interval −2.571 to 2.571, for which reason we reject the assumption that the means are equal (on the 5% level). On the other hand the test statistic −2.959 is within the interval −4.032 to 4.032, for which reason we must accept the assumption that the means are equal on the 1% level.

One-sided test

If we know in advance (for subject-matter reasons) that sample no. 2 always results in larger measurement results than sample no. 1, a one-sided test is employed: Critical values are found to be 2.015 (5% level) and 3.365 (1% level) respectively. The conclusion is here the same as for the two-tailed test.

Test no. 5 – test of equality of two means, different variances in the groups

Introduction

Here the situation is similar to the one in test no. 4, but we cannot (or will not) assume the standard deviations of the two groups to be identical.
In many situations the standard deviation will depend on the mean (the "level"). If it is not known in advance whether the means are equal, it cannot automatically be assumed that the standard deviations are equal. Also, an initial examination of the equality of the standard deviations (by means of test no. 7) might have rejected this assumption.

Example

Same example as for test no. 4, only here it is not assumed that the two samples have the same standard deviation.

Sample 1: 3.6, 4.5, 3.9, 4.3  (n1 = 4, x̄1 = 4.075, s1 = 0.403)
Sample 2: 4.6, 5.2, 4.9       (n2 = 3, x̄2 = 4.900, s2 = 0.300)

Calculation of the test statistic

Now we will not calculate a pooled sample standard deviation, but the following test statistic directly:

t0 = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

Consequently only the average, the sample standard deviation and the number of measurements of each sample enter the test statistic. In the example we get t0 = −3.10.

Distribution of the test statistic

Here the test statistic is approximately characterized by a t-distribution. The number of degrees of freedom (ν) is more complicated:

ν = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1−1) + (s2²/n2)²/(n2−1) ]

In the example we get ν = 4.987. This is rounded off to ν = 5. Critical values are 2.571 (level of significance 5%) and 4.032 (level of significance 1%) respectively.

Conclusion

We have in the example found a test statistic of −3.10. This is not within the interval −2.571 to 2.571, for which reason we reject the assumption that the means are equal (on the 5% level). However, the test statistic −3.10 is within the interval −4.032 to 4.032, for which reason we must accept the assumption that the means are equal on the 1% level.

One-sided test

If we know in advance that sample no. 2 always results in larger measurement results than sample no. 1, a one-sided test is employed: Critical values are found to be 2.015 (5% level) and 3.365 (1% level) respectively. The conclusion is here the same as for the two-tailed test.
Test no. 6 – test of a given variance (standard deviation)

Introduction

This test is employed in the following situation: We have a group of n independent and normally distributed observations x1 to xn. We will test whether the variance can be assumed equal to a given value σ0², i.e. whether σ = σ0 (or whether σ² = σ0²). This test can be used e.g. prior to test no. 1 if it is not certain whether the known value of the standard deviation can still be employed. In this situation the mean is unknown, i.e. we estimate it by the average of the observations. This average enters the calculation of the sample standard deviation, which is our estimate of the standard deviation.

Example

We use the same data as in tests no. 1 and 2.

x1 = 4.6, x2 = 8.5, x3 = 4.9, x4 = 5.3
n = 4, x̄ = 5.825, s = 1.806, σ0 = 1.5, s²/σ0² = 1.45

In this example there are 4 observations, i.e. n = 4. We know from many previous analyses that the standard deviation can be assumed constant = 1.5 (i.e. σ0 = 1.5). Our purpose is to test whether this assumption is still valid.

Calculation of the test statistic

It would be reasonable to employ the ratio s²/σ0² as the test statistic. Here values far from 1 will be critical. For technical reasons this ratio is multiplied by (n−1), where n = the number of observations. I.e. the test statistic employed is

χ0² = (n−1)s² / σ0²

Here values far from (n−1) will be critical. In the example we obtain χ0² = 4.35.

Distribution of the test statistic

The test statistic follows a so-called χ²-distribution ("chi-square"). Just like the t-distribution this is a family of distributions, each with its own number of degrees of freedom. Here the number of degrees of freedom is n−1. In the example n = 4, i.e. the number of degrees of freedom is 3. Both small and large values of χ0² are critical. On the 5% level of significance the critical values can be read to be 0.22 (the 2.5% fractile) and 9.35 (the 97.5% fractile).
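The χ² statistic and the fractile lookup can be sketched in Python (assuming scipy is installed; `chi2.ppf` plays the role of the printed table):

```python
from scipy.stats import chi2

x = [4.6, 8.5, 4.9, 5.3]
sigma0 = 1.5   # standard deviation given in advance

n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # sample variance
chi2_0 = (n - 1) * s2 / sigma0 ** 2                # test statistic
print(round(chi2_0, 2))  # 4.35

# Critical values on the 5% level: the 2.5% and 97.5% fractiles, 3 df
print(round(chi2.ppf(0.025, n - 1), 2))  # 0.22
print(round(chi2.ppf(0.975, n - 1), 2))  # 9.35
```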
On the 1% level of significance the critical values are correspondingly obtained to be 0.07 (the 0.5% fractile) and 12.84 (the 99.5% fractile).

Conclusion

In the example the test statistic χ0² = 4.35 lies between the critical values on the 5% level of significance, 0.22 and 9.35. This implies that we will accept the assumption that the standard deviation equals 1.5 on the 5% level (and thus also on the 1% level).

One-sided test

If it is known in advance that the sample standard deviation cannot be smaller (respectively larger) than the given value, a one-sided test can be employed. Here the critical value on the 5% level of significance is the 95% fractile (respectively the 5% fractile). On the 1% level of significance the 99% fractile (respectively the 1% fractile) must be used.

In the example the calculated sample standard deviation s = 1.806 is larger than the previously given value σ0 = 1.5. If this was known in advance, a one-sided test could be used, comparing the test statistic χ0² = 4.35 to the 95% fractile in a χ²-distribution with 3 degrees of freedom. This can be read from a table to be 7.81. In this case we will accept the assumption that the standard deviation equals the given value 1.5.

Test no. 7 – test of equality of two variances (standard deviations)

Introduction

This test is employed in the following situation: We have two groups of independent and normally distributed observations. We will test whether the variances (the standard deviations) of the two groups can be assumed equal. This test can be employed e.g. prior to test no. 4 if it is not certain that the two groups have the same variance.

Example

We employ the same data as in test no. 4. We will examine whether the variances (the standard deviations) of the two samples can be assumed equal.

Sample 1: 3.6, 4.5, 3.9, 4.3  (n1 = 4, x̄1 = 4.075, s1 = 0.403)
Sample 2: 4.6, 5.2, 4.9       (n2 = 3, x̄2 = 4.900, s2 = 0.300)

Here the average, the number of measurements and the sample standard deviation of each sample are stated.
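The summary figures stated for the two samples can be reproduced from the raw data with Python's statistics module (a sketch; `mean` and `stdev` correspond to Excel's AVERAGE and STDEV, both using the n−1 denominator for the standard deviation):

```python
import statistics

sample1 = [3.6, 4.5, 3.9, 4.3]
sample2 = [4.6, 5.2, 4.9]

for name, data in (("Sample 1", sample1), ("Sample 2", sample2)):
    xbar = statistics.mean(data)   # average
    s = statistics.stdev(data)     # sample standard deviation
    print(name, len(data), round(xbar, 3), round(s, 3))
# Sample 1 4 4.075 0.403
# Sample 2 3 4.9 0.3
```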
Calculation of the test statistic

It would be reasonable to employ the ratio of the two calculated variances as the test statistic, i.e. we employ the test statistic

F0 = s1² / s2²

Values far from 1 are critical. The test statistic 1/F0 = s2²/s1² could be employed as well. For technical reasons the larger of these two quantities is always chosen. In the example we obtain F0 = 1.806 and consequently 1/F0 = 0.554. In this case F0 is used.

Distribution of the test statistic

The test statistic follows an "F-distribution". This is a family of distributions which have two numbers of degrees of freedom, one for the numerator and one for the denominator. In the example we use F0 = s1²/s2², i.e. we will employ an F-distribution with (3, 2) degrees of freedom, as the number of degrees of freedom of the numerator s1² is 3 (we have 4 observations), while the number of degrees of freedom of the denominator s2² is 2 (we have 3 observations).

Inasmuch as we have chosen the larger of the two possible test statistics F0 and 1/F0, only large values are critical. But it is still considered a two-tailed test. Therefore we will compare the test statistic only to the 97.5% fractile of the F-distribution (5% level of significance) and the 99.5% fractile (1% level of significance) respectively. In the table the 97.5% fractile of an F-distribution with (3, 2) degrees of freedom can be read to be 39.17. The 99.5% fractile can similarly be read to be 199.16.

Conclusion

In the example the test statistic F0 = 1.806 is smaller than 39.17 (the 97.5% fractile), and we will accept the assumption that the two variances (standard deviations) are equal.

One-sided test

Assume that we know in advance that the variance (the standard deviation) of group 1 will be larger than that of group 2. In this case we will instead compare to the 95% fractile (test on the 5% level of significance) and the 99% fractile (1% level of significance) respectively.
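The F-statistic and the fractiles can be computed in Python (assuming scipy is installed; `f.ppf` is the fractile function of the F-distribution):

```python
import statistics
from scipy.stats import f

sample1 = [3.6, 4.5, 3.9, 4.3]
sample2 = [4.6, 5.2, 4.9]

v1 = statistics.variance(sample1)   # s1^2, 3 degrees of freedom
v2 = statistics.variance(sample2)   # s2^2, 2 degrees of freedom
F0 = v1 / v2                        # the larger ratio goes in the numerator here
print(round(F0, 3))  # 1.806

# Two-sided critical value on the 5% level: 97.5% fractile, (3, 2) df
print(round(f.ppf(0.975, 3, 2), 2))  # 39.17
# One-sided critical value on the 5% level: 95% fractile
print(round(f.ppf(0.95, 3, 2), 2))   # 19.16
```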
In a table of the F-distribution with (3, 2) degrees of freedom the 95% fractile is read to be 19.16, while the 99% fractile is 99.16. The conclusion is still that we accept the assumption that the two variances (standard deviations) are equal.

References

ISO 2854: Statistical interpretation of data – Techniques of estimation and tests relating to means and variances. (Treats all tests, no. 1 – no. 7.)

Microsoft Excel: Help texts of the statistical functions and the menu "Data Analysis" under "Tools".