Tests in the normal distribution
This text provides an overview of the most important statistical tests:
1. Test no. 1 – test of mean, variance known.
2. Test no. 2 – test of mean, variance unknown.
3. Test no. 3 – paired observations.
4. Test no. 4 – test of equality of two means, same variance in the groups.
5. Test no. 5 – test of equality of two means, different variance in the groups.
6. Test no. 6 – test of a given variance (standard deviation).
7. Test no. 7 – test of equality of two variances (standard deviations).
The tools are general statistical tests, which can easily be performed by means of
Microsoft Excel. The purpose of these techniques is to perform different statistical
tests concerning mean and variance in one or two groups of observations for normally
distributed data.
It is assumed that the reader is capable of calculating mean and standard deviation of
a group of observations. This can conveniently be done in Excel by means of the
functions AVERAGE and STDEV respectively. As a matter of form the equations
are stated here.
Sometimes the notion “mean” (often depicted μ) is employed for the “true” value, while
“average” is employed for the estimated value. Equally there is sometimes a distinction
between the concepts “population standard deviation” (often depicted σ) for the true
value and “sample standard deviation” for the estimated value. Often, though, the
notions will be employed indiscriminately.
We have a group of n independent and normally distributed observations x1 to xn. The
average (the “mean”) is given by the equation

x̄ = (1/n) · Σ xᵢ   (summing over i = 1, …, n)

The variance is given by the equation

s² = (1/(n − 1)) · Σ (xᵢ − x̄)²   (summing over i = 1, …, n)

and the sample standard deviation (the “standard deviation”) is s, i.e. the square root
of the variance.
In the remainder of this section detailed equations will not be given everywhere.
Instead you are referred to the technical appendix ”Technical summary” at the end of
this note.
Test no. 1 – test of mean, variance known.
Introduction
This test is used in the following situation: We have a group of n independent and
normally distributed observations x1 to xn, cf. above. Their standard deviation σ is
supposed known, and thus this applies to the variance too. The mean μ however is
unknown. We will test whether the mean can be assumed equal to a value μ0 given
in advance.
We will therefore not undertake any calculation of the standard deviation, but we
estimate the mean by the average of all the observations, cf. above. If it is doubtful
whether the given standard deviation can still be used, test no. 6 might be employed.
Example
x1     x2     x3     x4     n    x̄       μ0    x̄ − μ0   σ     z0
4.6    8.5    4.9    5.3    4    5.825   5.0   0.825    1.5   1.100
In this example there are 4 observations, i.e. n = 4. We know from many earlier
analyses that the standard deviation can be assumed to be constant σ = 1.5. The mean
(μ) is unknown, but is estimated by the average x̄ = 5.825. Our purpose is to test
whether this differs from a target-value of μ0 = 5.0.
Calculation of the test statistic
Our average x̄ = 5.825 has to be compared to the target-value 5.0. Numerically large
deviations (positive or negative) will imply that we reject the assumption of a
mean = 5.0.
The deviation between the average and the target should of course be compared to the
standard deviation, i.e. σ = 1.5. If the standard deviation is large, it takes
(numerically) larger deviations to make us reject the assumption of a mean = 5.0.
It is a fact that the more observations available, the more precisely the average is
determined. If e.g. 4 times as many observations become available, the variance is
divided by 4, i.e. the standard deviation is divided by 2.
Generally the following rule applies: The standard deviation of a mean is obtained by
dividing the original standard deviation by the square root of the number of
observations.
This results in the use of the following “test statistic”:
z0 = (x̄ − μ0) / (σ/√n)
In the example above z0 =1.100. The question is then how large a value (numerically)
of z0 will imply that we reject the assumption of a mean = 5.0.
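As a sketch of the calculation above (the function name and data layout are illustrative, not part of the note), the test statistic for the example can be computed directly:

```python
from math import sqrt

def z_statistic(xs, mu0, sigma):
    """Test no. 1: test of mean with known standard deviation sigma."""
    n = len(xs)
    avg = sum(xs) / n
    # the standard deviation of the average is sigma / sqrt(n)
    return (avg - mu0) / (sigma / sqrt(n))

z0 = z_statistic([4.6, 8.5, 4.9, 5.3], mu0=5.0, sigma=1.5)
# z0 = 0.825 / 0.75 = 1.100
```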
Distribution of the test statistic
This question is related to the distribution of the test statistic. In the current case, it
can be shown that the test statistic is characterized by a “standard” normal distribution
(i.e. mean 0 and standard deviation 1). This distribution has the advantage that it has
been tabulated.
It is only necessary to look up the (standard) normal distribution in a table to find out
how large (numerical) values of z0 will imply the rejection of the assumption of a
mean = 5.0.
The answer to this question is related to the choice of level of significance. Most often
we choose the 5% level of significance, but sometimes we choose the 1% level of
significance. These two levels of significance are those recommended in different
ISO-standards.
If we choose a level of significance of 5%, (numerical) values above 1.96 are critical.
This is because 95% of all observations in a standard normal distribution are within
–1.96 and 1.96. Technically you call –1.96 the 2.5% fractile and 1.96 the 97.5%
fractile, and these values are found in the table.
If we choose a level of significance of 1%, (numerical) values above 2.576 are
critical, because 99% of all observations in a standard normal distribution are within
–2.576 (which is the 0.5% fractile) and 2.576 (which is the 99.5% fractile).
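These fractiles can also be computed in software instead of being read from a table. As an illustrative sketch (not part of the original note), Python’s standard library exposes the inverse CDF of the normal distribution:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# two-sided critical values of the standard normal distribution
crit_5pct = std_normal.inv_cdf(0.975)   # 97.5% fractile, about 1.96
crit_1pct = std_normal.inv_cdf(0.995)   # 99.5% fractile, about 2.576
```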
Conclusion
We have in the example found a test statistic of 1.100. This is well below 1.96, for
which reason there is no statistical evidence for rejection of the assumption that the
mean equals the target-value, and this assumption must thus be accepted (on the 5%
level, and of course also on the 1% level).
Plainly speaking: The found average value 5.825 does not differ markedly from the
accepted target-value 5.0.
One-sided test
On rare occasions it occurs that you know in advance that it is impossible to reach an
average below target. In these cases a so-called One-sided test is employed.
In the example it is done like this: We have an average of 5.825, which is above 5.0 as
expected (if it was not, our prior knowledge would be wrong!).
In a One-sided test you must (for tests on the 5% level) compare the test statistic
1.100 with the 95% fractile instead of the 97.5% fractile. In a table this is read to be
1.645. Our assumption that the average equals target is still accepted.
For a test on the 1% level the comparison is made to the 99% fractile, which is 2.326.
If our prior knowledge on the contrary is that it is impossible to reach an average
above target, then values below -1.645 (and –2.326 respectively) are critical.
Test no. 2 – test of mean, variance unknown.
Introduction
This test is employed in the following situation: We have a group of n independent
and normally distributed observations x1 to xn, cf. above. Their standard deviation σ
is supposed unknown, and thus this applies to the variance too. The mean μ is equally
unknown. We will test whether the mean can be assumed equal to a value μ0 given
in advance.
In this test we will estimate both the standard deviation (with the sample standard
deviation) and the mean (with the average).
Example
Same data as in test no. 1.
x1     x2     x3     x4     n    x̄       μ0    x̄ − μ0   s       t0
4.6    8.5    4.9    5.3    4    5.825   5.0   0.825    1.806   0.914
In this example there are 4 observations, i.e. n = 4. The standard deviation σ is now
estimated by the sample standard deviation s = 1.806. The mean μ is estimated by the
average x̄ = 5.825. Our purpose is to test whether this differs from a target value of
μ0 = 5.0.
Calculation of the test statistic
Compared to test no.1 the difference is that we employ the sample standard deviation
s as a measure of the standard deviation, because we do not have a value known in
advance.
This implies the use of the following test statistic, which we now call t0 :
t0 = (x̄ − μ0) / (s/√n)
In the example above t0 = 0.914. The question is how large a value (numerically) of t0
will imply that we reject the assumption of a mean = 5.0.
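As an illustrative sketch (the helper name is invented for this note), the statistic can be computed from the data with the sample standard deviation in place of σ:

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(xs, mu0):
    """Test no. 2: the standard deviation is estimated from the data."""
    n = len(xs)
    s = stdev(xs)   # sample standard deviation, about 1.806 here
    return (mean(xs) - mu0) / (s / sqrt(n))

t0 = t_statistic([4.6, 8.5, 4.9, 5.3], mu0=5.0)
# t0 is about 0.914
```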
Distribution of the test statistic
Now the test statistic is no longer characterized by a (standard) normal distribution.
When there is a large number of observations (e.g. more than 40) it will be a good
approximation, but when there are few observations this is not the case.
Instead of the normal distribution a “Student’s” t-distribution must be employed. This
is not one but a whole family of distributions. If there are n observations (at least 2), a
(Student’s) t-distribution with n-1 “degrees of freedom” is employed.
In the example there are 4 observations, i.e. the number of degrees of freedom is 3.
If we choose a level of significance of 5%, the critical limit is found to be 3.182 (the
97.5% fractile) in the table.
If we choose a level of significance of 1%, the critical limit is found to be 5.841 (the
99.5% fractile) in the table.
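If the SciPy library is available (an assumption; the note itself relies only on Excel and printed tables), the whole test can be performed in one call, and the critical limit looked up with the same library:

```python
from scipy import stats

data = [4.6, 8.5, 4.9, 5.3]

# one-sample t-test of the mean against the target value 5.0
result = stats.ttest_1samp(data, popmean=5.0)

# critical limit on the 5% level: the 97.5% fractile of the
# t-distribution with 3 degrees of freedom, about 3.182
crit = stats.t.ppf(0.975, df=3)
```

Here `result.statistic` reproduces t0 and `result.pvalue` is the two-sided p-value; a p-value above 0.05 corresponds to accepting the assumption on the 5% level.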
Conclusion
In the example we have found a test statistic of 0.914. This is well below 3.182, for
which reason there is no statistical evidence for rejection of the assumption that the
mean equals the target-value, and thus this assumption must be accepted (on the 5%
level, and of course also on the 1% level).
One-sided test
If we know in advance that it is impossible to reach an average below target, a One-sided test is employed:
In the example we have an average of 5.825, which is above 5.0 as expected.
In a One-sided test you must (for tests on the 5% level) compare the test statistic
0.914 with the 95% fractile instead of the 97.5% fractile. In a table of the
t-distribution (with 3 degrees of freedom) this is read to be 2.353. Our assumption that
the average equals target is still accepted.
For a test on the 1% level the comparison is made to the 99% fractile, which is 4.541.
Test no. 3 – paired observations.
Introduction
The situation could be the following: We have n samples, which have each been
analysed by two laboratories. We are interested in examining whether there are
differences between the two laboratories, whereas differences between the samples
are not of interest on this occasion.
This problem can be handled by calculating the difference between the results of the
two laboratories, because it is only the difference between the laboratories that is of
interest. The individual results of each laboratory are not relevant.
Example
Laboratory 1:   4.6   8.5   4.9   5.3
Laboratory 2:   5.0   9.0   5.0   6.0
Difference:     0.4   0.5   0.1   0.7

n = 4    d̄ = 0.425    sd = 0.25    t0 = 3.4
We have 2 laboratories, and analyses of 4 samples have been performed. In the first
row the results of laboratory 1 are shown and in the second row the results of
laboratory 2.
The difference between the results (Laboratory 2 – Laboratory 1) is shown in the third
row, together with the relevant calculations. The average of the differences is d̄ = 0.425.
We will test whether the laboratories are identical, i.e. whether the mean of the differences
can be assumed to be 0. In comparison to test no. 2, 0 plays the role of the target mean μ0.
Calculation of the test statistic
The test is similar to test no. 2, as the standard deviation of the differences will be
unknown in practice. For this reason we can set up the test statistic,
t0 = d̄ / (sd/√n)
In the example we obtain t0 = 3.4.
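The reduction to one group of differences makes this the same calculation as in test no. 2. A minimal sketch (variable names are illustrative):

```python
from math import sqrt
from statistics import mean, stdev

lab1 = [4.6, 8.5, 4.9, 5.3]
lab2 = [5.0, 9.0, 5.0, 6.0]

# only the differences between the laboratories are of interest
diffs = [b - a for a, b in zip(lab1, lab2)]

# one-sample t-test of the differences against 0
t0 = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
# t0 = 0.425 / (0.25 / 2) = 3.4
```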
Distribution of the test statistic
As in test no. 2 this test statistic is characterized by a (Student’s) t-distribution. The
number of degrees of freedom is here also n-1, because we have n differences.
In the example there are 4 differences, i.e. the number of degrees of freedom is 3.
The critical values are (as in the example in test no. 2) 3.182 (the 97.5% fractile, level
of significance of 5%) and 5.841 (the 99.5% fractile, level of significance of 1%)
respectively.
Conclusion
In the example we have obtained a test statistic of 3.4. This is above 3.182, for which
reason we must reject the assumption of similarity of the laboratories (on the 5%
level).
On the contrary the test statistic 3.4 is below 5.841. On the 1% level of significance
we will accept the assumption that the laboratories are identical.
One-sided test
If we know in advance (e.g. for professional reasons) that laboratory 2 always
provides results larger than those of laboratory 1, a One-sided test is employed:
Critical values are obtained (as in the example for test no. 2) to be 2.353 (the 5%
level, the 95% fractile) and 4.541 (the 1% level, the 99% fractile) respectively.
The conclusion is here the same as for a two-tailed test: On the 5% level we reject
while we on the 1% level accept the assumption that the laboratories are identical.
Test no. 4 – test of equality of two means, same variance in the groups.
Introduction
This test is employed in the following situation: We have two groups of independent
and normally distributed observations. Their standard deviation σ is assumed
unknown, but common to both groups. The means of both groups are unknown. We will
test whether the means of the two groups can be assumed equal.
In this test we will estimate the mean of both groups separately and also the common
standard deviation. If doubt arises whether it is correct to assume that the two groups
have the same standard deviation, test no. 7 might be employed.
Notice: The number of observations in the two groups does not have to be identical!
Example
We have e.g. 4 repeated measurements of one sample and 3 repeated measurements of
another sample. We will test whether the two samples can be assumed to be identical.
          Sample 1   Sample 2
xi        3.6        4.6
          4.5        5.2
          3.9        4.9
          4.3
x̄i        4.075      4.900
ni        4          3
si        0.403      0.300
Here the average, the number of measurements and the sample standard deviation are
stated for each sample.
Notice: Two measurements in the same column have nothing to do with each other!
The measurements of the same sample can be interchanged, as they are stated in
random order.
Calculation of the test statistic
At first a common (“pooled”) sample standard deviation of the two samples is
calculated, as we assume that the two samples have identical standard deviations. The
equation is:
sp = √[((n1 − 1)·s1² + (n2 − 1)·s2²) / (n1 + n2 − 2)]
Here s1 and n1 are the sample standard deviation and the number of measurements of the
first sample respectively, and correspondingly we have s2 and n2 for sample no. 2. In
the example sp = 0.365.
Subsequently a test statistic is calculated
t0 = (x̄1 − x̄2) / (sp · √(1/n1 + 1/n2))
Here, x̄1 is the average of sample no. 1 and x̄2 is the average of sample no. 2. In the
example t0 = -2.959.
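A sketch of the pooled calculation for this example (names are illustrative; computing with the unrounded sp gives about -2.96, which matches the note's -2.959 up to rounding):

```python
from math import sqrt
from statistics import mean, stdev

sample1 = [3.6, 4.5, 3.9, 4.3]
sample2 = [4.6, 5.2, 4.9]

n1, n2 = len(sample1), len(sample2)
s1, s2 = stdev(sample1), stdev(sample2)

# pooled sample standard deviation, assuming equal variances
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

t0 = (mean(sample1) - mean(sample2)) / (sp * sqrt(1 / n1 + 1 / n2))
# sp is about 0.365 and t0 about -2.96
```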
Distribution of the test statistic
Again, the test statistic is characterized by a t-distribution. The number of degrees of
freedom is (n1 + n2 –2), i.e. in the example there are 5 degrees of freedom.
Critical values are 2.571 respectively (level of significance of 5%) and 4.032 (level
of significance of 1%).
Conclusion
We have in the example found a test statistic of -2.959. This is not within the interval
-2.571 to 2.571, for which reason we reject the assumption that the means are equal
(on the 5% level).
On the contrary the test statistic of –2.959 is within the interval –4.032 to 4.032, for
which reason we must accept the assumption that the means are equal on the 1%
level.
One-sided test
If we know in advance (for subject-matter reasons) that sample no.2 always results in
larger measurement results than sample no.1, a One-sided test is employed:
Critical values are found to be 2.015 (5% level) and 3.365 (1% level) respectively.
The conclusion is here the same as for a two-tailed test.
Test no. 5 – test of equality of two means, different variances in the groups.
Introduction
Here the situation is similar to the one in test no. 4, but we cannot (or will not) assume
the standard deviations of the two groups to be identical.
In many situations the standard deviation will depend on the mean (the “level”). If it is
not known in advance whether the means are equal, it cannot automatically be
assumed that the standard deviations are equal. Also, an initial examination of the
equality of the standard deviations (by means of test no. 7) might have rejected this
assumption.
Example
Same example as for test no. 4. Only here it is not assumed that the two samples have
the same standard deviation.
          Sample 1   Sample 2
xi        3.6        4.6
          4.5        5.2
          3.9        4.9
          4.3
x̄i        4.075      4.900
ni        4          3
si        0.403      0.300
Calculation of the test statistic
Now we will not calculate a pooled sample standard deviation, but the following test
statistic directly:
t0 = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
Consequently only the average, the sample standard deviation and the number of
measurements of each sample enter the test statistic. In the example we get
t0 = -3.10.
Distribution of the test statistic
Here the test statistic is approximately characterized by a t-distribution. The number
of degrees of freedom (ν) is more complicated:

ν = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]

In the example we get ν = 4.987. This is rounded off to ν = 5.
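The two formulas above can be sketched directly in code (variable names are illustrative):

```python
from math import sqrt
from statistics import mean, stdev

sample1 = [3.6, 4.5, 3.9, 4.3]
sample2 = [4.6, 5.2, 4.9]

n1, n2 = len(sample1), len(sample2)
v1 = stdev(sample1) ** 2 / n1   # s1^2 / n1
v2 = stdev(sample2) ** 2 / n2   # s2^2 / n2

# test statistic without a pooled standard deviation
t0 = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
# t0 is about -3.10

# approximate degrees of freedom
nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
# nu is about 4.987, rounded off to 5
```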
Critical values are 2.571 respectively (level of significance of 5%) and 4.032 (level
of significance of 1%).
Conclusion
We have in the example found a test statistic of -3.10. This is not within the interval
–2.571 to 2.571, for which reason we reject the assumption that the means are equal
(on the 5% level).
However, the test statistic –3.10 is within the interval –4.032 to 4.032, for which
reason we must accept the assumption that the means are equal on the 1% level.
One-sided test
If we know in advance that sample no.2 always results in larger measurement results
than sample no.1, a One-sided test is employed:
Critical values are found to be 2.015 (5% level) and 3.365 (1% level) respectively.
The conclusion is here the same as for a two-tailed test.
Test no. 6 – test of a given variance (standard deviation).
Introduction
This test is employed in the following situation: We have a group of n independent
and normally distributed observations x1 to xn. We will test whether the variance σ²
can be assumed equal to a given value σ0², i.e. whether σ = σ0 (or equivalently
whether σ² = σ0²).
This test can be used e.g. prior to test no.1 if it is not certain whether the known value
of the standard deviation can still be employed.
In this situation the mean is unknown, i.e. we estimate this by means of the average of
the observations. This average enters the calculation of the sample standard deviation,
which is our assessment of the standard deviation.
Example
We use the same data as in tests no. 1 and 2.
x1     x2     x3     x4     n    x̄       s       σ0    χ0²
4.6    8.5    4.9    5.3    4    5.825   1.806   1.5   4.35
In this example there are 4 observations, i.e. n = 4. We know from many previous
analyses that the standard deviation can be assumed constant = 1.5 (i.e. σ0 = 1.5).
Our purpose is to test whether this assumption is still valid.
Calculation of the test statistic
It would be reasonable to employ the ratio s²/σ0² as the test statistic. Here, values far
from 1 will be critical. For technical reasons this test statistic is multiplied by (n − 1),
where n = the number of observations.
I.e. the test statistic employed is

χ0² = (n − 1)·s² / σ0²

Here, values far from (n − 1) will be critical.
In the example we obtain χ0² = 4.35.
Distribution of the test statistic
The test statistic is characterized by a so-called χ²-distribution (“chi-square”). Just as
the t-distribution this is a family of distributions, each with its own number of degrees
of freedom. Here the number of degrees of freedom is (n − 1).
In the example n = 4, i.e. the number of degrees of freedom is 3.
Both small and large values of χ0² are critical. On the 5% level of significance the
critical values can be read to be 0.22 (the 2.5% fractile) and 9.35 (the 97.5% fractile).
On the 1% level of significance the critical values are correspondingly obtained to be
0.07 (the 0.5% fractile) and 12.84 (the 99.5% fractile).
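A minimal sketch of the test statistic for the example (names are illustrative; the critical values themselves are read from a χ² table as described above):

```python
from statistics import stdev

data = [4.6, 8.5, 4.9, 5.3]
sigma0 = 1.5   # the standard deviation given in advance

n = len(data)
s = stdev(data)   # sample standard deviation, about 1.806

# test statistic; values far from (n - 1) are critical
chi2_0 = (n - 1) * s**2 / sigma0**2
# chi2_0 = 3 * 3.2625 / 2.25 = 4.35
```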
Conclusion
In the example the test statistic χ0² = 4.35 is between the critical values on the 5% level
of significance, 0.22 and 9.35. This implies that we will accept the assumption that the
standard deviation equals 1.5 on the 5% level (and thus also on the 1% level).
One-sided test
If it is known in advance that the sample standard deviation cannot be smaller (larger,
respectively) than the given value, a One-sided test can be employed.
Here, the critical value on the 5% level of significance is the 95% fractile (the 5%
fractile, respectively). On the 1% level of significance the 99% fractile (the 1%
fractile, respectively) must be used.
In the example the calculated sample standard deviation s = 1.806 is larger than the
previously given value σ0 = 1.5. If this was known in advance, a One-sided test could
be used, comparing the test statistic χ0² = 4.35 to the 95% fractile of a χ²-distribution
with 3 degrees of freedom. This can be read from a table to be 7.81. In this case we
will accept the assumption that the standard deviation equals the given value 1.5.
Test no. 7 – test of equality of two variances (standard deviations).
This test is employed in the following situation: We have two groups of independent
and normally distributed observations. We will test whether the variances (the
standard deviations) of the two groups can be assumed equal.
This test can be employed e.g. prior to test no. 4 if it is not certain that the two groups
have the same variance.
Example
We employ the same data as in test no.4. We will examine whether the variances (the
standard deviations) of the two samples can be assumed equal.
          Sample 1   Sample 2
xi        3.6        4.6
          4.5        5.2
          3.9        4.9
          4.3
x̄i        4.075      4.900
ni        4          3
si        0.403      0.300
Here the average, the number of measurements and the sample standard deviation of
each sample are stated.
Calculation of the test statistic
It would be reasonable to employ the ratio of the two calculated variances as test
statistic.
I.e. we employ the test statistic
F0 = s1² / s2²
Values far from 1 are critical.
The test statistic 1/F0 = s2² / s1² could be employed as well. For technical reasons the
larger of these two quantities is always chosen.
In the example we obtain F0 = 1.806. Consequently 1/F0 = 0.554. In this case F0 is
used.
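A minimal sketch of this choice (names are illustrative):

```python
from statistics import stdev

sample1 = [3.6, 4.5, 3.9, 4.3]
sample2 = [4.6, 5.2, 4.9]

var1 = stdev(sample1) ** 2   # 0.1625
var2 = stdev(sample2) ** 2   # 0.09

# always report the larger of the two possible ratios
F0 = max(var1 / var2, var2 / var1)
# F0 is about 1.81 (0.1625 / 0.09)
```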
Distribution of the test statistic
The test statistic is characterized by an “F-distribution”. This is a family of
distributions, which have two numbers of degrees of freedom, one for the numerator
and one for the denominator.
In the example we use F0 = s1² / s2², i.e. we will employ an F-distribution with (3,2)
degrees of freedom, as the number of degrees of freedom of the numerator s1² is 3 (we
have 4 observations), while the number of degrees of freedom of the denominator s2²
is 2 (we have 3 observations).
Inasmuch as we have chosen the larger of the two possible test quantities F0 and
1/F0, only large values are critical. But it is still considered a two-tailed test.
Therefore we will only compare the test statistic to the 97.5% fractile of the
F-distribution (5% level of significance) and the 99.5% fractile (1% level of
significance) respectively.
In the table the 97.5% fractile in an F-distribution with (3,2) degrees of freedom can
be read to be 39.17. The 99.5% fractile can similarly be read to be 199.16.
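If SciPy is available (an assumption; the note itself uses printed tables), the same fractiles can be looked up in software:

```python
from scipy import stats

# fractiles of the F-distribution with (3, 2) degrees of freedom
crit_5pct = stats.f.ppf(0.975, dfn=3, dfd=2)   # 97.5% fractile, about 39.17
crit_1pct = stats.f.ppf(0.995, dfn=3, dfd=2)   # 99.5% fractile, about 199.2
```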
Conclusion
In the example the test statistic F0 = 1.806 is smaller than 39.17 (the 97.5% fractile),
and we will accept the assumption that the two variances (standard deviations) are
equal.
One-sided test
Assume that we know in advance that the variance (the standard deviation) of group 1
will be larger than that of group 2. In this case we will instead compare to the 95%
fractile (test on the 5% level of significance) and the 99% fractile (1% level of
significance) respectively.
In a table of the F-distribution with (3,2) degrees of freedom the 95% fractile is read
to be 19.16 while the 99% fractile is 99.16. The conclusion is still that we accept the
assumption that the two variances (standard deviations) are equal.
References
ISO 2854: Statistical Interpretation of data – Techniques of estimation and tests
relating to means and variances. (Treats all tests, no. 1 – no. 7).
Microsoft Excel: Help-texts of the statistical functions and the menu ”Data Analysis”
under ”Tools”.