RANDOMIZATION TESTS
Robert M. Hamer
Medical College of Virginia, Virginia Commonwealth University
In order to use the usual parametric statistics (the kind that most of us were taught first, and the kind that most of us use most of the time), two related sets of assumptions regarding the data must be satisfied. First, the data must meet the measurement assumptions: measurement level and type (continuous vs. discrete). Most common statistical techniques require that the data be at least interval level and continuous. Second, the data must satisfy the distributional assumptions of the statistical technique. Most common parametric statistical techniques require that the data be normally distributed or amenable to a transformation which will make them normally distributed. (Many other statistical techniques have other assumptions, specific to the techniques themselves, which I will not discuss here.) When these conditions are not satisfied, a data analyst has three choices: (1) perform the test anyway, (2) transform the data, or (3) find another test.

If the data analyst chooses the first alternative, he or she must be satisfied that the statistical technique is robust to such violations; that is, that the technique performs "reasonably well" when the assumptions are violated in the manner specified. If the data analyst chooses the second alternative, he or she must be satisfied that the transformed data are tractable for analysis. If the data analyst chooses the third alternative, he or she often has a variety of nonparametric tests from which to choose.

If the data have appropriate measurement characteristics but do not meet the distributional assumptions, often a transformation (e.g., log, arcsine) will make the data more normal. For example, a log transformation will normalize a distribution which is the exponential of a normal distribution.

If the measurement characteristics are not met, a usual approach is to transform the data to ranks and use a rank-order statistic. Many common "nonparametric" statistics are rank-order statistics (e.g., the Wilcoxon tests, the Spearman and the various Kendall correlations, the Mann-Whitney U test, rank-order ANOVA, etc.). When we make the rank-order transformation we do two things: affirm the measurement characteristics of the data as ordinal, and produce a new set of data with a known distribution (no matter what the original distribution was, the ranks have a known distribution for the appropriate test). Thus, if the measurement characteristics are not met and, in addition, the distribution is unknown, the transformation to ranks will correct not only the measurement problem but also the distributional problem.

If, however, the data have appropriate measurement characteristics (usually interval-level continuous measurement) but an unknown or intractable distribution (in that it cannot be transformed to a tractable one), then use of a nonparametric test is wasteful. Nonparametric tests are wasteful in this situation because most of them ignore not only the distributional characteristics of such data but also the measurement characteristics. When the data are interval or ratio level, this discards information.

For example, suppose an experimenter is interested in the relationship between two variables, x and y. (All subjects have been measured on both x and y.) Suppose the two measures are interval-level measures (that is, they are numbers with which one can legitimately perform arithmetic), but upon checking the data, the data analyst discovers that the data are not distributed according to any recognizable or tractable distribution. The correlation coefficient would have been the appropriate statistic if the data were distributed according to a bivariate normal distribution. The data analyst could transform the data to ranks and use a Spearman rank correlation, but he or she would then lose all the information in the data except the ranks.

Since the correlation coefficient is a least-squares statistic, it does not depend upon any distributional assumptions. Regardless of the distribution of the data (so long as the mean and variance are finite), a correlation coefficient has certain properties: a range between -1.00 (perfect negative linear relationship) and +1.00 (perfect positive linear relationship), and a square that is the proportion of shared variance. However, when the data are not distributed according to a bivariate normal distribution, the Fisher Z transformation of a correlation coefficient is not an appropriate test. In fact, the distribution of the correlation coefficient in that situation is unknown.

Randomization Tests
There exists a class of tests for exactly this type of situation, and this class of tests is called randomization tests. Randomization tests do not make distributional assumptions regarding the data, but use the information in the data up to the limits of the measurement characteristics.
One may look to Fisher for the first conceptualization of a randomization test. In The Design of Experiments, in 1935, Fisher outlined and explained a randomization test which I shall cover later in this paper. Fisher (1971) stated:

In these discussions it seems to have escaped recognition that the physical act of randomization, which, as has been shown, is necessary for the validity of any test of significance, affords the means, in respect of any particular body of data, of examining the wider hypothesis in which no normality of distribution is implied. The arithmetical procedure of such an examination is tedious, and we shall only give the results of its application in order to show the possibility of an independent check on the more expeditious methods in common use (Fisher, 1971, p. 45).

Fisher then went on to use the data which he had previously used in a paired t-test as an example of a randomization test for the hypothesis of identical populations. Fisher's example used 15 pairs of plant seeds, and within each pair, the seeds were assigned randomly to 2 groups. One group was cross-fertilized and one was self-fertilized. He then calculated (after growth) 15 differences in height (one per pair) and proposed to test the hypothesis that there was a statistically significant difference between the two methods of fertilization.

Recently, randomization tests have been the center of some controversy. Some researchers have tried to claim wonderful properties for randomization tests and indeed wonder why we would ever want to use old-fashioned "parametric" techniques. Some have made extravagant claims for them and state that they make random assignment unnecessary. Some allege that they are the perfect way to analyze one-subject designs. Basu (1980) and the accompanying comments contain an excellent discussion of these aspects of randomization tests.

I shall avoid all controversies. In this paper I present some examples of a class of statistical techniques which have a long and distinguished theoretical history. They have not had much of a practical history because, without a computer, they were impractical. I shall also present SAS procedures which may be used to perform randomization-test analyses for all situations discussed in this paper.
The Paired T-Test
The first randomization test I will discuss is the paired t-test. First, I will describe the situation and calculate the usual t-statistic. Then, I will calculate a p-value for that t-statistic via the randomization method.

Fisher's reasoning was that if the two series of seeds were random samples from identical populations, then each of the 15 observed within-pair differences would have occurred with equal frequency with a positive or a negative sign. We may sum the differences we actually obtained, and we may "ask how many of the 2^15 numbers, which may be formed by giving each component alternatively a positive and a negative sign, exceed this value. Since ex hypothesi each of these 2^15 combinations will occur by chance with equal frequency, a knowledge of how many of them are equal to or greater than the value actually observed affords a direct arithmetical test of the significance of this value" (p. 46).
Suppose we have n pairs of subjects (for example, twins). Suppose we randomly assign one subject from each pair to each of two groups. Suppose we then perform some experimental manipulation on one group, and we now wish to compare the groups. The usual test would be a matched-pair t-test: we would calculate difference scores within each pair and perform a one-group t-test on the mean of the differences, testing the null hypothesis H0: μd = 0.

Suppose that for distributional reasons we were precluded from performing the above test. Suppose, for example, that our dependent variable was a waiting time, or better still, a correlation between repeated pairs of observations on each subject. In other words, suppose that while the measurement characteristics of the dependent variable are of sufficient metricity that we can do arithmetic with it (interval level, continuous), the distribution of the dependent variable is completely unknown. A randomization test would then be appropriate.
(There is, in addition, one randomization test already in wide use today: Fisher's Exact Test for 2 x 2 tables. Most of us were taught the test early in our first course, and most of us use it. Until computers were developed, it was cumbersome, tedious, and not widely used. It is in fact a randomization test, and today most statistical packages, including SAS, perform the Fisher Exact Test.)
The rationale behind the randomization test for this situation is as follows: We have randomly assigned one subject from each pair to one group and the other subject from each pair to the other group. If the experimental manipulation made no difference (the null hypothesis), then there should be no difference (except for random error) between the scores made by subjects in one group and the scores made by subjects in the other group.
Over the years, other researchers developed randomization tests for various situations. As mentioned, Fisher developed the Exact Test for the 2 x 2 table. In 1937, Pitman developed a randomization test for the correlation coefficient (Pitman, 1937a, 1937b). (I also developed this test a couple of years ago and was terribly disappointed to discover someone else had preceded me.) Pitman (1938) also developed a randomization test for ANOVA. These tests remained little more than elegant mathematical curiosities until recently, when the use of high-speed computers made them practical.
If there is no difference (except for random error) between the groups, then it should not matter to which group a subject is assigned. In other words, switching the scores within a pair should make no difference (on the average) in the group scores. There are 2^n possible permutations of the data which we can form by switching within pairs. We view these 2^n possible permutations as the entire population from which our sample was drawn, and we may calculate a t-value for each of these 2^n samples. The probability of drawing a sample at random from this population with a t-value as large or larger is simply the number of t-values as large or larger divided by 2^n. Under the null hypothesis we view our sample as having been randomly drawn from this population. If so, then our observed t-value should be relatively close to the center of the population distribution of t-values. If the t-value is close to the extremes, this indicates a low probability that our observed sample was in fact randomly drawn from this population.
The reasoning behind the randomization test for the two-group situation (treated fully below) is as follows: Assume that (under the null hypothesis) there are no group differences. Then it should make no difference to which group a subject belongs. If it makes no difference to which group a subject belongs, we should be able to switch subjects from one group to the other without affecting (in the long run) the t-value. For n = n1 + n2 subjects in two groups as described, there are n!/(n1!n2!) possible combinations of the data with the appropriate group sizes. We view these n!/(n1!n2!) samples as the entire population of samples from which our one sample was presumably randomly drawn (under the null hypothesis).
For example, suppose we have 5 pairs of rats. Suppose we randomly assign one rat from each pair to group 1 and the other to group 2. Suppose we feed the groups different diets and at the end of some time period make some measurement. We might have the data described in Table 1. The classic way of testing the null hypothesis of no difference would be to calculate difference scores and perform a one-group t-test on the difference scores.
Each sample will possess a t-value. All these n!/(n1!n2!) t-values have an equal probability of being chosen under the null hypothesis. If our observed sample is one with a t-value close to the center of the distribution, then we would fail to reject the null hypothesis. If the observed t-value is close to the extremes of the distribution, we would reject the null hypothesis. We calculate the p-value by counting the number of times our observed t-value is equaled or exceeded in our n!/(n1!n2!) samples and dividing this number by n!/(n1!n2!).
Using a randomization test to obtain the p-value proceeds in exactly the same manner up to the point at which we would look up (or calculate) a p-value. We form difference scores, calculate a mean difference score, and calculate a t-value. Then we permute the data all possible ways. For 5 pairs, there are 32 possible permutations. Table 2 contains all 32 of these possible permutations. Notice that the permutations differ from each other only in terms of signs.
For example, suppose we have 7 subjects, and we randomly assign them to two groups, group 1 and group 2. Suppose n1 = 3 and n2 = 4 (n1 + n2 = n). Suppose we now subject the groups to two different treatments, and we wish to test the significance of the t-value via the randomization method.
For each permutation, we calculate a t-value. These 32 t-values comprise (under the null hypothesis) the entire population from which our t-value was drawn. Our observed t-value is 4.00. If we rank the 32 t-values in absolute value (we will do a two-tailed test), we see that our value is the highest. Thus, the probability of obtaining this or any more extreme t-value is 2/32, for a p-value of .0625 (for a two-tailed test).
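The arithmetic of this example is small enough to check directly. The sketch below (in Python, purely for illustration; the paper's own procedures are written in SAS) enumerates all 2^5 sign patterns of the Table 1 difference scores and computes the two-tailed randomization p-value:

```python
from itertools import product
from statistics import mean, stdev

# Difference scores from Table 1 (one per pair of rats).
diffs = [1, 1, 1, 2, 3]
n = len(diffs)

def t_value(xs):
    """One-sample t statistic for the null hypothesis that the mean is 0."""
    return mean(xs) / (stdev(xs) / n ** 0.5)

t_obs = t_value(diffs)  # observed t = 4.00

# Under the null hypothesis, each of the 2^n sign patterns is equally likely.
ts = [t_value([s * d for s, d in zip(signs, diffs)])
      for signs in product([1, -1], repeat=n)]

# Two-tailed p-value: proportion of |t| as large or larger than observed.
p = sum(abs(t) >= abs(t_obs) - 1e-9 for t in ts) / len(ts)
print(round(t_obs, 2), len(ts), p)  # 4.0 32 0.0625
```

Only the all-positive and all-negative sign patterns reach |t| = 4.00, reproducing the p-value of 2/32 = .0625.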
Table 3 contains the data, as well as our observed t-value and a p-value calculated in the usual manner from a t-distribution. Table 4 contains all n!/(n1!n2!) = 7!/(4!3!) = 35 combinations: all 35 ways in which we may divide the 7 observations into two groups, with 3 observations in one group and 4 observations in the other. This table also contains a t-value calculated for each combination. As we can see, out of 35 t-values, ours is equalled or exceeded in absolute value by all. Thus, our p-value is 1.00.
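This enumeration can also be checked directly. A minimal Python sketch (using the two groups as reconstructed from Table 3; the group values are taken from that table) forms all 35 combinations and counts how many t-values equal or exceed the observed one in absolute value:

```python
from itertools import combinations
from statistics import mean, stdev

# Data as reconstructed from Table 3: first 3 values are group 1, rest group 2.
data = [2, 4, 6, 1, 3, 5, 7]
n1 = 3

def t_value(g1, g2):
    """Pooled-variance two-sample t statistic."""
    v1, v2 = stdev(g1) ** 2, stdev(g2) ** 2
    sp2 = ((len(g1) - 1) * v1 + (len(g2) - 1) * v2) / (len(g1) + len(g2) - 2)
    return (mean(g1) - mean(g2)) / (sp2 * (1 / len(g1) + 1 / len(g2))) ** 0.5

t_obs = t_value(data[:n1], data[n1:])  # both group means are 4, so t = 0.00

# Every way of choosing which 3 of the 7 subjects form group 1.
ts = []
for idx in combinations(range(len(data)), n1):
    g1 = [data[i] for i in idx]
    g2 = [data[i] for i in range(len(data)) if i not in idx]
    ts.append(t_value(g1, g2))

p = sum(abs(t) >= abs(t_obs) - 1e-9 for t in ts) / len(ts)
print(len(ts), round(t_obs, 2), p)  # 35 0.0 1.0
```

Because the observed t-value is 0.00, every one of the 35 combinations equals or exceeds it in absolute value, and the p-value is 1.00.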
The Two Group T-Test
The next randomization test I will discuss is the two-group independent t-test. Suppose we have n subjects and we randomly divide the subjects into two groups, with n1 subjects in group 1 and n2 subjects in group 2 (n1 + n2 = n). We then perform some experimental manipulation on one group (or a different experimental manipulation on each of the two groups). This is the usual two-group independent t-test situation. If the data are continuous and at least interval level, and if the distribution of the data is normal with homogeneous variance, then we would calculate a t to test the null hypothesis H0: μ1 = μ2 against Ha: μ1 ≠ μ2.

Suppose that for distributional reasons, however, we were precluded from performing the above test. The data are still interval level and continuous (in other words, the numbers still possess metric properties), and we would still like to use the information contained in these numbers, even though they do not meet the distributional requirements of a t-test.
The Significance of a Correlation
The third randomization test I would like to discuss is the significance test for a Pearson product-moment correlation coefficient (hereinafter referred to simply as the correlation). This test works equally well for all rank-order correlations. First, I will describe the situation and calculate the usual correlation coefficient. Then, I will calculate a p-value for that correlation coefficient via the randomization method.
Suppose we have n subjects, each of whom we have measured on two variables (x and y). We wish to assess the linear relationship of the two variables. The usual way to do this, if the data are interval level, continuous, and bivariate normal, is to calculate a correlation coefficient, perform a Fisher Z transformation, and test the transformed coefficient using the normal distribution. Suppose that the distribution of the variables is not bivariate normal. Then the use of the Fisher Z transformation is improper, in that the resulting transformed statistic is not guaranteed to be asymptotically normally distributed.
One can think of the randomization test in one-way ANOVA as a straightforward extension (conceptually) of the two-group independent t-test. We have n1 + n2 + ... + nk = n subjects, who have been randomly assigned to the k groups. The k groups are manipulated differentially and an F-statistic is calculated. Under the null hypothesis of no treatment effects, it should make no difference to which group a subject is assigned. There are n!/(n1!n2!...nk!) possible ways of assigning the subjects to the groups under the constraint that the group sizes remain the same. Under the null hypothesis, the observed F-statistic was randomly drawn from the population of F-statistics computed over all these assignments. The probability of finding an F-statistic as large or larger is thus the number of F-statistics as large or larger divided by n!/(n1!n2!...nk!). This we use as the p-value.
If we had a reasonable idea of the manner in which the variables differed from bivariate normality, we could transform the data to approximate normality, but sometimes this is just not possible.
The reasoning behind a randomization test of a correlation coefficient is as follows: We can calculate the correlation between the two variables. Under a null hypothesis of no relationship between the two variables (H0: ρ = 0), it should make no difference which x-value is paired with which y-value. For n observations, there are n! permutations, and for each permutation we can calculate a correlation coefficient. Each member of this entire population of correlation coefficients has an equal probability of being chosen under the null hypothesis. If we arranged them in order of magnitude, we would expect ours to come from around the center (under a null hypothesis of no relationship). If there were only a few correlations as high or higher, we would take this as evidence that it is unlikely that our correlation was randomly drawn from this population of correlations.
In this example, there will be a couple of aspects which are somewhat "unrealistic." I am using an exceptionally small n because otherwise the number of permutations becomes too large to list. For the same reason, one of the three groups has only one observation. I am not saying it would be a good idea to use a randomization test with so few observations, but it is the only way to make a manageable presentation.
Suppose we have five subjects and three treatments. We randomly assign the five subjects to the 3 groups (n1 = 1, n2 = 2, n3 = 2) and perform our experimental manipulations. There are 5!/(1!2!2!) = 120/4 = 30 possible ways the 5 subjects may be assigned to the 3 groups with the specified group sizes. Table 7 contains these data, and Table 8 contains the 30 possible groupings of the data. If we assume (under the null hypothesis) that each of these 30 groupings had an equal chance of being selected, then the probability of obtaining a result with an F as large or larger than our observed one is the number of F-statistics greater than or equal to the observed one divided by the number of combinations (30). Our observed F-statistic was 9.00. Because the two groups of size 2 are interchangeable, every F-value occurs an even number of times among the 30 groupings; the number of F-statistics which equal or exceed 9.00 is 6. Therefore, the p-value is 6/30 = .20.
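A sketch of this enumeration follows (Python for illustration; the data are the values reconstructed from Table 7, i.e., the values 1 through 5 in groups of sizes 1, 2, and 2, with group 1 = {1}, group 2 = {2, 3}, group 3 = {4, 5}):

```python
from itertools import permutations
from statistics import mean

# Data as reconstructed from Table 7: group sizes 1, 2, and 2.
values = [1, 2, 3, 4, 5]

def f_value(groups):
    """One-way ANOVA F statistic (between-groups MS over within-groups MS)."""
    grand = mean(v for g in groups for v in g)
    n = sum(len(g) for g in groups)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ssb / (len(groups) - 1)) / (ssw / (n - len(groups)))

f_obs = f_value([[1], [2, 3], [4, 5]])  # observed F = 9.00

# The 30 distinct assignments of the 5 values to groups of sizes 1, 2, 2.
fs, seen = [], set()
for perm in permutations(values):
    groups = (perm[0:1], perm[1:3], perm[3:5])
    # Order within a group is irrelevant, so 2!2! = 4 permutations repeat.
    key = (groups[0], tuple(sorted(groups[1])), tuple(sorted(groups[2])))
    if key not in seen:
        seen.add(key)
        fs.append(f_value([list(g) for g in groups]))

p = sum(f >= f_obs - 1e-9 for f in fs) / len(fs)
print(len(fs), round(f_obs, 2), round(p, 2))  # 30 9.0 0.2
```

The count of F-values equaling 9.00 is 6 (three distinct partitions, each counted twice because the two size-2 groups may be swapped), giving p = .20.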
Consider an example. Table 5 contains data on four observations and two variables. The correlation is .72. There are 4! = 24 ways in which the x-values and y-values can be paired; these 24 pairings are given in Table 6, together with the correlation calculated for each pairing. As you can see, our correlation falls near the top of the distribution: the probability of obtaining a correlation this high or higher is 4/24 = .17 for a one-tailed test, and 7/24 = .29 for a two-tailed test.
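The same counts can be reproduced with a short sketch (Python for illustration; the x and y values are those of Table 5):

```python
from itertools import permutations

# Data from Table 5.
x = [2, 3, 7, 9]
y = [1, 3, 2, 5]

def corr(a, b):
    """Pearson product-moment correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

r_obs = corr(x, y)  # about 0.72

# All 4! = 24 pairings of the y-values with the x-values.
rs = [corr(x, list(p)) for p in permutations(y)]

p_one = sum(r >= r_obs - 1e-9 for r in rs) / len(rs)
p_two = sum(abs(r) >= abs(r_obs) - 1e-9 for r in rs) / len(rs)
print(round(r_obs, 2), round(p_one, 2), round(p_two, 2))  # 0.72 0.17 0.29
```

Four of the 24 correlations are at least as large as the observed .72, and seven are at least as large in absolute value.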
One-Way ANOVA
The next randomization test I will discuss is that appropriate to a one-way ANOVA. First, I will discuss the situation. Next I will discuss the reasoning behind the test, and finally, I will present an example.

The situation is the one-way ANOVA situation. Suppose we have n patients randomly assigned to k groups with cell sizes of n1, n2, n3, ..., nk, and n1 + n2 + ... + nk = n. We would usually perform a one-way ANOVA to test the hypothesis H0: μ1 = μ2 = ... = μk if the data met the following assumptions: normally distributed, interval-level, and continuous data, with equal group variances. Suppose it is obvious from the data that the data are not normally distributed, the groups differ greatly in size, and the variances are unequal. This is a situation to which the ANOVA technique is not robust. In this case we might perform a randomization test.

The Fisher Exact Test
One of the most common randomization tests in use, and one which all statisticians have learned, is the Fisher Exact Test. It is an exact test for use with two-by-two contingency tables.

The following example is taken from Hays (1963, pp. 598-601). Suppose that n objects are arranged into the two-by-two table shown in Table 9. If we consider the marginals of the table as fixed, we can enumerate all the tables that could be formed with these marginals. For each table, we can calculate some measure of association. Hays stated, "If one finds the probability of the arrangement actually obtained, as well as every other arrangement giving as much or more evidence for association, then one can test the hypothesis that the obtained result is purely a product of chance by taking the probability as the significance level" (p. 599).
Single Subject Designs
It has been alleged (Edgington, 1980a, 1980b) that randomization tests are appropriate for "one subject" designs. It is my opinion that randomization tests are as appropriate for such designs as anything else is, which is to say not very. Of course, the major problem is that one cannot generalize beyond the subject, and an experimental population limited to one subject is in general uninteresting.
For example (again from Hays): suppose we observe 10 subjects and they fall into the table given in Table 10. The probability that this result might have occurred by chance alone is (4!6!5!5!)/(10!1!4!3!2!) = .238. Of all the other tables which might be constructed with these marginals, only one shows more evidence of association (Table 11). The probability of its occurring by chance alone is (4!6!5!5!)/(10!0!5!4!1!) = .024. Thus, the probability of the observed result, or one more extreme, is .238 + .024 = .262, and we may take .262 as the p-value for an exact test of no association.
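These two probabilities are hypergeometric, and the arithmetic can be sketched as follows (Python for illustration; the margins are those of Table 10):

```python
from math import comb

# Table 10: row margins (5, 5), column margins (4, 6), n = 10.
def table_prob(a, row1, row2, col1):
    """Probability of a 2x2 table whose upper-left cell is a, margins fixed."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)

p_obs = table_prob(1, 5, 5, 4)  # the observed table has a = 1
p_ext = table_prob(0, 5, 5, 4)  # the one more extreme table has a = 0
p = p_obs + p_ext
print(round(p_obs, 3), round(p_ext, 3), round(p, 3))  # 0.238 0.024 0.262
```

With the margins fixed, the upper-left cell determines the whole table, so summing over the observed and more extreme cells gives the exact p-value of .262.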
Randomization Tests and Rank-Order Statistics
If we first transform a set of data to ranks and then apply a randomization test, we lose some information (the metric information in the data) and gain in practicality (Bradley, 1968, p. 87). We throw the metric information away when we transform to ranks. However, since all sets of scores monotonically equivalent to our observed data produce the same ranks, we gain in practicality in that for a given set of ranks the test is always the same. Many usual nonparametric tests are derived in exactly that manner: the data are transformed to ranks, and the randomization test is simplified because the sample space can now be tabled.
Some familiar tests which are in fact rank-randomization tests are Wilcoxon's signed-rank test, Wilcoxon's rank-sum test, the Siegel-Tukey test, the Friedman rank ANOVA, and the Kruskal-Wallis test.

Sampling from the Distribution
For most randomization tests, the maximum sample size which it is possible to use is relatively small. For example, for the paired t-test, 16 pairs produce 65,536 possible permutations. For the correlation randomization test, 9 subjects require 362,880 permutations. While these sorts of numbers require the computer system to take some time, they are not unreasonable. For example, on our system (an AMDAHL 470/V7), a paired t-test randomization test with 16 subjects takes less than 10 seconds of CPU time, and the entire SAS job costs about $5.00. That is not an unreasonable amount of money. However, adding one more subject would almost double the cost, adding 2 more would almost quadruple it, and so on.
Fortunately, we may use sampling principles to address this problem. Green (1977), Edgington (1969), and others have shown that when the total number of permutations for a randomization test is too large to enumerate completely, a sufficiently large random sample from this population of permutations will produce p-values which are very close to the true p-values.

Equivalent Statistics
In my conceptual derivations of these randomization tests, as well as in my examples and my SAS procedures, I have been using test statistics (e.g., t, F, r) as the criterion: the appropriate test statistic is calculated for each and every permutation of the data. It is often not strictly necessary, however, to calculate the test statistic itself, because there is often a monotonically related, simpler statistic. For example, when performing a randomization paired t-test, notice that the t-values are monotonic with the mean difference: as the mean difference increases, so does the t-value. This means that if we were to perform the randomization test using the mean difference as the test statistic, we would achieve identical results. Exactly the same number of means would exceed the observed mean as t-values exceeded the observed t-value.
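This equivalence is easy to demonstrate on the paired-t example (a Python sketch, reusing the difference scores of Table 1): the p-value computed by counting t-values and the p-value computed by counting mean differences are identical.

```python
from itertools import product
from statistics import mean, stdev

# Difference scores from Table 1.
diffs = [1, 1, 1, 2, 3]
n = len(diffs)

def t_value(xs):
    return mean(xs) / (stdev(xs) / n ** 0.5)

# All 2^5 sign-flipped versions of the difference scores.
signed = [[s * d for s, d in zip(signs, diffs)]
          for signs in product([1, -1], repeat=n)]

# p-value using the t statistic as the criterion ...
p_t = sum(abs(t_value(xs)) >= abs(t_value(diffs)) - 1e-9
          for xs in signed) / len(signed)
# ... and using the monotonically equivalent mean difference.
p_mean = sum(abs(mean(xs)) >= abs(mean(diffs)) - 1e-9
             for xs in signed) / len(signed)
print(p_t, p_mean)  # 0.0625 0.0625
```

Because the sum of squares of the scores is the same for every sign pattern, a larger |mean| forces a larger |t|, so the two counts must agree exactly.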
For example, consider the following situation. We have collected data on two different neuroendocrine blood levels, and we wish to perform a randomization test of the correlation coefficient. We have 25 subjects. There are approximately 1.551 x 10^25 (that is, 25!) possible ways to arrange these data, each with an associated correlation coefficient. A random sample of 1,000 or even 10,000 from this population is enough to assure the accuracy of the p-value to 3 or 4 decimal places. Most of the time, that is sufficient.
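The sampling idea can be sketched as follows (Python for illustration). Here the paired-t example of Table 1 is used, since with only 5 pairs the sampled p-value can be compared against the exact one; with 25 subjects the approach is the same, only the full enumeration becomes infeasible.

```python
import random
from itertools import product
from statistics import mean, stdev

# Difference scores from Table 1.
diffs = [1, 1, 1, 2, 3]
n = len(diffs)

def t_value(xs):
    return mean(xs) / (stdev(xs) / n ** 0.5)

t_obs = abs(t_value(diffs))

# Exact enumeration of all 2^5 = 32 sign patterns.
exact = [abs(t_value([s * d for s, d in zip(signs, diffs)]))
         for signs in product([1, -1], repeat=n)]
p_exact = sum(t >= t_obs - 1e-9 for t in exact) / len(exact)

# A random sample of 10,000 sign patterns instead of full enumeration.
rng = random.Random(0)  # fixed seed for reproducibility
trials = 10_000
hits = sum(abs(t_value([rng.choice([1, -1]) * d for d in diffs])) >= t_obs - 1e-9
           for _ in range(trials))
p_approx = hits / trials
print(p_exact, p_approx)  # p_exact = 0.0625; p_approx is close to it
```

With 10,000 sampled permutations, the standard error of the estimated p-value is roughly sqrt(p(1-p)/10000), well under .005 here.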
Random Assignment
It has been alleged that randomization test methods make pre-experimental random assignment unnecessary (Edgington, 1966). Without random assignment, however, causal inference is problematic: any treatment effects are confounded with the nonrandom assignment.

For another example, when performing a randomization test of a correlation coefficient, I calculated the correlation coefficient for each of the n! permutations. The denominator of the correlation coefficient remained constant, however, as the denominator involves only the variances of the variables, which are unchanged no matter how the data are paired. It is only the cross-product term (the numerator) of a correlation coefficient that changes with each permutation. Thus, if I had performed a randomization test using the sum of products rather than the correlation coefficient, the p-value would have been the same.
The advantage of using such an equivalent statistic, when it is simpler and easier to calculate, is efficiency: we may be able to cut a significant amount of CPU time from our program runs. However, there are in my view a number of disadvantages to such an approach. First of all, it obscures what we are doing; the examples would not have been as clear if I had had to explain why I was using such a statistic. Second, I feel the saving in CPU time is minute and unimportant: the bulk of the expense in such a program involves the manipulation of the data, not the calculation of the test statistic. Even if we were able to cut the time required to calculate the test statistic in half, we probably wouldn't decrease the cost of running the program by more than 5 or 10 percent. Finally, there may well be times when one wishes to examine the actual test statistics (as I have done in this paper), and it is nice in such cases if the program calculates them.
SAS Procedures
I have written three SAS procedures which are capable of performing the analyses described in this paper. I am at present waiting for information on what I may need to do to make the procedures usable under portable SAS. When I do so, and if SAS Institute wishes to, the procedures will be distributed as part of the SAS Supplemental Library.

Acknowledgment
I would like to thank Ms. Chris G. Riley for her valuable assistance with this paper.

REFERENCES
1. Basu, D. Randomization analysis of experimental data: the Fisher randomization test. Journal of the American Statistical Association, 1980, 75, 575-582.
2. Bradley, J.V. Distribution-free statistical tests. Englewood Cliffs, N.J.: Prentice-Hall, 1968, 68-86.
3. Edgington, E.S. Statistical inference and nonrandom samples. Psychological Bulletin, 1966, 66, 485-487.
4. Edgington, E.S. Approximate randomization tests. Journal of Psychology, 1969, 72, 143-149.
5. Edgington, E.S. Randomization tests. New York: Marcel Dekker, 1980a.
6. Edgington, E.S. Validity of randomization tests for one-subject designs. Journal of Educational Statistics, 1980b, 5, 235-251.
7. Fisher, R.A. The design of experiments (Ninth Edition). New York: Hafner, 1971.
8. Green, B.F. A practical interactive program for randomization tests of location. The American Statistician, 1977, 31, 37-38.
9. Hays, W.L. Statistics. New York: Holt, Rinehart, and Winston, 1963, 598-601.
10. Pitman, E.J.G. Significance tests which may be applied to samples from any populations. Journal of the Royal Statistical Society (Series B), 1937a, 4, 119-130.
11. Pitman, E.J.G. Significance tests which may be applied to samples from any populations. II. The correlation coefficient test. Journal of the Royal Statistical Society (Series B), 1937b, 4, 225-232.
12. Pitman, E.J.G. Significance tests which may be applied to samples from any populations. III. The analysis of variance test. Biometrika, 1938, 29, 322-335.
Table 1
Paired t-test Example

Pair    Group 1    Group 2    Difference
 1         2          1           1
 2         3          2           1
 3         5          4           1
 4         5          3           2
 5         6          3           3

t = 4.00, p < 0.0161

Table 3
Two Group t-test Example

Group 1:  2  4  6
Group 2:  1  3  5  7

t = 0.00, p < 1.00
Table 2
Paired t-test Example
All Possible (32) Permutations of the Difference Scores

Each permutation attaches one of the 2^5 = 32 patterns of signs to the
difference scores (1, 1, 1, 2, 3). The resulting t-values are:

 4.00   1.81   1.81   1.00   0.46   0.00   0.00  -0.46
 1.00   0.46   0.46   0.00  -0.46  -1.00  -1.00  -1.81
 1.81   1.00   1.00   0.46   0.00  -0.46  -0.46  -1.00
 0.46   0.00   0.00  -0.46  -1.00  -1.81  -1.81  -4.00

Observed t = 4.00; |t| >= 4.00 for 2 of the 32 permutations, p < .0625

Table 4
Two Group t-test Example
All Possible (35) Combinations

Each combination places 3 of the 7 observations in group 1 and the
remaining 4 in group 2, and a t-value is calculated for each. The
observed combination yields t = 0.00, which is equalled or exceeded in
absolute value by all 35 t-values, so p = 1.00.
Table 5
Correlation Example
Data

Observation    Variable 1    Variable 2
 1                 2             1
 2                 3             3
 3                 7             2
 4                 9             5

r = 0.72, p < .27

Table 7
One-way ANOVA Example
Data

Group    Value
 1         1
 2         2
 2         3
 3         4
 3         5

F = 9.00, p < .10
Table 6
Correlation Example
All Possible (24) Permutations of Variable 2

For each of the 4! = 24 pairings of the Variable 2 values (1, 3, 2, 5)
with the Variable 1 values, a correlation was calculated. The 24
correlations range from about -0.93 to 0.96; 4 of them are as large as
the observed r = 0.72, and 7 are as large in absolute value, so
p < .17 (one-tailed) and p < .29 (two-tailed).

Table 8
One-way ANOVA Example
All Possible (30) Combinations

For each of the 30 assignments of the five observations to groups of
sizes 1, 2, and 2, an F-statistic was calculated. The resulting
F-values are 9.00 (six times), 3.00 (four), 1.50 (four), 1.00 (four),
0.54 (four), 0.18 (four), 0.11 (twice), and 0.00 (twice).

Observed F = 9.00; 6 of the 30 F-values equal or exceed it, p = .20
Table 9
Fisher Exact Test Example

    a        b        a + b
    c        d        c + d
  a + c    b + d        n

Table 10
Fisher Exact Test Example
Data

    1        4        5
    3        2        5
    4        6       10

Table 11
Fisher Exact Test Example
More Extreme Table

    0        5        5
    4        1        5
    4        6       10