RANDOMIZATION TESTS

Robert M. Hamer
Medical College of Virginia, Virginia Commonwealth University

In order to use the usual parametric statistics (the kind that most of us were taught first, and the kind that most of us use most of the time), two related sets of assumptions regarding the data must be satisfied. First, the data must meet the measurement characteristics assumptions (measurement level and type, where type means continuous vs. discrete). Most common statistical techniques require that the data be at least interval level and continuous. Second, the data must satisfy the distributional assumptions of the statistical technique. Most common parametric statistical techniques require that the data be normally distributed or amenable to a transformation which will make them normally distributed. (Many other statistical techniques have other assumptions, specific to the techniques themselves, which I will not discuss here.)

When these conditions are not satisfied, a data analyst has three choices: (1) perform the test anyway, (2) transform the data, or (3) find another test. If the data analyst chooses the first alternative, he or she must be satisfied that the statistical technique is robust to such violations; that is, that the technique performs "reasonably well" when the assumptions are violated in the manner specified. If the data analyst chooses the second alternative, he or she must be satisfied that the transformed data are tractable for analysis. If the data have appropriate measurement characteristics but do not meet the distributional assumptions, often a transformation (e.g., log, arcsine) will make the data more normal; for example, a log transformation will make normal a distribution which is the exponential of a normal distribution. If the data analyst chooses the third alternative, he or she often has a variety of nonparametric tests from which to choose.

If the measurement characteristics are not met, a usual approach is to transform the data to ranks and use a rank-order statistic. Many common "nonparametric" statistics are rank-order statistics (e.g., the Wilcoxon tests, the Spearman and the various Kendall correlations, the Mann-Whitney U test, rank-order ANOVA). When we make the rank-order transformation we do two things: affirm the measurement characteristics of the data as ordinal, and produce a new set of data with a known distribution (no matter what the original distribution was, the ranks have a known distribution for the appropriate test). Thus, if the measurement characteristics are not met and, in addition, the distribution is unknown, the transformation to ranks will correct not only the measurement problem but also the distributional problem.

If, however, the data have appropriate measurement characteristics (usually interval-level continuous measurement) but an unknown or intractable distribution (one that cannot be transformed to a tractable one), then use of a nonparametric test is wasteful. Nonparametric tests are wasteful in this situation because most of them ignore not only the distributional characteristics of such data but also the measurement characteristics. When the data are interval or ratio level, this results in discarding information.

For example, suppose an experimenter is interested in the relationship between two variables, x and y. (All subjects have been measured on both x and y.) Suppose the two measures are interval level (that is, they are numbers with which one can legitimately perform arithmetic), but upon checking the data, the data analyst discovers that the data are not distributed according to any recognizable or tractable distribution. The correlation coefficient would have been the appropriate statistic if the data were distributed according to a bivariate normal distribution. The data analyst could transform the data to ranks and use a Spearman rank correlation, but he or she would then lose all the information in the data except the ranks. Since the correlation coefficient is a least squares statistic, it does not depend upon any distributional assumptions. Regardless of the distribution of the data (so long as the mean and variance are finite), a correlation coefficient has certain properties: a range between -1.00 (perfect negative linear relationship) and +1.00 (perfect positive linear relationship), and a square that is the proportion of shared variance. However, when the data are not distributed according to a bivariate normal distribution, the Fisher Z transformation of a correlation coefficient is not an appropriate test. In fact, the distribution of the correlation coefficient in that situation is unknown.

Randomization Tests

There exists a class of tests for exactly this type of situation, and this class of tests is called randomization tests. Randomization tests do not make distributional assumptions regarding the data, but use the information in the data up to the limits of the measurement characteristics.

One may look to Fisher for the first conceptualization of a randomization test. In "The Design of Experiments," in 1935, Fisher outlined and explained a randomization test which I shall cover later in this paper. Fisher (1971) stated:

    In these discussions it seems to have escaped recognition that the physical act of randomization, which, as has been shown, is necessary for the validity of any test of significance, affords the means, in respect of any particular body of data, of examining the wider hypothesis in which no normality of distribution is implied. The arithmetical procedure of such an examination is tedious, and we shall only give the results of its application in order to show the possibility of an independent check on the more expeditious methods in common use (Fisher, 1971, p. 45).

Fisher then went on to use the data which he had previously used in a paired t-test as an example of a randomization test for the hypothesis of identical populations. Fisher's example used 15 pairs of plant seeds; within each pair, the seeds were assigned randomly to two groups. One group was cross-fertilized and one was self-fertilized. He then calculated (after growth) 15 differences in height (one per pair) and proposed to test the hypothesis that there was a statistically significant difference between the two methods of fertilization.

Fisher's reasoning was that if the two series of seeds were random samples from identical populations, then each of the 15 observed within-pair differences would have occurred with equal frequency with a positive or a negative sign. We may sum the differences we actually obtained, and we may "ask how many of the 2^15 numbers, which may be formed by giving each component alternatively a positive and a negative sign, exceed this value. Since ex hypothesi each of these 2^15 combinations will occur by chance with equal frequency, a knowledge of how many of them are equal to or greater than the value actually observed affords a direct arithmetical test of the significance of this value" (p. 46).

In addition, there is a randomization test in wide use today: Fisher's Exact Test for 2 x 2 tables. Most of us were taught the test early in our first course, and most of us use it. Until computers were developed, it was cumbersome, tedious, and not widely used. It is in fact a randomization test, and today most statistical packages (including SAS) perform the Fisher Exact Test.

Over the years, other researchers developed randomization tests for various situations. As mentioned, Fisher developed the Exact Test for the 2 x 2 table. In 1937, Pitman developed a randomization test for the correlation coefficient (Pitman, 1937a, 1937b). (I also developed this test a couple of years ago and was terribly disappointed to discover someone else had preceded me.) Pitman (1938) also developed a randomization test for ANOVA. These tests remained little more than elegant mathematical curiosities until recently, when the use of high-speed computers made their use practical.

Recently, randomization tests have been the center of some controversy. Some researchers have tried to claim wonderful properties for randomization tests and indeed wonder why we would ever want to use old-fashioned "parametric" techniques. Some have made extravagant claims for them and state that they make random assignment unnecessary. Some allege that they are the perfect way to analyze one-subject designs. Basu (1980) and the accompanying comments contain an excellent discussion of these aspects of randomization tests. I shall avoid all controversies. In this paper I present some examples of a class of statistical techniques which have a long and distinguished theoretical history. They have not had much of a practical history because, without a computer, they were impractical. I shall also present SAS procedures which may be used to perform randomization-test analyses for all situations discussed in this paper.

The Paired T-Test

The first randomization test I will discuss is the paired t-test. First, I will describe the situation and calculate the usual t-statistic. Then, I will calculate a p-value for that t-statistic via the randomization test method.

Suppose we have n pairs of subjects (for example, twins). Suppose we randomly assign one subject from each pair to each of two groups, perform some experimental manipulation on one group, and now wish to compare the groups. The usual test would be a matched-pair t-test: we would calculate difference scores within each pair and perform a one-group t-test on the mean of the differences, testing the null hypothesis H0: mu_d = 0.

Suppose that for distributional reasons we were precluded from performing the above test. Suppose, for example, that our dependent variable was a waiting time, or better still, a correlation between repeated pairs of observations on each subject. In other words, suppose that while the measurement characteristics of the dependent variable are of sufficient metricity that we can do arithmetic with it (interval level, continuous), the distribution of the dependent variable is completely unknown. A randomization test would then be appropriate.

The rationale behind the randomization test for this situation is as follows. We have randomly assigned one subject from each pair to one group and the other subject to the other group. If the experimental manipulation made no difference (the null hypothesis), then there should be no difference (except for random error) between the scores made by subjects in one group and the scores made by subjects in the other group. If there is no difference between the groups, then it should not matter to which group a subject is assigned; switching the scores within a pair should make no difference (on the average) in the group scores. There are 2^n possible permutations of the data which we can form by switching within pairs. We view these 2^n possible permutations as the entire population from which our sample was drawn.

We may calculate a t-value for each of these 2^n samples. The probability of drawing a sample at random from this population with a t-value as large or larger is simply the number of t-values as large or larger divided by 2^n. Under the null hypothesis we view our sample as having been randomly drawn from this population. If so, then our observed t-value should be relatively close to the center of the population distribution of t-values. If the t-value is close to the extremes, it indicates a low probability that our observed sample was in fact randomly drawn from this population.

For example, suppose we have 5 pairs of rats. Suppose we randomly assign one rat from each pair to group 1 and the other to group 2, feed the groups different diets, and at the end of some time period make some measurement. We might have the data described in Table 1. The classic way of testing the null hypothesis of no difference would be to calculate difference scores and perform a one-group t-test on them.

Using a randomization test to obtain the p-value proceeds in exactly the same manner up to the point at which we would look up (or calculate) a p-value. We form difference scores, calculate a mean difference score, and calculate a t-value. Then we permute the data all possible ways. For 5 pairs, there are 2^5 = 32 possible permutations; Table 2 contains all 32. Notice that the permutations differ from each other only in terms of signs. For each permutation, we calculate a t-value. These 32 t-values comprise (under the null hypothesis) the entire population from which our t-value was drawn. Our observed t-value is 4.00. If we rank the 32 t-values in absolute value (we will do a two-tailed test), we see that our value is the highest. Thus, the probability of obtaining this or any more extreme t-value is 2/32, for a p-value of .0625 (for a two-tailed test).

The Two Group T-Test

The next randomization test I will discuss is the two-group independent t-test. Suppose we have n subjects and we randomly divide them into two groups, with n1 subjects in group 1 and n2 subjects in group 2 (n1 + n2 = n). We then perform some experimental manipulation on one group (or a different experimental manipulation on each of the two groups). This is the usual two-group independent t-test situation. If the data are continuous and at least interval level, and if the distribution of the data is normal with homogeneous variance, then we would calculate a t to test the null hypothesis H0: mu_1 = mu_2 against Ha: mu_1 != mu_2.

Suppose that for distributional reasons, however, we were precluded from performing the above test. The data are still interval level and continuous (in other words, the numbers still possess metric properties). We would still like to use the information contained in these numbers, even though they do not meet the distributional requirements of a t-test.

The reasoning behind the randomization test for this situation is as follows. Assume that (under the null hypothesis) there are no group differences. Then it should make no difference to which group a subject belongs, and we should be able to switch subjects from one group to the other without affecting (in the long run) the t-value. For n = n1 + n2 subjects in two groups as described, there are n!/(n1!n2!) possible combinations of the data with the appropriate group sizes. We view these n!/(n1!n2!) samples as the entire population of samples from which our one sample was presumably randomly drawn (under the null hypothesis). Each sample will possess a t-value, and all these n!/(n1!n2!) t-values have an equal probability of being chosen under the null hypothesis. If our observed sample is one with a t-value close to the center of the distribution, we would fail to reject the null hypothesis; if the observed t-value is close to the extremes of the distribution, we would reject it. We calculate the p-value by counting the number of times our observed t-value is equaled or exceeded in our n!/(n1!n2!) samples and dividing this count by n!/(n1!n2!).

For example, suppose we have 7 subjects, and we randomly assign them to two groups, group 1 and group 2, with n1 = 3 and n2 = 4. Suppose we now subject the groups to two different treatments, and we wish to test the significance of the t-value via the randomization method. Table 3 contains the data, as well as our observed t-value and a p-value calculated in the usual manner from a t-distribution. Table 4 contains all n!/(n1!n2!) = 7!/(3!4!) = 35 combinations: all 35 ways in which we may divide the 7 observations into two groups, with 3 observations in one group and 4 in the other. The table also contains a t-value calculated for each combination. As we can see, out of 35 t-values, ours is equaled or exceeded in absolute value by all of them. Thus, our p-value is 1.00.
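The two enumerations just described (sign-flips within pairs, and regroupings of the pooled observations) can be sketched in a few lines. The paper's own implementations are SAS procedures; the Python below is only an illustrative sketch using the data of Tables 1 and 3, with function names of my own invention.

```python
# Exact randomization tests for the paired and two-group examples above.
# Data: the five rat-pair differences of Table 1 and the seven
# observations of Table 3. A sketch, not the paper's SAS code.
from itertools import combinations, product
from math import sqrt

def one_sample_t(xs):
    """t for H0: mean = 0."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m / sqrt(var / n)

def pooled_t(g1, g2):
    """Classical two-group t with pooled variance."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((x - m1) ** 2 for x in g1)
    ss2 = sum((x - m2) ** 2 for x in g2)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))

def paired_randomization_p(diffs):
    """Two-tailed p: share of the 2^n sign-flips with |t| >= |t_obs|."""
    t_obs = abs(one_sample_t(diffs))
    ts = [one_sample_t([s * d for s, d in zip(signs, diffs)])
          for signs in product((1, -1), repeat=len(diffs))]
    return sum(abs(t) >= t_obs - 1e-9 for t in ts) / len(ts)

def two_group_randomization_p(g1, g2):
    """Two-tailed p over all n!/(n1!n2!) regroupings of the pooled data."""
    pooled = g1 + g2
    t_obs = abs(pooled_t(g1, g2))
    count = total = 0
    for idx in combinations(range(len(pooled)), len(g1)):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        count += abs(pooled_t(a, b)) >= t_obs - 1e-9
    return count / total

diffs = [1, 1, 1, 2, 3]          # Table 1 differences; observed t = 4.00
print(round(one_sample_t(diffs), 2), paired_randomization_p(diffs))  # 4.0 0.0625
print(two_group_randomization_p([2, 4, 6], [1, 3, 5, 7]))            # 1.0
```

The small tolerance (1e-9) guards against floating-point rounding when counting permutations that tie the observed statistic, which must be counted as "equaled or exceeded."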
The Significance of a Correlation

The third randomization test I would like to discuss is the significance test for a Pearson product-moment correlation coefficient (hereinafter referred to simply as the correlation). This test works equally well for rank-order correlations. First, I will describe the situation and calculate the usual correlation coefficient. Then, I will calculate a p-value for that correlation coefficient via the randomization method.

Suppose we have n subjects, each of whom we have measured on two variables (x and y), and we wish to assess the linear relationship of the two variables. The usual way to do this, if the data are interval level, continuous, and bivariate normal, is to calculate a correlation coefficient, perform a Fisher Z transformation, and test the transformed coefficient using the normal distribution.
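The "usual way" just described can be made concrete. The sketch below implements the standard normal-theory test of H0: rho = 0 via Fisher's Z; it is my illustration, not anything from the paper, and the r = 0.5, n = 30 figures are made-up illustrative values.

```python
# The classical normal-theory significance test for a correlation:
# z = atanh(r) * sqrt(n - 3) is approximately standard normal under
# H0: rho = 0 when the data are bivariate normal. Illustrative values only.
from math import atanh, sqrt, erfc

def fisher_z_p(r, n):
    """Two-tailed p-value for H0: rho = 0 via the Fisher Z transformation."""
    z = atanh(r) * sqrt(n - 3)
    return erfc(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))

print(fisher_z_p(0.5, 30))  # roughly 0.004
```

It is exactly this normal approximation that fails when the data are not bivariate normal, which is what motivates the randomization alternative below.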
Suppose that the distribution of the variables is not bivariate normal. Then the use of the Fisher Z transformation is improper, in that the resulting transformed statistic is not guaranteed to be asymptotically normally distributed. If we had a reasonable idea of the manner in which the variables differed from bivariate normality, we could transform the data to approximate normality, but sometimes this is just not possible.

The reasoning behind a randomization test of a correlation coefficient is as follows. We can calculate the correlation between the two variables. Under a null hypothesis of no relationship between them (H0: rho = 0), it should make no difference which x-value is paired with which y-value. For n observations, there are n! such pairings, and for each we can calculate a correlation coefficient. From this entire population of correlation coefficients, each has an equal probability of being chosen under the null hypothesis. If we arranged them in order of magnitude, we would expect ours to come from around the center (under a null hypothesis of no relationship). If there were only a few correlations as high or higher, we would take this as evidence that it is unlikely that our correlation was randomly drawn from this population of correlations.

Consider an example. Table 5 contains data on four observations and two variables; the correlation is .72. There are 4! = 24 ways in which the x-values and y-values can be paired. These 24 pairings are given in Table 6, with the correlation calculated for each. As you can see, our correlation lies near the top of the distribution: it is equaled or exceeded by 4 of the 24 correlations (including itself), and equaled or exceeded in absolute value by 7 of them. Thus, the probability of obtaining a correlation this high or higher is 4/24 = .17 for a one-tailed test, and 7/24 = .29 for a two-tailed test.

One-Way ANOVA

The next randomization test I will discuss is the one appropriate to a one-way ANOVA. First I will discuss the situation, next the reasoning behind the test, and finally an example.

The situation is the usual one-way ANOVA situation. Suppose we have n patients randomly assigned to k groups with cell sizes n1, n2, n3, ..., nk, where n1 + n2 + ... + nk = n. We would usually perform a one-way ANOVA to test the hypothesis H0: mu_1 = mu_2 = ... = mu_k if the data met the following assumptions: normally distributed, interval level and continuous, and equal group variances. Suppose it is obvious from the data that the data are not normally distributed, the groups differ greatly in size, and the variances are unequal. This is a situation to which the ANOVA technique is not robust. In this case we might perform a randomization test.

One can think of the randomization test in one-way ANOVA as a straightforward conceptual extension of the two-group independent t-test. We have n1 + n2 + ... + nk = n subjects who have been randomly assigned to the k groups. The k groups are manipulated differentially and an F-statistic is calculated. Under the null hypothesis of no treatment effects, it should make no difference to which group a subject is assigned. There are n!/(n1!n2!...nk!) possible ways of assigning the subjects to the groups under the constraint that the group sizes remain the same. Under the null hypothesis, the observed F-statistic was randomly drawn from the population of F-statistics formed by these assignments. The probability of finding an F-statistic as large or larger is thus the number of F-statistics as large or larger divided by n!/(n1!n2!...nk!). This we use as the p-value.

In the example that follows, a couple of aspects are somewhat unrealistic. I am using an exceptionally small n because otherwise the number of permutations becomes too large to list, and for the same reason one of the three groups has only one observation. I am not saying it would be a good idea to use a randomization test with so few observations, but it is the only way to make a manageable presentation.

Suppose we have five subjects and three treatments. We randomly assign the five subjects to the 3 groups (n1 = 1, n2 = 2, n3 = 2) and perform our experimental manipulations. There are 5!/(1!2!2!) = 120/4 = 30 possible ways 5 subjects may be assigned to the 3 groups with the specified group sizes. Table 7 contains the data, and Table 8 contains the 30 possible groupings. If we assume (under the null hypothesis) that each of these 30 groupings had an equal chance of being selected, then the probability of obtaining a result with an F as large or larger than our observed one is the number of F-statistics greater than or equal to the observed one divided by the number of combinations (30). Our observed F-statistic was 9.00; it is equaled or exceeded by 6 of the 30 F-statistics (each assignment with F = 9.00 occurs twice, once for each labeling of the two equal-sized groups). Therefore, the p-value is 6/30 = .20.

Random Assignment

It has been alleged that randomization-test methods make pre-experimental random assignment unnecessary (Edgington, 1966). Without random assignment, however, causal inference is problematic: any treatment effects are confounded with the nonrandom assignment.

Single Subject Designs

It has been alleged (Edgington, 1980a, 1980b) that randomization tests are appropriate for one-subject designs. It is my opinion that randomization tests are as appropriate for such designs as is anything else - not very. The major problem, of course, is that one cannot generalize beyond the subject, and an experimental population limited to one subject is in general uninteresting.

The Fisher Exact Test

One of the most common randomization tests, one which all statisticians have learned, is the Fisher Exact Test. It is an exact test for use with two-by-two contingency tables. The following example is taken from Hays (1963, pp. 598-601). Suppose that n objects are arranged into the two-by-two table shown in Table 9. If we consider the marginals of the table as fixed, we can enumerate all the tables that could be formed with these marginals. For each table, we can calculate some measure of association. Hays stated, "If one finds the probability of the arrangement actually obtained, as well as every other arrangement giving as much or more evidence for association, then one can test the hypothesis that the obtained result is purely a product of chance by taking the probability as the significance level" (p. 599).
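Hays's enumeration can be sketched in Python. This is my own illustration (hypothetical helper names), using the 2 x 2 table with cell counts 1, 4, 3, 2 as data; under fixed margins each table's probability is hypergeometric, and the one-sided p-value sums the probabilities of tables with an upper-left cell at least as small as the one observed.

```python
# Fisher's Exact Test by direct enumeration, margins held fixed.
# table_prob gives P(table | margins) = (a+b)!(c+d)!(a+c)!(b+d)!/(n!a!b!c!d!),
# computed here with binomial coefficients to avoid huge factorials.
from math import comb

def table_prob(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_exact_one_sided(a, b, c, d):
    """Sum probabilities of tables with the same margins and cell <= a."""
    p = 0.0
    for a2 in range(0, a + 1):
        b2 = a + b - a2
        c2 = a + c - a2
        d2 = c + d - c2
        if min(b2, c2, d2) >= 0:
            p += table_prob(a2, b2, c2, d2)
    return p

print(round(table_prob(1, 4, 3, 2), 3))              # 0.238
print(round(table_prob(0, 5, 4, 1), 3))              # 0.024
print(round(fisher_exact_one_sided(1, 4, 3, 2), 3))  # 0.262
```

The identity used here, (a+b)!(c+d)!(a+c)!(b+d)!/(n! a! b! c! d!) = C(a+b, a) C(c+d, c) / C(n, a+c), is a standard rearrangement of the hypergeometric probability.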
For example (again from Hays), suppose we observe 10 subjects and they fall into the table given in Table 10. The probability that this result might have occurred by chance alone is (4!6!5!5!)/(10!1!4!3!2!) = .238. Of all other tables which might be constructed with these marginals, only one shows more evidence of association (Table 11). The probability of that table occurring by chance alone is (4!6!5!5!)/(10!0!5!4!1!) = .024. Thus, the probability of the observed result, or one more extreme, is .238 + .024 = .262, and we may take .262 as the p-value for an exact test of no association.

Randomization Tests and Rank-Order Statistics

If we first transform a set of data to ranks and then apply a randomization test, we lose some information (the metric information in the data) and gain in practicality (Bradley, 1968, p. 87). We throw the metric information away when we transform to ranks. However, since all sets of scores monotonically equivalent to our observed data produce the same ranks, we gain in practicality in that for a given set of ranks the test is always the same. Many usual nonparametric tests are derived in exactly that manner: the data are transformed to ranks, and the randomization test is simplified because the sample space can now be tabled. Some familiar tests which are in fact rank-randomization tests are Wilcoxon's signed-rank test, Wilcoxon's rank-sum test, the Siegel-Tukey test, the Friedman rank ANOVA, and the Kruskal-Wallis test.

Sampling from the Distribution

For most randomization tests, the maximum sample size which it is practical to use is relatively small. For example, for the paired t-test, 16 pairs produce 65,536 possible permutations, and for the correlation randomization test, 9 subjects require 362,880 permutations. While these sorts of numbers require the computer system to take some time, they are not unreasonable. For example, on our system (an Amdahl 470/V7), a paired t-test randomization test with 16 subjects takes less than 10 seconds of CPU time, and the entire SAS job costs about $5.00. That is not an unreasonable amount of money. However, adding one more subject would almost double the cost, adding two more would almost quadruple it, and so on.

Now consider the following situation. We have collected data on two different neuroendocrine blood levels, we wish to perform a randomization test of the correlation coefficient, and we have 25 subjects. There are approximately 1.55112 x 10^25 (that is, 25!) possible ways to arrange these data, each with an associated correlation coefficient. Fortunately, we may use sampling principles to address this problem. Green (1977), Edgington (1969), and others have shown that when the total number of permutations for a randomization test is too large to enumerate completely, a sufficiently large random sample from this population will produce p-values which are very close to the true p-values. A random sample of 1,000 or even 10,000 from this population is enough to assure the accuracy of the p-value to 3 or 4 decimal places. Most of the time, that is sufficient.

Equivalent Statistics

In my conceptual derivations of these randomization tests, as well as in my examples and my SAS procedures, I have been using test statistics (e.g., t, F, r) as criteria: the appropriate test statistic is calculated for each and every permutation of the data. It is often not strictly necessary to calculate the test statistic itself, however, as there is often a monotonically related, simpler statistic. For example, when performing a randomization paired t-test, notice that the t-values are monotonic with the mean difference: as the mean difference increases, so does the t-value. This means that if we were to perform the randomization test using the mean difference as the test statistic, we would achieve identical results. Exactly the same number of means would exceed the observed mean as t-values exceeded the observed t-value.

For another example, when performing the randomization test of a correlation coefficient, I calculated the correlation coefficient for each of the n! permutations. The denominator of the correlation coefficient remained constant, however, since it involves only the variances of the variables, which are unchanged no matter how the data are paired. It is only the cross-product term (the numerator) of the correlation coefficient that changes with each permutation. Thus, if I had performed the randomization test using the sum of products rather than the correlation coefficient, the p-value would have been the same.

The advantage of using such an equivalent statistic, when it is simpler and easier to calculate, is efficiency: we may be able to cut a significant amount of CPU time from our program runs. However, there are in my view a number of disadvantages to such an approach. First, it obscures what we are doing; the examples would not have been as clear if I had had to explain why I was using such a statistic. Second, I feel the saving in CPU time is minute and unimportant. The bulk of the expense in such a program involves the manipulation of the data, not the calculation of the test statistic; even if we were able to cut the time required to calculate the test statistic in half, we probably would not decrease the cost of running the program by more than 5 or 10 percent. Finally, there may well be times when one wishes to examine the actual test statistics (as I have done in this paper), and it is nice in such cases if the program calculates them.
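The two ideas above, the equivalent statistic and random sampling of permutations, can be sketched together. This is my own illustration on the Table 5 correlation data: since r is a monotone function of the sum of products (once centered at its null value), a permutation test on the sum of products gives exactly the same p-value as one on r, and a seeded random sample of permutations approximates the exact answer.

```python
# (1) Equivalent statistic: the sum of products sum(x*y) is monotone with r
#     over permutations, so it yields the identical permutation p-value.
# (2) Monte Carlo: sample permutations at random when n! is too large.
# Data: the four observations of Table 5; names are mine, not the paper's.
from itertools import permutations
import random

def sum_products(xs, ys):
    return sum(a * b for a, b in zip(xs, ys))

def exact_p(xs, ys):
    """Two-tailed exact p using the centered sum of products as statistic."""
    n = len(xs)
    null_center = sum(xs) * sum(ys) / n      # sum-of-products value at r = 0
    obs = abs(sum_products(xs, ys) - null_center)
    perms = list(permutations(ys))
    hits = sum(abs(sum_products(xs, p) - null_center) >= obs - 1e-9
               for p in perms)
    return hits / len(perms)

def monte_carlo_p(xs, ys, reps=20000, seed=1):
    """Approximate the same p-value from a random sample of permutations."""
    rng = random.Random(seed)
    n = len(xs)
    null_center = sum(xs) * sum(ys) / n
    obs = abs(sum_products(xs, ys) - null_center)
    hits = 0
    for _ in range(reps):
        p = ys[:]
        rng.shuffle(p)
        hits += abs(sum_products(xs, p) - null_center) >= obs - 1e-9
    return hits / reps

x = [2, 3, 7, 9]
y = [1, 3, 2, 5]
print(exact_p(x, y))        # 7/24, the same answer a test on r gives
print(monte_carlo_p(x, y))  # close to 7/24
```

Centering at sum(x)*sum(y)/n makes the two-tailed comparison correct, because r is proportional to the sum of products minus exactly that constant.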
SAS Procedures

I have written three SAS procedures which are capable of performing the analyses described in this paper. I am at present waiting for information on what I may need to do to make the procedures usable under portable SAS. When I do so, and if SAS Institute wishes, the procedures will be distributed as part of the SAS Supplemental Library.

Acknowledgment

I would like to thank Ms. Chris G. Riley for her valuable assistance with this paper.

REFERENCES

1. Edgington, E.S. Randomization tests. New York: Marcel Dekker, 1980a.
2. Edgington, E.S. Validity of randomization tests for one-subject designs. Journal of Educational Statistics, 1980b, 5, 235-251.
3. Basu, D. Randomization analysis of experimental data: the Fisher randomization test. Journal of the American Statistical Association, 1980, 75, 575-582.
4. Edgington, E.S. Approximate randomization tests. Journal of Psychology, 1969, 72, 143-149.
5. Edgington, E.S. Statistical inference and nonrandom samples. Psychological Bulletin, 1966, 66, 485-487.
6. Green, B.F. A practical interactive program for randomization tests of location. The American Statistician, 1977, 31, 37-38.
7. Bradley, J.V. Distribution-free statistical tests. Englewood Cliffs, N.J.: Prentice-Hall, 1968, 68-86.
8. Pitman, E.J.G. Significance tests which may be applied to samples from any populations. Journal of the Royal Statistical Society (Series B), 1937a, 4, 119-130.
9. Pitman, E.J.G. Significance tests which may be applied to samples from any populations. II. The correlation coefficient test. Journal of the Royal Statistical Society (Series B), 1937b, 4, 225-232.
10. Pitman, E.J.G. Significance tests which may be applied to samples from any populations. III. The analysis of variance test. Biometrika, 1938, 29, 322-335.
11. Fisher, R.A. The design of experiments (Ninth Edition). New York: Hafner, 1971.
12. Hays, W.L. Statistics. New York: Holt, Rinehart, and Winston, 1963, 598-601.

Table 1
Paired t-test Example Data

Pair   Group 1   Group 2   Difference
 1        2         1          1
 2        3         2          1
 3        5         4          1
 4        5         3          2
 5        6         3          3

t = 4.00, p < .0161

Table 2
Paired t-test Example: All 32 Possible Permutations of the Difference Scores

Signed differences          t
 1   1   1   2   3       4.00
 1   1   1   2  -3       0.46
 1   1   1  -2   3       1.00
 1   1   1  -2  -3      -0.46
 1   1  -1   2   3       1.81
 1   1  -1   2  -3       0.00
 1   1  -1  -2   3       0.46
 1   1  -1  -2  -3      -1.00
 1  -1   1   2   3       1.81
 1  -1   1   2  -3       0.00
 1  -1   1  -2   3       0.46
 1  -1   1  -2  -3      -1.00
 1  -1  -1   2   3       1.00
 1  -1  -1   2  -3      -0.46
 1  -1  -1  -2   3       0.00
 1  -1  -1  -2  -3      -1.81
-1   1   1   2   3       1.81
-1   1   1   2  -3       0.00
-1   1   1  -2   3       0.46
-1   1   1  -2  -3      -1.00
-1   1  -1   2   3       1.00
-1   1  -1   2  -3      -0.46
-1   1  -1  -2   3       0.00
-1   1  -1  -2  -3      -1.81
-1  -1   1   2   3       1.00
-1  -1   1   2  -3      -0.46
-1  -1   1  -2   3       0.00
-1  -1   1  -2  -3      -1.81
-1  -1  -1   2   3       0.46
-1  -1  -1   2  -3      -1.00
-1  -1  -1  -2   3      -0.46
-1  -1  -1  -2  -3      -4.00

t = 4.00, p < .0625

Table 3
Two-Group t-test Example Data

Group 1: 2, 4, 6
Group 2: 1, 3, 5, 7

t = 0.00, p < 1.00

Table 4
Two-Group t-test Example: All 35 Possible Combinations

Group 1     Group 2        t
1 2 3       4 5 6 7     -3.87
1 2 4       3 5 6 7     -2.33
1 2 5       3 4 6 7     -1.58
1 2 6       3 4 5 7     -1.07
1 2 7       3 4 5 6     -0.67
1 3 4       2 5 6 7     -1.58
1 3 5       2 4 6 7     -1.07
1 3 6       2 4 5 7     -0.67
1 3 7       2 4 5 6     -0.32
1 4 5       2 3 6 7     -0.67
1 4 6       2 3 5 7     -0.32
1 4 7       2 3 5 6      0.00
1 5 6       2 3 4 7      0.00
1 5 7       2 3 4 6      0.32
1 6 7       2 3 4 5      0.67
2 3 4       1 5 6 7     -1.07
2 3 5       1 4 6 7     -0.67
2 3 6       1 4 5 7     -0.32
2 3 7       1 4 5 6      0.00
2 4 5       1 3 6 7     -0.32
2 4 6       1 3 5 7      0.00
2 4 7       1 3 5 6      0.32
2 5 6       1 3 4 7      0.32
2 5 7       1 3 4 6      0.67
2 6 7       1 3 4 5      1.07
3 4 5       1 2 6 7      0.00
3 4 6       1 2 5 7      0.32
3 4 7       1 2 5 6      0.67
3 5 6       1 2 4 7      0.67
3 5 7       1 2 4 6      1.07
3 6 7       1 2 4 5      1.58
4 5 6       1 2 3 7      1.07
4 5 7       1 2 3 6      1.58
4 6 7       1 2 3 5      2.33
5 6 7       1 2 3 4      3.87

t = 0.00, p < 1.00

Table 5
Correlation Example Data

Observation   Variable 1   Variable 2
    1             2            1
    2             3            3
    3             7            2
    4             9            5

r = 0.72, p < .27

Table 6
Correlation Example: All 24 Permutations of Variable 2

Variable 2 (permuted)      r
1  2  3  5              0.96
1  2  5  3              0.72
1  3  2  5              0.72
1  3  5  2              0.37
1  5  2  3              0.01
1  5  3  2             -0.10
2  1  3  5              0.90
2  1  5  3              0.66
2  3  1  5              0.43
2  3  5  1             -0.04
2  5  1  3             -0.28
2  5  3  1             -0.52
3  1  2  5              0.61
3  1  5  2              0.25
3  2  1  5              0.37
3  2  5  1             -0.10
3  5  1  2             -0.69
3  5  2  1             -0.81
5  1  2  3             -0.22
5  1  3  2             -0.34
5  2  1  3             -0.46
5  2  3  1             -0.69
5  3  1  2             -0.81
5  3  2  1             -0.93

r = 0.72, p < .29

Table 7
One-Way ANOVA Example Data

Observation   Group   Value
    1           1       1
    2           2       2
    3           2       3
    4           3       4
    5           3       5

F = 9.00, p < .10

Table 8
One-Way ANOVA Example: All 30 Possible Combinations

Group 1   Group 2   Group 3      F
  1        2 3       4 5       9.00
  1        2 4       3 5       1.50
  1        2 5       3 4       1.00
  1        3 4       2 5       1.00
  1        3 5       2 4       1.50
  1        4 5       2 3       9.00
  2        1 3       4 5       3.00
  2        1 4       3 5       0.54
  2        1 5       3 4       0.18
  2        3 4       1 5       0.18
  2        3 5       1 4       0.54
  2        4 5       1 3       3.00
  3        1 2       4 5       9.00
  3        1 4       2 5       0.11
  3        1 5       2 4       0.00
  3        2 4       1 5       0.00
  3        2 5       1 4       0.11
  3        4 5       1 2       9.00
  4        1 2       3 5       3.00
  4        1 3       2 5       0.54
  4        1 5       2 3       0.18
  4        2 3       1 5       0.18
  4        2 5       1 3       0.54
  4        3 5       1 2       3.00
  5        1 2       3 4       9.00
  5        1 3       2 4       1.50
  5        1 4       2 3       1.00
  5        2 3       1 4       1.00
  5        2 4       1 3       1.50
  5        3 4       1 2       9.00

F = 9.00, p < .20

Table 9
Fisher Exact Test Example

  a       b      a + b
  c       d      c + d
a + c   b + d      n

Table 10
Fisher Exact Test Example Data

1   4    5
3   2    5
4   6   10

Table 11
Fisher Exact Test Example: More Extreme Table

0   5    5
4   1    5
4   6   10
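As a check on the one-way ANOVA enumeration of Tables 7 and 8, the short sketch below (my own, not one of the paper's SAS procedures) generates all 30 assignments of the five values to labeled groups of sizes 1, 2, and 2 and counts the F-statistics at least as large as the observed F = 9.00.

```python
# Enumerate all 5!/(1!2!2!) = 30 group assignments of the Table 7 data
# and compute the one-way ANOVA F-statistic for each.
from itertools import combinations

values = [1, 2, 3, 4, 5]   # observed grouping: {1}, {2, 3}, {4, 5}

def f_stat(groups):
    """One-way ANOVA F for a list of groups of observations."""
    all_x = [x for g in groups for x in g]
    gm = sum(all_x) / len(all_x)
    ssb = sum(len(g) * (sum(g) / len(g) - gm) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b = len(groups) - 1
    df_w = len(all_x) - len(groups)
    return (ssb / df_b) / (ssw / df_w)

f_obs = f_stat([[1], [2, 3], [4, 5]])

fs = []
for g1 in combinations(values, 1):            # the singleton group
    rest = [v for v in values if v not in g1]
    for g2 in combinations(rest, 2):          # labeled group 2 of size 2
        g3 = [v for v in rest if v not in g2]
        fs.append(f_stat([list(g1), list(g2), g3]))

hits = sum(f >= f_obs - 1e-9 for f in fs)
print(f_obs, len(fs), hits, hits / len(fs))   # 9.0 30 6 0.2
```

Because the two size-2 groups are labeled, every distinct partition is counted twice, which is why F = 9.00 appears six times and the randomization p-value is 6/30 = .20.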