Contents

1 Statistical Distributions
  1.1 Introduction
  1.2 Some basic concepts
  1.3 Frequency Distributions
    1.3.1 Binomial distribution
    1.3.2 Poisson Distribution
  1.4 The Normal Distribution
    1.4.1 Mean and Variance
    1.4.2 Standard Normal Curve

2 Sampling
  2.1 Sample Mean and Variance
    2.1.1 Degrees of Freedom
  2.2 Interval Estimation
    2.2.1 The Sampling Distribution of x̄
  2.3 Confidence Limits
  2.4 Central Limit Theorem
  2.5 Coefficient of Variation

3 Comparing Two Samples
  3.1 Hypotheses
  3.2 One and Two-tailed tests
  3.3 Unpaired t-Test
    3.3.1 Theory
    3.3.2 Example of Student's t-test
    3.3.3 Checking equality of variance
    3.3.4 Calculating Student's t
  3.4 Paired t-Test
  3.5 Non-parametric tests
  3.6 The Mann-Whitney U test
    3.6.1 Example 1
    3.6.2 Procedure for small samples
    3.6.3 Large samples (N1 or N2 > 20)
  3.7 Wilcoxon signed-rank test
    3.7.1 Example 2
    3.7.2 Large Samples

4 Analysis of Variance
  4.1 Introduction
  4.2 Within Sample Variance and Between Sample Variance
    4.2.1 Example
    4.2.2 Exercise
  4.3 Comparison of Means
    4.3.1 Fisher's Least Significant Difference (LSD)
    4.3.2 Tukey's Honestly Significant Difference
    4.3.3 Dunnett's Multiple Range Test Using Minitab

5 Correlation and Regression
  5.1 Introduction
  5.2 Some useful formulae
  5.3 Product-Moment Correlation Coefficient
    5.3.1 Correlation coefficient: Example
  5.4 Spearman's Rank Correlation Coefficient - rs
    5.4.1 Procedure
  5.5 Linear Regression Analysis
    5.5.1 Regression by hand
    5.5.2 Assumptions involved in linear regression by least squares
    5.5.3 Regression Example 1
    5.5.4 Regression Example 2
    5.5.5 Regression Example 3
    5.5.6 Exercise

6 Analysis of Counts in Contingency Tables
  6.1 Introduction
  6.2 Example
  6.3 Small expected values
  6.4 2 × 2 Contingency Tables
    6.4.1 Example of 2 × 2 analysis
  6.5 Exercise 1

7 Statistical Tables
  7.1 Table 1. Critical values for the Wilcoxon signed-rank test
  7.2 Table 2a. Critical values for the Mann-Whitney U test. Two-tailed test, 5% significance level
  7.3 Table 2b. Critical values for the Mann-Whitney U test. Two-tailed test, 1% significance level
  7.4 Table 2c. Critical values for the Mann-Whitney U test. One-tailed test, 5% significance level
  7.5 Table 2d. Critical values for the Mann-Whitney U test. One-tailed test, 1% significance level
  7.6 Table 3. Critical values for Spearman's rank correlation coefficient

Chapter 1
Statistical Distributions

1.1 Introduction

This short course on statistics is intended to provide an understanding of distributions, sampling, errors, and significance testing. A number of standard tests used in pharmaceutical and medical science will be introduced. Emphasis will be placed on a general understanding of statistical tests and acquiring the ability to select and interpret the appropriate test for given experimental data. Whenever possible the understanding and use of statistical tests will be demonstrated and carried out, with your own data acquired in other parts of the course, using a software package (MINITAB) typical of the type of package you might be expected to use in industry or a research institution.
1.2 Some basic concepts

We are going to start off this course on statistics by considering a very simple experiment. The data from this experiment will be used to illustrate several fundamental statistical principles and hopefully give some idea why statistics is useful right from the start.

Suppose we are interested in comparing two possible feeding regimes in laboratory rats. One is more expensive but is claimed to lead to faster growth. We would like to know whether the extra expense is justified. To investigate this we take 11 rats which have been fed on regime 1, which we shall call "cheap", and 11 rats which have been fed on "expensive". We weigh the rats before and after a period of time and record the change in weight for each rat. The data are listed below.

Cheap feeding regime (mean = 32.28 g):
29.59  31.69  30.50  35.63  36.08  30.55  37.81  34.47  27.32  36.94  24.48

Expensive feeding regime (mean = 37.24 g):
39.56  29.01  35.60  36.48  36.75  40.13  43.46  36.63  44.71  36.15  31.20

What do these data tell us? The first thing to do is look at the average weight increase for each feeding regime. For "cheap" rats, adding all the weights up and dividing by 11 gives an average increase of 32.28 g. The average increase for "expensive" rats is 37.24 g. Thus the "expensive" rats showed a 4.9 g increase over the "cheap" rats.

So, "expensive" rats have put on more weight, but how do we know that this difference is real and not just a matter of chance? After all, we would not expect the two samples to have exactly the same average weight increase. Differences in the characteristics of individual rats and differences in their living conditions will lead to some variation which has nothing to do with feeding regime. If we repeated the experiment we might get the opposite result. Looking at the two averages (or means) does not answer this question. We have to look at the data in more detail. Here are the same data summarised as box plots.
Box plots show the minimum, the maximum and the median value (the cross) of a set of data. The portion of the sample enclosed by double lines includes half of the items. The median is the central value. The box plot is a good way to get a quick visual feel for the data.

   ----------------------(       +      )---------------------
 *          ---------(      +   )------------------

You can see that the two samples are well separated. We would intuitively feel fairly confident that the difference in mean weight increase between the two treatments was real. Suppose, however, that the box plots looked like this.

-------------------------------------(      +      )--------------------------------
---------------------------------------(      +      )---------------------------------------

These two medians are the same as in the previous box plot but the variation between rats within each treatment is much larger. In this case we might feel that the observed difference is just a chance event. This example should make it clear that it is the size of the observed effect in relation to the variation within each experimental group that is important in making a statistical inference. In the second case the difference in medians looks rather small in comparison to the large variation between rats.

In fact, a statistical test on these data will not tell us whether the "expensive" food is better than the "cheap" food; what it will tell us is how likely the observed difference is on the assumption that the two feeding regimes are identical in their effect on weight increase. If the observed difference (4.9 g) is very unlikely to occur by chance we can infer that the effect is real. The decision as to how unlikely an effect we are willing to accept as real is up to us (although there are commonly accepted levels, as you will see). The appropriate statistical test on the rat data tells us that the probability of the observed difference of 4.9 g is 0.0085.
In other words, assuming that the difference is a chance event, we would expect a difference as large as 4.9 g only 1 time in 118 (1/0.0085) similar experiments. This would be a very rare event, so we would certainly conclude that the "expensive" regime was better.

The action you take after a statistical test depends on the situation. In this case we might well decide that the small increase in weight does not justify the extra expense even if we are convinced that the difference is real. These sorts of practical decisions are ultimately up to us; statistics gives an objective assessment of the available evidence which we can use to make rational decisions.

The above discussion has introduced some important statistical concepts. First of all, information on the underlying biological variation was fundamental to the statistical test. This variation is often referred to as error or residual variation. An estimate of the residual variation is achieved by replicating the basic experimental unit (in this example, a rat) within each treatment. If we had taken only two rats, one for each treatment, we would have no idea whether an observed difference was due to the treatment or just error. We can only calculate a probability for the observed result because we had replicates for each treatment.

You will have noticed that the probability that is calculated is for the observed result, given that the treatments are identical. This assumption represents the so-called null hypothesis. All statistical tests have a null hypothesis which is adopted temporarily. If the experimental results are considered to be improbable under the null hypothesis it will be rejected in favour of an alternative hypothesis. In the example the null hypothesis is that the two feeding regimes are equivalent in terms of their effect on weight increase. This might be rejected in favour of the alternative hypothesis that the "expensive" regime is better than the "cheap" regime.
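The probability quoted above comes from a two-sample test of the kind covered in Chapter 3. Purely as an illustration (the course itself uses Minitab, not Python), the pooled Student's t statistic behind such a test can be sketched by hand; note that 24.48 is placed in the "cheap" group on the assumption that this reproduces the quoted group means of 32.28 g and 37.24 g:

```python
import math

# Weight increases (g) for the two feeding regimes
cheap = [29.59, 31.69, 30.50, 35.63, 36.08, 30.55, 37.81, 34.47, 27.32, 36.94, 24.48]
expensive = [39.56, 29.01, 35.60, 36.48, 36.75, 40.13, 43.46, 36.63, 44.71, 36.15, 31.20]

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    # Sum of squared deviations from the sample mean
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(cheap), len(expensive)
pooled_var = (ss(cheap) + ss(expensive)) / (n1 + n2 - 2)  # pooled estimate of sigma^2
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))       # standard error of the difference
t = (mean(expensive) - mean(cheap)) / se_diff

print(round(mean(cheap), 2), round(mean(expensive), 2))   # 32.28 37.24
print(round(t, 1))  # roughly 2.6, beyond the 5% two-tailed critical value (2.086 for 20 df)
```

The t statistic exceeds its critical value, which is why the observed difference earns such a small probability under the null hypothesis.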
Finally if, as in the example, a difference in means is considered to be too large to have occurred by chance, it is said that the difference is statistically significant.

1.3 Frequency Distributions

The sample of 11 rats could be plotted on a histogram, as in Fig 1.1, with suitable weight increments along the horizontal axis and number of rats in each weight class on the vertical axis. Fig 1.1 shows such histograms for larger and larger samples from the same population of rats. As the samples get larger the frequency histogram starts to take on a characteristic bell shape. Most of the rats are clustered around a mean value, with fewer and fewer rats at the extreme weights, as represented by the tails of the distribution. This kind of bell-shaped distribution is very common and is well represented by a mathematical model called the normal distribution. The normal curve is superimposed on the frequency histogram for sample size 2000 in Fig 1.1.

The normal distribution is the most important of the theoretical distributions you will come across because it applies so often in practice, but it is not the only one. Two others, the binomial and the Poisson, are going to be introduced below before a more detailed look at the normal distribution.

[Figure 1.1: Histograms for sample sizes 20 to 2000]

[Figure 1.2: Binomial distributions: number of brown eggs in a box of six (n = 6, p = 0.33) and number of heads in 10 throws (n = 10, p = 0.5)]

1.3.1 Binomial distribution

Frequency distributions fall into two categories, continuous and discontinuous. The normal distribution is an example of the former. In a continuous distribution your measurements can theoretically take any value within a particular range.
In other situations your measurements can only take one of a limited number of values, giving rise to a discontinuous distribution. Suppose you work at the egg marketing board and you are looking at the number of brown eggs in a box of six. The number of brown eggs can take any integer value from 0 to 6. If you sample many boxes and record the number of brown eggs per box on a bar chart with seven categories (Fig 1.2) you will build up a discontinuous frequency distribution. This type of discontinuous distribution is modelled mathematically by the binomial distribution.

The binomial distribution applies whenever you have a number of trials (say n), each of which has a fixed chance of being in one of two categories (hence binomial). The distribution for a particular number of trials (n) gives the chance of getting each of the n + 1 outcomes. That is, the chance of 0 successes in n trials, 1 success in n trials, up to n successes in n trials. If this seems confusing, consider the egg example. The six eggs in a box represent six trials. Each trial can be a white egg or a brown egg. If the average number of brown eggs per packet of six is two, then the probability of getting a brown egg in a single trial would be 0.33 (2/6). The binomial gives the chance of getting 0 brown eggs in 6 trials, 1 brown egg in 6 trials, and so on up to a box full of brown eggs.

A simpler example would be the number of heads you would get if you tossed a coin 10 times (Fig 1.2). There are 10 trials and two possible outcomes for each trial, heads or tails. Again the binomial gives the chance of getting 0 heads in 10 throws (not very likely) up to all heads (also unlikely). The most common category, as you would expect, is 5. Note that the vertical axis in Fig 1.2 gives the probability for each category and hence the heights of all the bars must sum to 1.

1.3.2 Poisson Distribution

The Poisson distribution is named after Siméon-Denis Poisson, who first described it.
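Both of these discrete distributions assign a probability to each possible count. As an illustration only (the course material uses Minitab and tables, not Python, and the Poisson rate of 3 events per second below is an invented number), their probability functions can be written directly from their definitions:

```python
import math

def binomial_pmf(k, n, p):
    """Chance of exactly k successes in n trials, each with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Chance of exactly k random events when the average count per unit is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Coin example from Fig 1.2: 5 heads in 10 throws is the tallest bar
print(round(binomial_pmf(5, 10, 0.5), 4))  # 0.2461

# The n + 1 binomial outcomes (here the egg example, n = 6, p = 1/3) sum to 1
print(round(sum(binomial_pmf(k, 6, 1/3) for k in range(7)), 6))  # 1.0

# Poisson example: if decays average 3 per second, the chance of none in a second
print(round(poisson_pmf(0, 3), 4))  # 0.0498
```

The first result reproduces the p = 0.2461 read off the bar chart for 5 heads in Fig 1.2, and the second confirms that the bar heights of a binomial distribution sum to 1.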
It crops up in situations where you are counting the number of random events which occur in a given time period, for example the number of radioactive decays per second. The classic example in the textbooks is the number of soldiers kicked to death by horses in the Prussian army in each year. The Poisson also occurs where you are counting items in a unit volume. Thus the number of bacteria in a millilitre of medium will follow a Poisson distribution if the culture is well mixed.

1.4 The Normal Distribution

The binomial distribution and the Poisson distribution have been mentioned because you are likely to come across them from time to time, and it is just as well to realise that not all distributions are bell shaped, symmetrical and continuous. However, the normal distribution is so central to statistics that it must be described in more detail.

Let's go back to the frequency histogram in Fig 1.1. The vertical axis gives the numbers of rats in each of the weight classes. If the number of rats in each class is added up this will give the total number of rats measured. It would be possible to divide the number of rats in each class by the total number measured so that the total summed to one. The vertical axis would now give the probability of a rat, selected at random from those measured, being in that weight class. Thus such a histogram would be equivalent to the discontinuous distributions of Fig 1.2. When the smooth normal curve is scaled in this way, so that the area under the curve is unity, it is known as a probability curve.

1.4.1 Mean and Variance

The mathematical formula which describes the normal curve (don't worry, you will not even see this) has two parameters: one which gives the position of the peak, the mean; and one which describes how spread out the curve is, known as the variance. If you know these two parameters you can fully describe the normal curve.
Given a large series of measurements, how do we specify the normal curve which best approximates the frequency distribution of our measurements? We must calculate the two parameters of the normal distribution from our data. First the mean. This is simply the average of our measurements. Mathematically this is written:

µ = Σx / n

If you have not seen the summation symbol (Σ) before, it means add together all instances of whatever occurs inside it (here x). So the mean (µ) is the sum of all the measurements (Σx) divided by the number of measurements (n).

The second parameter, the variance, is less obvious. A large value should indicate a spread-out distribution. The measure should not depend on the sample size: a sample of 10 from a particular population should give about the same value as a sample of 100, since the underlying curve we are trying to specify is the same in both cases. It would be a good idea to first consider how you would devise a measure of the spread of a data set.

One way to measure the spread would be to take each individual and see how far it departs from the mean (ie the centre of the distribution). If we added all these deviations up they would sum to zero, because all the negative differences would cancel out the positive differences. We could just knock off the negative signs and then add them up. This would give a large value for spread-out distributions and a small value for narrow distributions, but it would not be independent of sample size. In fact a sample of 100 would have a value roughly 10 times that of a sample of 10. The solution is to divide by the sample size to get the average absolute deviation from the mean:

mean absolute deviation = Σ|x − µ| / n

The vertical bars | mean "the absolute value of". This is a reasonable measure of spread which has been used in the past. The measure which is actually used is called the standard deviation.
This is calculated in a very similar way, except that the deviations from the mean are squared before they are added. The sum of the squared deviations is divided by n to give the mean squared deviation or mean square, also known as the variance. The variance, which is given the symbol σ², is square-rooted to give the standard deviation, which is given the symbol σ. Hence the standard deviation is on the same scale as the original measurements.

σ = √( Σ(x − µ)² / n )

Notice that once you square the deviations they all become positive and there is no need to take the absolute value. This is not the only reason that this measure of spread was adopted; there are "good theoretical reasons" beyond the scope of this course.

The conventional way of describing a particular normal distribution in mathematical notation uses the form N(µ, σ²). Thus a normal distribution with a mean of 10 and a variance of 5 would be written as N(10, 5).

There is an important difference between a probability graph for a continuous distribution and the equivalent probability graph for a discontinuous distribution.

[Figure 1.3: Area of the normal curve larger than µ]

For example, in the binomial shown in Fig 1.2 the probability of a particular outcome (say 5 heads in 10 throws) can be read off the graph as the height of the bar for 5 heads (p = 0.2461). You cannot treat the normal probability curve this way, however. Suppose x is a measurement on an individual chosen at random from a normal population. A continuous variable can take any value between certain limits, not just particular integer values. So if x is measured to sufficient decimal places it will never be exactly the same as a particular chosen value of x. Put another way, the probability of getting a particular value exactly will be zero.
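The parameter formulas of this section (the mean, the mean absolute deviation, and the standard deviation) can be checked numerically on a small made-up data set; the values below are invented purely for illustration:

```python
import math

# Invented population of measurements, illustration only
xs = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(xs)

mu = sum(xs) / n                          # mean: sum of x divided by n
mad = sum(abs(x - mu) for x in xs) / n    # mean absolute deviation
var = sum((x - mu) ** 2 for x in xs) / n  # variance sigma^2: mean squared deviation
sigma = math.sqrt(var)                    # standard deviation sigma

print(mu, mad, var, sigma)  # 5.0 1.5 4.0 2.0
```

This distribution would be written N(5, 4) in the notation above. Note the divisor n, which is appropriate when the data form the whole population; Chapter 2 explains why n − 1 is used when estimating the variance from a sample.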
However, it is possible to use probability graphs to determine the chance of x lying between certain limits, in which case the area bounded by the two limits gives the appropriate probability. For example, what is the probability that x is greater than the mean? The region of the normal probability curve which corresponds to values larger than the mean is shown shaded in Fig 1.3. The area of this shaded region gives the probability that we are after. Bearing in mind that the area under the whole curve is 1 and that the curve is symmetrical, it should be clear that the area (and hence the probability) of the shaded region is 0.5.

1.4.2 Standard Normal Curve

Normal distributions differ in their mean and variance. We can reduce any normal distribution to a standard form by first centering it about zero. This amounts to moving the whole normal curve along the horizontal axis until it straddles zero. We can achieve this by subtracting the mean from all measurements. Next, if we scale our measurements by dividing each by the standard deviation we will end up with a variance of one. The original distribution has now been transformed to N(0, 1), which is known as the standard normal curve. The normal curve has some properties which you will come across again and again.

• The area of the curve which lies between one standard deviation on either side of the mean is 68% of the total area of the curve; that is, 68% of the individuals in the population which the curve represents lie within 1 standard deviation of the mean.

• 95% of the area lies within 1.96 standard deviations of the mean.

• Nearly all of the normal probability curve lies within 3 standard deviations on either side of the mean.

This is the kind of information we need in order to assess the likelihood of our results under a particular null hypothesis. Suppose you take a single measurement on an individual who has undergone some kind of treatment.
You have a very large number of measurements on people who have not had the treatment, so you know the mean and variance of the distribution accurately. You would like to know whether there is any evidence that the treated person differs from the untreated patients. Our null hypothesis is that the treated patient does not differ from the untreated and is in effect a random individual drawn from our normal distribution. We should decide beforehand what level of probability we will accept as evidence that there is a difference. A common level used in science is a probability of 0.05 (1 in 20). To put this another way: if a result as different from the mean as ours would only occur by chance one time in twenty, or less, under the null hypothesis, then we will reject the null hypothesis and conclude that we have a significant effect.

Suppose our mean is 112 and standard deviation 4. In Fig 1.4 we have shaded in 2.5% of the total area in each tail. If we select at random from the population, 1 in 20 will fall within the shaded area, so if our observed value lies in this area we will reject the null hypothesis at the 5% level. The shaded area represents our rejection region or critical region. We have said that 95% of the normal curve is enclosed by 1.96 standard deviations. This is an important figure because 5% is the commonly adopted level of significance. So in this example any measurement which deviates from 112 (the mean) by more than 1.96 standard deviations, i.e. 7.84 (1.96 × 4 = 7.84), will lie in the rejection region.

[Figure 1.4: 2.5% of the area of the normal curve in each tail]

Let's suppose we have an individual with a value of 102. This deviates from the mean by 112 − 102 = 10, more than 7.84, so this individual lies in our rejection region and we would certainly conclude that this individual does not belong to our population.
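The rejection-region arithmetic just described can be sketched in a few lines; this is simply the worked example above restated as code (Python, for illustration; 1.96 is the 5% two-tailed figure quoted in the text):

```python
mu, sigma = 112.0, 4.0  # known population mean and standard deviation
x = 102.0               # the treated individual's measurement

half_width = 1.96 * sigma                         # 7.84: encloses the central 95%
lower, upper = mu - half_width, mu + half_width   # acceptance region

deviation = abs(x - mu)                 # 10.0
in_rejection_region = deviation > half_width

print(round(lower, 2), round(upper, 2))  # 104.16 119.84
print(in_rejection_region)               # True: 102 falls in the rejection region
```

Any measurement below 104.16 or above 119.84 leads to rejection of the null hypothesis at the 5% level.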
To find the exact probability of a value of 102 we can standardise it so that it is expressed as a standard normal deviate and then refer to tables of normal deviates. To standardise 102 we express it as a deviation from the mean and then divide by the standard deviation, as follows:

z = (x − µ) / σ = (102 − 112) / 4 = −2.5

It is conventional to refer to standard normal deviates by the letter z. We could now refer to tables of normal deviates (z tables) but it is simpler to use MINITAB (see Exercise 1 at the end of the next chapter).

Chapter 2
Sampling

2.1 Sample Mean and Variance

In Chapter 1 we dealt with the normal distribution and its two parameters, the mean and variance (µ and σ²). We have given formulae with which µ and σ² can be calculated from the population of measurements. In practice we are almost always faced with a different situation, in which we are estimating population parameters from a sample. Suppose we want to find out the systolic blood pressure of students at Bath University. We would not be able to determine the blood pressure of every single student, so we would choose a "representative sample", measure them, and estimate the mean student blood pressure from these.

Population: The population consists of all items which are of interest.

Sample: The sample is the subset of the population for which we have measurements.

Leaving aside the problem of how to choose a representative sample, how can we best estimate the population mean and variance from our sample? Clearly the sample mean is the best estimate we have of the population mean. It should be obvious that the larger the sample, the more accurate this estimate will be.

x̄ = Σx / n

This is the same as the formula for the population mean except that the symbol x̄ is used to distinguish it from the population mean µ. Although the sample mean is subject to some error, it is said to be an unbiased estimate of µ.
This means that if many samples were taken their means would vary about the true mean. What about the variance? If we use the same formula that we used for the population variance we do not get an unbiased estimate of σ². If we could calculate all our deviations from the mean by subtracting µ, the true population mean, our estimate would be unbiased, but we do not know µ; we only have an estimate of it in x̄. Since x̄ is calculated from our sample, the deviations from this mean are bound to be less than they would be from µ. It follows that our estimate of the variance will be too low, especially for small samples. It can be proved mathematically that if we divide the sum of squares by n − 1 instead of n this gives us an unbiased estimate of the variance. Hence the formula for calculating the sample variance is:

s² = Σ(x − x̄)² / (n − 1)

Again notice that the Roman letter s is used for the sample standard deviation rather than σ. It is a convention in statistics to use Greek letters for population parameters (e.g. µ and σ) and Roman letters for estimates of parameters (e.g. x̄ and s). Estimates of parameters are known as statistics.

2.1.1 Degrees of Freedom

The term n − 1 in the formula above is called the degrees of freedom. Remember that when calculating a statistic you lose one degree of freedom for each term which is calculated from the sample. Thus the calculation of s involves subtracting each item from the term x̄, which has itself been calculated from the sample, hence one degree of freedom is subtracted from the total n.

You might find an alternative explanation of degrees of freedom easier to grasp. Consider a sample of 8 students which have been measured for systolic blood pressure. You can see that the 8 deviations from the mean must, by definition, sum to zero. Hence as soon as we know 7 of the deviations the last one is determined. In other words only 7 deviations can vary independently, hence 7 degrees of freedom.
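The claim that dividing by n underestimates σ² while dividing by n − 1 does not can be demonstrated with a small simulation; this sketch (Python, for illustration, with an arbitrary seed and sample size) averages both estimators over many samples drawn from a population whose true variance is 1:

```python
import random
random.seed(42)

def var_n(xs):
    # Population formula: divide by n
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_n1(xs):
    # Sample formula: divide by n - 1 (the degrees of freedom)
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Many samples of size 5 from N(0, 1), whose true variance is 1
reps, n = 20000, 5
samples = [[random.gauss(0, 1) for _ in range(n)] for _ in range(reps)]
mean_var_n = sum(var_n(s) for s in samples) / reps
mean_var_n1 = sum(var_n1(s) for s in samples) / reps

print(round(mean_var_n, 2))   # about 0.8: biased low by a factor (n - 1)/n
print(round(mean_var_n1, 2))  # about 1.0: unbiased
```

On average the divide-by-n estimate recovers only (n − 1)/n of the true variance (4/5 here), which is exactly the bias the n − 1 divisor removes.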
Exercise: Calculation of Mean and Standard Deviation

Eight students were selected at random and measured for systolic blood pressure. The values (in mmHg) were 130, 141, 120, 110, 118, 124, 146, 128.

• Estimate the mean systolic blood pressure of the student population.
• Estimate the standard deviation of systolic blood pressures.
• Estimate the proportion of the student population with systolic blood pressures over 135.

2.2 Interval Estimation

The estimate of mean student systolic blood pressure in the example above is a single value, sometimes called a point estimate. Stated more formally: the sample mean (x̄) is an unbiased point estimate of the population mean (µ). The sample standard deviation (s) is an unbiased point estimate of the population standard deviation (σ) provided the sum of squares is divided by the degrees of freedom, n − 1.

2.2.1 The Sampling Distribution of x̄

In published work you will usually see quoted means accompanied by a measure of reliability, x̄ ± a. These are commonly so-called 95% confidence limits, which means "we are 95% certain that the range x̄ − a to x̄ + a encloses the true population mean". Such a statement is an interval estimate of the mean. You may also see means quoted as a confidence interval. For example, a mean and confidence limits of 4.5 ± 0.4 could be expressed as 4.1 – 4.9. This is just an alternative to confidence limits.

How are confidence limits calculated? Imagine that the above experiment (take 8 students at random and calculate the mean) was repeated many times. We would end up with a collection of means which would themselves be normally distributed, but with a much smaller variance than that of the individuals in the population. The distribution of the means is called the sampling distribution of the mean.
We would correctly guess that the larger the samples that our means were based on, the less would be the spread of our means, and hence the more confident we could be of the accuracy of our estimate of the true mean. Have a look at Fig 2.1. The top figure represents the population that we are sampling from. The figure below shows the sampling distribution of samples of size 10 from this population. This has a much narrower spread than the parent population above. If we took lots of samples of size 10, this is how the means would be distributed. The bottom figure shows how the sampling distribution of samples of size 20 would look. The quantitative relationship between sample size, the population variance and the variance of the sampling distribution of the mean is investigated in the next exercise.

Confidence limits for a statistic are derived from the sampling distribution of the statistic, so once we know how to estimate the variance of the sampling distribution of our mean we will go on to show how confidence limits are calculated. The variance of the mean is a function of the population variance and the sample size, so for a single sample we can predict how much error to expect on our estimate. In the next exercise the relation between the variance of the sample mean, population variance and sample size will be investigated by using MINITAB to simulate the taking of many samples of a particular size from a population and looking to see how our sample means vary.

[Figure 2.1: Sampling distribution of means. Top: population with mean 120 and standard deviation 10. Middle: sampling distribution of the means of samples of size 10 from this population. Bottom: sampling distribution of the means of samples of size 20.]

Exercise: Sampling Distributions

• Enter MINITAB

• Try the following command:

RANDom 10 C1;
NORM 120 10.
MINITAB will generate a random sample of size 10 from the distribution specified in the sub-command, in this case a normal distribution with a mean of 120 and a standard deviation of 10; that is, N(120, 100). HELP RAND will give more details of what this command does.

• Generate 40 samples of size 10 from N(120, 100) by filling 40 rows of columns C1 to C10:

RAND 40 C1-C10;
NORM 120 10.

We have filled 40 rows and 10 columns with values sampled at random from N(120, 100). We are going to investigate how the means of samples of size 10 vary compared with the variance of the population from which the samples have been taken (σ² = 100). Each of the 40 rows represents a sample of size 10.

• The following command is a quick way of calculating the necessary statistics.

RMEAN C1-C10 into C21

This collects the means of the 40 samples and puts them into C21.

• Look at C21 and compare its spread of values with that of, say, C1 and C2.

DOTP C1 C2 C21;
SAME.

• Calculate the standard deviation and variance of C21.

LET K1 = STDEV(C21)
LET K2 = K1**2
PRINT K1 K2

Mean of the 40 means =
Standard deviation of the 40 means =
Variance of the 40 means =

The mean of the means should be close to 120 (it is based on a sample of 40 × 10 = 400 values!).

• We are interested in the sampling distribution of the 40 means. Compare the variance of the sample means with the population variance.

• Repeat this for samples of size 20 and 2. Fill in the table below.

sample size    overall mean    variance of means
2
5
10
15
20

• What would the variance of the means be for samples of size 1?

• Try to deduce the relationship between the variance of the sample means, the population variance and the sample size. You might find it helpful to plot the variance of the means against 1/n and try some intermediate sample sizes such as 5 or 15.
Remember that your estimated variances will be subject to sampling error; this is especially true for small sample sizes, so if you are going to repeat any sample size, concentrate on the small ones.

• If you can suggest a formula for calculating the standard deviation of the mean (call this sem for now) in terms of the sample size n and the sample variance (s²), then so much the better.

2.3 Confidence Limits

If random samples of size n were to be taken from a distribution with mean µ and standard deviation σ, then the sample means would form a distribution having the same mean µ but with a smaller standard deviation, √(s²/n) (where s is an estimate of σ). The standard deviation of the sampling distribution of x̄ is often called the standard error of x̄ (sem), to distinguish it from the standard deviation (s) of the sample. It can also be written as s/√n.

If our sampling distribution is normal and the sample size is large (over about 30), we know that 95% of the sampling distribution of x̄ lies within 1.96 standard deviations of the mean. Since the standard deviation of x̄ is s/√n, the 95% confidence interval for x̄ is calculated as x̄ − 1.96 × s/√n to x̄ + 1.96 × s/√n. To put this another way: given the sample mean x̄ we are 95% confident that the interval calculated above will enclose the "true" population mean µ. The two end points of the confidence interval are the confidence limits, which are often written as x̄ ± 1.96 × s/√n.

For small samples our estimate of σ varies from sample to sample, so we need to take a larger interval than 1.96 × s/√n. The appropriate value can be obtained from the Student's t-distribution. As n gets larger, t approaches 1.96 (see below), but t increases quite rapidly for small samples. For example, for a sample of size 3 (degrees of freedom 2) a t of 4.30 would be used.
df (n − 1)    1      2     3     4     7     10    20    ∞
t95%        12.71  4.30  3.18  2.78  2.36  2.23  2.09  1.96

The formula for calculating confidence limits is:

x̄ ± t √(s²/n)

where you will need to look up the appropriate value of t for the level of confidence you want (usually 95%) and the sample size. For sample sizes over 20 a value for t of 2 is a good enough approximation.

Exercise: Calculation of Confidence Limits

Measurements of systolic blood pressure were made in 8 students; the values (in mmHg) were: 130, 141, 120, 110, 118, 124, 146, 128. You have already found the mean and standard deviation of these data. Now calculate the following.

sem =
95% Confidence limits of population mean =
95% Confidence interval =

Exercise: Confidence Limits for Titration Results

• Go into MINITAB and open the worksheet with your titration results

RETR ’GROUPA’

• You calculated the mean and variance for the class data last time. Calculate the 95% confidence limits for this mean using the normal deviate 1.96. Do this for the “before instruction” and the “after instruction” set of results. Do not use the TINT command.

95% confidence limits for class mean before instruction =
95% confidence limits for class mean after instruction =

• What is the appropriate normal deviate for calculating the 99% confidence interval? The MINITAB command INVCDF will help (Inverse Cumulative Distribution Function).

• Does the “correct value” as given by the chemists lie within the class confidence intervals? If not, why not?

• Check your results using the MINITAB TINT command.

Confidence Intervals with MINITAB

You can find confidence intervals easily with MINITAB. If your data are in C1 type:

TINTerval C1

2.4 Central Limit Theorem

In calculating confidence limits we assume that the sampling distribution of the statistic is normal. The central limit theorem means that this is usually a safe assumption to make. It may still be necessary to check on normality with small samples.
The Central Limit Theorem: For reasonably large samples x̄ is approximately normally distributed whatever the parent distribution.

2.5 Coefficient of Variation

As we have seen, the standard deviation is measured in the same units as the individual measurements. Quite frequently in experimental procedures large measurements tend to have large errors and low measurements low errors. If in fact the errors are proportional to the mean, it is valid to calculate a quantity known as the coefficient of variation (cov), which is simply the standard deviation expressed as a proportion of the mean. This is often expressed as a percentage. The coefficient of variation provides a measure of variability which can be used to compare samples with very different means. The coefficient of variation is also known as the coefficient of error. It is calculated by:

cov = (s / x̄) × 100%

For example, a sample with mean 109 and standard deviation 9.8 has a coefficient of variation of 9.0%. A sample with mean 11.4 and standard deviation 3 has a coefficient of variation of 26.3%; thus the second sample has a larger error associated with it than the first. Remember that we have assumed that the error increases in proportion to the mean. This is not always the case.

One benefit of using the coefficient of variation is that it is independent of the units of measurement. Whether you measure a sample of heights in centimetres or yards, the coefficient of variation will be the same. One warning: you cannot use the coefficient of variation if measurements can take negative values. Thus you would not use it where temperatures are in Fahrenheit or centigrade.

Exercise: Coefficient of Variation

In an assay for serum ACTH, a reference sample was assayed 12 times in laboratory A and 12 times in laboratory B. Laboratory A returned a mean of 17.5 pg/ml with a standard deviation of 2.5 pg/ml, whereas laboratory B found a mean of 19 pg/ml with an s of 5 pg/ml.
• Thus the coefficient of variation for laboratory A would be?
• ...and for laboratory B?
• Whose results are more reliable?
• What assumption have you made?

Chapter 3 Comparing Two Samples

3.1 Hypotheses

Whenever you are assessing whether an individual value has come from a population of known mean and standard deviation, or whether a treatment with a new drug is having a significantly better effect than an old one, you are testing a hypothesis. By their nature hypotheses cannot be proven, but they can be rejected or accepted on the basis of available evidence.

For example, we might observe an improvement in recovery times in a sample of patients treated with a new drug compared to a sample treated with the usual drug. However, it is always possible that there is, in fact, no difference between the two drugs and that our apparent improvement is simply due to chance. The larger the difference, the less likely it is to be a chance event, but we can never prove, in the strict sense, that it is not due to chance. What we can do is calculate the probability of obtaining the observed improvement by chance, since we can find the mean and variance of the two samples.

The general procedure in a statistical test is to formulate what is known as a null hypothesis. In the case of an experiment to test whether a new drug is better than the usual drug, the null hypothesis would be stated in the following form:

The new drug is no better than the usual drug.

An alternative hypothesis in this case might be:

The new drug is better than the usual drug.

The important property of a null hypothesis is that we can calculate the probability of the observed measurement assuming that the null hypothesis is true. We can use this information, which is expressed as a probability, to decide whether we have evidence that would lead us to reject the null hypothesis in favour of an alternative hypothesis.
There is an arbitrary element to this decision, but it is conventional to reject the null hypothesis if the probability of the observed result is less than 0.05, or in other words, if the observed result would occur less than once in every twenty experiments in the long run.

[Figure: The distribution of the difference between two means under the null hypothesis, showing the critical region for the one-tailed alternative hypothesis (5% in one tail) and for the two-tailed alternative hypothesis (2.5% in each tail).]

A result which is considered too unlikely to have occurred under the null hypothesis is said to be statistically significant. The statistical test which is used for comparing two means is called Student's t-test. The appropriate test to use when there are more than two means to be compared is Analysis of Variance, which is dealt with in the next chapter. The abbreviation H0 is used for the null hypothesis and H1 for the alternative hypothesis.

3.2 One and Two-tailed tests

Look again at the alternative hypothesis in the example above.

The new drug is better than the usual drug.

The implication here is that we would only consider a result to be significant if the new drug had a higher mean than the usual drug. Under the null hypothesis (no difference in mean) the observed difference in means for two samples of any given size will form a normal distribution with a mean of zero. The variance of this distribution will depend on the sample sizes. The critical region of this distribution, for a 5% significance level, will be in one tail and form 5% of the total area. Any observed difference in means which falls in this region will be considered evidence of a significant difference between the two samples. When the alternative hypothesis is in this form we are carrying out a one-tailed test.

In scientific research it is far more common to carry out two-tailed tests, where the alternative hypothesis is in the form:

The new drug differs in effect from the usual drug.
In this case both tails of the distribution form the critical region, and each has an area of 2.5% of the total (summing to 5%).

3.3 Unpaired t-Test

Let us first consider the t-test which is used when data are unpaired. The effects of two anti-hypertensive drugs are studied on the blood pressure of two different groups of subjects. One group is given the first drug and the other group the second. After measuring the blood pressure of each subject, we wish to compare the mean blood pressure of one group with the mean of the other group. If there is a difference in means, we need to know whether this constitutes evidence that the two drugs differ in effect, or whether the observed difference can be attributed to chance. This is a very common experimental design, and Student's t-test is the appropriate test provided that certain assumptions, which are discussed below, are valid.

A similar common experimental design, which requires a different test, involves the subjects being matched in pairs in some way, with each member of the pair being given one of the drugs. Indeed the same subject may be given both drugs at different times. This requires a t-test for paired data (Section 3.4).

3.3.1 Theory

Suppose x̄1 and x̄2 are the means of our two samples and s1, s2 are the standard deviations. The standard errors of the two sample means would be s1/√n1 and s2/√n2. Our null hypothesis is µ1 = µ2; in other words, µ1 − µ2 = 0. What we would like to know, then, is the sampling distribution of x̄1 − x̄2, so that we can calculate the probability of any observed difference in means under the null hypothesis. We know the standard errors of both x̄1 and x̄2, so what is the standard error of one subtracted from the other? In general, if two normal distributions are added the variance of the new distribution is the sum of the two variances. Perhaps surprisingly, if we subtract two distributions the variance of the new variable is also the sum of the two variances.
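This result can also be checked with a quick simulation outside MINITAB. The sketch below is our own illustration (the population N(50, 10²), the sample size and the seed are arbitrary assumptions):

```python
# Sketch: the variance of both the sum and the difference of two
# independent normal variables equals the sum of the two variances.
import random
import statistics

random.seed(2)
x = [random.gauss(50, 10) for _ in range(5000)]
y = [random.gauss(50, 10) for _ in range(5000)]

added      = [a + b for a, b in zip(x, y)]
subtracted = [a - b for a, b in zip(x, y)]

# Each individual variance is about 100, so both combined
# variances should come out near 200.
print(statistics.variance(added), statistics.variance(subtracted))
```

Both printed variances land close to 200, i.e. roughly twice the variance of either column on its own.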
The following short exercise should convince you.

Enter MINITAB and generate two random samples from the N(50, 100) distribution.

RAND 50 C1 C2;
NORM 50 10.

Now add the two columns into C3 and subtract them into C4.

LET C3 = C1 + C2
LET C4 = C1 - C2

Now check the variances of the 4 columns.

LET K1=STDEV(C1)**2
LET K2=STDEV(C2)**2
LET K3=STDEV(C3)**2
LET K4=STDEV(C4)**2
PRINT K1-K4

The variances of C3 and C4 should both be approximately twice those of C1 and C2.

So we can find the standard error of the difference in means by adding the variances of the two means and taking the square root. For large samples, if the difference between the means is more than 1.96 standard errors, the difference is significant at the 5% level. For smaller samples we use the appropriate value of t instead.

In the formula below a pooled sample variance is formed by combining the sample variances of the two groups into a single estimate. The pooled variance, sp², is given by:

sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

Each sample variance is multiplied by its degrees of freedom to give the sum of squares. The sums of squares are added and divided by the total degrees of freedom to give a pooled sample variance.

t is then calculated by:

t = (x̄1 − x̄2) / (sp √(1/n1 + 1/n2))

The procedure above, in which a pooled estimate of the variance was calculated, assumes that the variance of the two samples is the same. It is important to check this assumption before carrying out a t-test. A formal method is the variance ratio test, also known as Fisher's F-test. Simply calculate s1²/s2² and look up the value of this ratio in the appropriate table. Alternatively use a statistics package.

3.3.2 Example of Student's t-test

In order to test the effectiveness of a new analgesic drug, two groups of mice were used. The first group received saline as a control, whereas the second group received a dose of the drug. To test for analgesia the animals were placed in turn on a hot-plate maintained at 56°C.
The time taken for the mouse to rear on its hind legs and lick its forepaws was taken as an indicator that the animal was aware of the pain stimulus, and was recorded. As soon as the response was observed the mouse was removed from the hot-plate.

We have designed the experiment to answer the question: does the analgesic significantly affect the animals' response to pain? In this case we are prepared to accept the possibility that the drug could actually make the mice more sensitive to pain, so we are going to use a two-tailed test.

H0: the analgesic does not affect the animals' response to pain
H1: the drug does affect the animals' response to pain

The data are shown below.

Analgesia (seconds)
Saline control (x1):  18 14 16 11 21 24 19 20 24 15
Test Drug (x2):       22 18 31 38 26 28 29 40

In MINITAB type the data into C1 and C2. Have a look at the data using the describe and dotplot commands.

DESC c1 c2
DOTP c1 c2;
SAME.

Or use a boxplot as shown below.

[Boxplot of response times (seconds) for the drug and control groups.]

The spread (variance) of the two samples is fairly similar. The control group has a mean time of 18.20 whilst the treatment group has a mean time of 29.00. Is this increase in response time of 10.8 seconds significant? Looking at the boxplot, this difference certainly looks real, but to answer this objectively we can carry out a t-test.

3.3.3 Checking equality of variance

If you are worried that the variances might be different, a formal check can be made. The ratio of the two variances is calculated by dividing the larger variance by the smaller. Here C2 has the larger variance; if this is not the case, reverse C1 and C2. This gives an F-ratio, as the ratio of two variances is known, of 3.03 with 7 and 9 degrees of freedom. To find the significance of this value you can use MINITAB's CDF command (Cumulative Distribution Function), which gives the area of the distribution up to the specified value (see the diagram).
[Figure 3.1: The area of the F(7, 9) distribution given by the CDF command: 93.76% of the area lies below 3.03, leaving 6.24% above.]

We are interested in the probability of values of F greater than 3.03, so we must subtract the value given by CDF from 1. Furthermore, this would need to be doubled for a two-tailed test. The complete set of MINITAB commands to carry out this F-test is given below.

LET K1 = STDEV(C2)**2 / STDEV(C1)**2
CDF K1 K2;
F 7 9.
LET K3 = (1 - K2)*2
PRINT K1 K3

CDF K1 K2 gives the probability for a value in K1 and stores it in K2. The subcommand F 7 9 specifies an F distribution with 7 and 9 degrees of freedom. This gives an F-ratio of 3.03 with 7 and 9 degrees of freedom, the two-tailed probability of which is 0.124. Nothing to worry about.

A variance ratio significantly different from 1 (K3 < 0.05) indicates that the two samples differ in variance, in which case the POOLED subcommand should not be used; see below. Another solution is to use a non-parametric test (Section 3.5) in preference to a t-test. If the F-ratio is sufficiently close to 1 we can proceed to calculate the value of t.

3.3.4 Calculating Student's t

Assuming the variances are similar, the following MINITAB command will carry out Student's t-test. If the variances differ, omit the POOLED subcommand or use a non-parametric test such as the Mann-Whitney U test.

TWOSample C1 C2;
POOLED.

The output from these commands is shown below.

Twosample T for control vs drug
          N    Mean   StDev  SE Mean
control  10   18.20    4.26      1.3
drug      8   29.00    7.43      2.6

95% C.I. for mu control - mu drug: (-16.7, -4.9)
T-Test mu control = mu drug (vs not =): T = -3.88  P = 0.0013  DF = 16
Both use Pooled StDev = 5.86

The output gives the 95% confidence interval for the observed difference between the means as −16.7 to −4.9. This range does not include zero, so we can conclude that the difference is real. The probability of observing this difference between the means by chance, if H0 is true, is only 0.0013; so the difference is highly significant.
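As a cross-check on the MINITAB output, the pooled calculation can be reproduced by hand. This is a sketch of ours using only the Python standard library, applied to the hot-plate times above:

```python
# Sketch: pooled two-sample (Student's) t statistic for the hot-plate data.
import math

control = [18, 14, 16, 11, 21, 24, 19, 20, 24, 15]   # saline, n1 = 10
drug    = [22, 18, 31, 38, 26, 28, 29, 40]           # test drug, n2 = 8

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    # sample variance: sum of squares divided by n - 1
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(control), len(drug)
sp2 = ((n1 - 1) * var(control) + (n2 - 1) * var(drug)) / (n1 + n2 - 2)
t = (mean(control) - mean(drug)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(t, 2))   # -3.88, with n1 + n2 - 2 = 16 degrees of freedom
```

The result agrees with the TWOSample output: t = −3.88 on 16 degrees of freedom, with a pooled standard deviation of √sp² ≈ 5.86.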
3.4 Paired t-Test

In many experimental situations control and experimental measurements are carried out on the same subject, e.g. when studying the action of a drug on blood pressure, we may first measure the patient's normal blood pressure, administer the drug, and then measure the pressure again to see whether the drug has had an effect. In such situations, where both control and test measurements are carried out on the same or similar subjects, the data are said to be paired. In such cases it is the difference between the measurements that we are interested in. For each subject we have two values, x1i and x2i, and the difference di = x1i − x2i.

Before Treatment   After Treatment   Difference
x11                x21               d1
x12                x22               d2
...                ...               ...
x1n                x2n               dn

The null hypothesis tested is that the mean difference is zero. The mean of the set (di) is found, along with the standard deviation of (di). Thus:

Mean: d̄ = (Σ di) / n

Standard deviation: sd = √( Σ (di − d̄)² / (n − 1) )

The t value (for n − 1 degrees of freedom) is given by:

t = d̄ / (sd / √n)

A paired t-test carried out with MINITAB

A paired t-test is carried out with MINITAB by first forming a column of differences between the two sets of data. Then the null hypothesis that the mean of this column of differences is zero can be tested as follows.

LET C3 = C1 - C2
TTEST 0 C3

Paired t-Test

The table shown gives the heart rates of 8 subjects (beats/min) before and after treatment with a test drug.

Subject      1   2   3   4   5   6   7   8
Before      75  81  68  70  85  76  70  73
After       73  78  69  64  75  71  63  72
Difference   2   3  -1   6  10   5   7   1

Calculate:
d̄ =
sd =
Then t =
Is this significant? If so, at what level?

3.5 Non-parametric tests

The variables we have dealt with so far have been measured on a so-called interval scale; that is, the magnitude of the difference between any two measurements is important. Sometimes we can only meaningfully place a series of items in a rank order, but the size of the difference between successive items has no meaning.
This is an ordinal scale. Where precise numerical measurements can be made on the observations, and where the samples are large enough to ensure that the central limit theorem is applicable (and hence that the distribution of the difference between the means is normal), t-tests are used to test the null hypothesis (H0). Where measurements are not very precise, but can at least be arranged meaningfully in a rank order, or where there is a danger that the central limit theorem is violated, perhaps because the sample sizes are very low, non-parametric tests are used.

Most non-parametric tests only make use of the rank order of the data. It follows that whereas parametric tests are based on arithmetic means, non-parametric tests concentrate on median values. Since non-parametric tests make no assumptions about the form of the underlying distributions, they are often referred to as distribution-free methods. Since non-parametric methods use less of the information in the data, it is only to be expected that they are less powerful than the corresponding parametric technique. That is, they are less likely to detect a real effect should it exist.

3.6 The Mann-Whitney U test

This is the non-parametric counterpart of the unpaired t-test and can be used to test whether two independent groups have been drawn from the same population. The statistic calculated is U. There is an equivalent test, called the Two-Sample Wilcoxon Test for Independent Data, which uses the sum of ranks (T) as the test criterion. However, the Mann-Whitney U test is probably more widely used and is described here. Although MINITAB will carry out these tests for you, non-parametric tests are often quite simple to carry out by hand, and doing so will help to make the procedure clear. The 'long hand' procedure is given below.

Procedure

1. Rank the data taking both groups together, giving rank 1 to the lowest score, etc.
Algebraic size is taken into account, with the lowest ranks assigned to the largest negative numbers (if any are present). Tied scores are given the average of the tied ranks.

2. Find the sum of the ranks (T) for both samples.

3. Calculate U for each sample. For sample 1, with number of scores N1 and sum of ranks T1:

U = (N1 × N2) + N1(N1 + 1)/2 − T1        (3.1)

4. Find the critical value of U with N1 and N2 from Table 7.2 and compare this with the smaller of the two calculated U values. If the observed U is less than, or equal to, the tabulated value, the result is significant. Where N1 or N2 is more than 20, see Section 3.6.3.

3.6.1 Example 1

Five rats were trained to imitate leader rats in a maze. They were trained to follow the leader when hungry. Then the 5 rats were transferred to a shock-avoidance situation, where imitation of leader rats would enable them to avoid electric shock. The experiment was designed to see whether rats could generalise learned behaviour when placed under a new drive and in a new situation. The number of trials it took for each rat to reach a criterion of 10 correct responses in 10 trials was recorded for the 5 trained rats and for 4 untrained controls. The hypothesis was that the 5 rats who had already been trained to imitate a leader (E rats) would transfer this training to the new situation and reach the criterion in fewer trials than the controls (C rats). The data are tabulated below.

Trials to criterion
E rats:  75  63  70  45  81
C rats: 110  85  64  77

The data (number of trials to reach criterion) are probably only on an ordinal scale of measurement, and the sample sizes are small, so it is possible that the difference in means would not follow a normal distribution. A non-parametric test was therefore used.

Null hypothesis H0: the number of trials to reach the criterion in the shock-avoidance situation is the same for rats previously trained to follow a leader through a maze for food as for untrained rats.

Alternative hypothesis H1: previously trained rats will reach the criterion in fewer trials.
Hence a one-tailed test was used. We arrange these scores in order of size, retaining the identity of each.

rank:    1   2   3   4   5   6   7   8    9
score:  45  63  64  70  75  77  81  85  110
group:   E   E   C   E   E   C   E   C    C

Find the sum of ranks (T) for each group: Tc = 26 and Te = 19. Find U from equation 3.1: Uc = 4 and Ue = 16. Take the smaller of these two (4) and look in Table 2c. The tabulated value of U is 2 for N1 = 4 and N2 = 5. Our observed value of 4 is not less than or equal to this, so we accept the null hypothesis. There is no evidence from these data that rats can generalise learned behaviour when placed under a new drive and in a new situation.

3.6.2 Procedure for small samples

For small samples the following "short-cut" method can be used:

1. Rank the data as above.

2. Consider the control group and count the number of E scores that precede (i.e. fall to the left of) each score in the control group. Thus, U = 2 + 4 + 5 + 5 = 16.

3. Repeat this for the number of C scores that precede each E score (U = 4). Look up the smaller of the two U values.

3.6.3 Large samples (N1 or N2 > 20)

As N1 and N2 increase in size, the sampling distribution of U rapidly approaches the normal distribution, with

mean µU = N1 N2 / 2

and standard deviation σU = √( N1 N2 (N1 + N2 + 1) / 12 )

Thus when N2 > 20, the significance of an observed value of U is determined by:

z = (U − µU) / σU = (U − N1 N2 / 2) / √( N1 N2 (N1 + N2 + 1) / 12 )

which is practically normally distributed with zero mean and unit variance; i.e., the probability associated with the occurrence under H0 of values as extreme as an observed z may be determined by reference to tables of z. If a two-tailed test is being used, then the observed z is significant at p = 0.05 if it is greater than 1.96. (For a one-tailed test at p = 0.05, z > 1.64.)

Mann-Whitney U test with MINITAB

Type the data into two columns, say C1 and C2.
Then give the command

MANNU C1 C2

The output from this command includes the Mann-Whitney U statistic and the exact one-tailed probability of the observed ranking under H0. Double the probability for a two-tailed test.

3.7 Wilcoxon signed-rank test

This is the non-parametric counterpart of the paired t-test for equality of means. It is used to compare the medians of two samples when each observation in the first sample is paired with an observation in the second sample.

Procedure

1. Obtain the difference between each pair of readings, taking sign into account. Eliminate cases with no difference and reduce N accordingly.

2. Rank these differences, ignoring the sign, giving rank 1 to the smallest absolute difference.

3. Calculate T, the sum of the ranks for the less frequent sign.

4. Consult Table 1. If the observed T is equal to, or less than, the tabulated value, then there is a significant difference between the two conditions and H0 should be rejected.

3.7.1 Example 2

Eight pairs of twins were exposed to complex reaction time tests: one of each pair of twins was tested after 3 double whiskies, the other while completely sober.

H0: drink does not affect reaction time.

Should we use a one-tailed test or a two-tailed test? In this example a case could be made for either. We might expect that drink would adversely affect reaction time, and that our alternative hypothesis is therefore that the median of the Sober group is less than the median of the Whiskey group. In other words, we could carry out a one-tailed test. If you are uncertain whether a one-tailed or two-tailed test is appropriate, ask yourself the following question: if the reaction times of the Sober group turn out to be greater than those of the Whiskey group, will you dismiss this as chance, or will you consider it a potentially interesting result and carry out a test? If the latter, and this is probably the most common situation, you should carry out a two-tailed test.
In this example we will adopt H1: the median of the Sober group differs from the median of the Whiskey group, and carry out a two-tailed test.

Reaction time
Sober    Whiskey     d    Rank of d   Rank with less frequent sign
310        300     -10        1              1
340        320     -20        2              2
290        360      70        5
270        320      50        4
370        540     170        6
330        360      30        3
320        680     360        7
320       1120     800        8
                                          T = 3

(Here d = Whiskey − Sober.) Table 1 shows that for N = 8 the critical value of T for a two-tailed test is 3. Hence H0 can be rejected. Conclude that H0 is incorrect: drink does affect reaction time (it increases it).

3.7.2 Large Samples

For samples > 20, you cannot use Table 1. However, it can be shown that the sum of the ranks, T, is normally distributed with

mean µT = N(N + 1)/4

and standard deviation σT = √( N(N + 1)(2N + 1) / 24 )

so that

z = (T − µT) / σT = (T − N(N + 1)/4) / √( N(N + 1)(2N + 1) / 24 )

is approximately normally distributed with zero mean and unit variance. Thus, tables of z give the probabilities associated with the occurrence under H0 of values as extreme as an observed z computed from the above formula.

Wilcoxon signed-rank test with MINITAB

Type the data into two columns, say C1 and C2. Form a column of differences.

LET C3 = C2 - C1
WTEST 0 C3

The output from this command includes T and the exact one-tailed probability of the observed ranking under H0. Double the probability for a two-tailed test.

Chapter 4 Analysis of Variance

4.1 Introduction

So far, we have looked at tests of statistical significance designed to compare two sample means. Analysis of variance (or ANOVAR) is used when we want to compare more than two sample means, or to investigate the way in which several treatments interact with each other. For example, if we extend the example used to illustrate Student's t-test to five drugs, and hence five groups of subjects instead of just two, we would use one-way ANOVAR to analyse the five means. Why can't we just carry out t-tests between each pair of means to find out which differ from which? Consider the number of tests you would make.
There are ten pairwise comparisons that you could make amongst five means. Let us suppose that in fact there are no significant differences between the means. Each of the t-tests would have a chance of 0.05 of showing a significant result by chance (a so-called Type 1 error). But we are going to carry out ten such tests in this single experiment, so the chance of at least one of them being significant by chance is considerably more than 0.05. The correct procedure is to carry out a single analysis of variance on all the data, and only after that start to look at individual comparisons.

4.2 Within Sample Variance and Between Sample Variance

Let us consider a simple example, that of three samples taken from normally distributed populations with the same variance. Our null hypothesis is that the samples come from the same population with mean µ.

For sample 1, n = 4 and the sample mean = x̄1
For sample 2, n = 5 and the sample mean = x̄2
For sample 3, n = 5 and the sample mean = x̄3

The total variance of the 14 observations making up the three samples is made up of two components:

1. The variance due to the differences between the three sample means (x̄1, x̄2, x̄3) and the population mean µ, i.e. the between sample variance.

2. The variation due to differences within the samples, i.e. the deviation of the four values of x in sample 1 from x̄1, the deviation of the values of x in sample 2 from x̄2, and the deviation of the values of x in sample 3 from x̄3: the within sample variance.

If all three samples are drawn from the same population, we would expect the within and between sample variances to be approximately equal. Our test statistic is thus the between sample variance divided by the within sample variance. This variance ratio is known as the F-ratio, after R. A. Fisher who developed ANOVAR. The expected value of the F-ratio under the null hypothesis is unity.
If the three sample means differ then the between sample variance will exceed the within sample (error) variance and the F-ratio will be more than one.

In the previous chapter we used an F-test to compare the within sample variance of one sample with the within sample variance of another sample, in order to check that they were the same before carrying out a t-test. Here, although we are also comparing variances, a high F-ratio indicates that there is a difference between the sample means. This is because we are comparing the pooled within sample variance with the between sample variance, rather than two within sample variances. If the difference between the two variances is such that the F-ratio exceeds a critical value we will conclude that the between sample variance is greater than we would expect from the observed within sample variance. We might therefore be led to reject the null hypothesis, i.e. to conclude that the three samples were not in fact drawn from the same population and differed in their means.

4.2.1 Example

Fourteen rats from three strains were given the same dose of a hypnotic drug, and the sleeping time of each animal was measured. Do the strains differ in sleeping time?

Sleeping Time (min)

strain 1 (x1)   13  16  19  16          Σx1 =  64   x̄1 = 16.0
strain 2 (x2)   20  17  23  26  19      Σx2 = 105   x̄2 = 21.0
strain 3 (x3)   14  21  16  19  13      Σx3 =  83   x̄3 = 16.6

Overall mean, x̄ = (64 + 105 + 83)/14 = 18.0

Total Sum of Squares

We can compute the total sum of squares about x̄:

x1  x̄ − x1  (x̄ − x1)²     x2  x̄ − x2  (x̄ − x2)²     x3  x̄ − x3  (x̄ − x3)²
13     5       25          20    −2        4          14     4       16
16     2        4          17     1        1          21    −3        9
19    −1        1          23    −5       25          16     2        4
16     2        4          26    −8       64          19    −1        1
                           19    −1        1          13     5       25
  Σ(x̄ − x1)² = 34            Σ(x̄ − x2)² = 95            Σ(x̄ − x3)² = 55

Total sum of squares about x̄ = 34 + 95 + 55 = 184
Degrees of freedom = No. of observations − 1 = 14 − 1 = 13

Within Sample Sum of Squares (also called the error sum of squares)

1. Find the squared deviation of each observation from its sample mean.

2.
Sum the squared deviations from each sample.

x1  x̄1 − x1  (x̄1 − x1)²    x2  x̄2 − x2  (x̄2 − x2)²    x3  x̄3 − x3  (x̄3 − x3)²
13    3.0       9.0         20    1.0       1.0         14    2.6      6.76
16    0.0       0.0         17    4.0      16.0         21   −4.4     19.36
19   −3.0       9.0         23   −2.0       4.0         16    0.6      0.36
16    0.0       0.0         26   −5.0      25.0         19   −2.4      5.76
                            19    2.0       4.0         13    3.6     12.96
  Σ(x̄1 − x1)² = 18.0          Σ(x̄2 − x2)² = 50.0          Σ(x̄3 − x3)² = 45.2

Within sample SS = 18.0 + 50.0 + 45.2 = 113.2
Degrees of freedom = No. of observations − No. of groups = 14 − 3 = 11

Between Sample Sum of Squares

1. In effect each individual is replaced by its sample mean (thus removing within sample variation).

2. The between sample sum of squares is the difference of all these values from the overall mean (x̄), squared and summed.

                x̄ − x̄m             (x̄ − x̄m)²   N(x̄ − x̄m)²
strain 1 (x1)   18 − 16.0 =  2.0      4.00      4 × 4.00 = 16.0
strain 2 (x2)   18 − 21.0 = −3.0      9.00      5 × 9.00 = 45.0
strain 3 (x3)   18 − 16.6 =  1.4      1.96      5 × 1.96 =  9.8

Thus, Between sample SS = 16.0 + 45.0 + 9.8 = 70.8
Degrees of freedom = No. of groups − 1 = 3 − 1 = 2

We can then summarise the analysis of variance in the following table, in which the within and between sample variances are calculated by simply dividing each sum of squares by its degrees of freedom.

Source                    SS      df   variance    F     p ≥ F
Between samples           70.8     2    35.4      3.4    0.069
Within samples (error)   113.2    11    10.291
Total                    184.0    13

Then F = 35.4/10.291 = 3.4 (df = 2, 11)

The probability of F(2,11) ≥ 3.4 is 0.069. We conclude that there is no significant evidence that the strains differ: the three samples could all have been drawn from the same population, the differences between the three means being due to chance.

Note: since the within sample and between sample sums of squares add up to the total sum of squares, it is only necessary to calculate the total sum of squares and one of the others. If you were doing these calculations by hand it would be easiest to obtain the between sample sum of squares, obtaining the error sum of squares by difference.
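As a check on the arithmetic, the whole calculation can be sketched in a few lines of Python (an illustration only; the course itself uses Minitab):

```python
# One-way ANOVAR sums of squares for the rat sleeping-time example,
# computed from first principles.
strains = {
    "strain 1": [13, 16, 19, 16],
    "strain 2": [20, 17, 23, 26, 19],
    "strain 3": [14, 21, 16, 19, 13],
}

all_x = [x for xs in strains.values() for x in xs]
grand_mean = sum(all_x) / len(all_x)                   # 18.0

total_ss = sum((x - grand_mean) ** 2 for x in all_x)   # 184.0
within_ss = sum(
    sum((x - sum(xs) / len(xs)) ** 2 for x in xs) for xs in strains.values()
)                                                      # 113.2
between_ss = total_ss - within_ss                      # 70.8 (by difference)

df_between = len(strains) - 1                          # 2
df_within = len(all_x) - len(strains)                  # 11
F = (between_ss / df_between) / (within_ss / df_within)
print(round(between_ss, 1), round(within_ss, 1), round(F, 2))  # 70.8 113.2 3.44
```

Note that the between sample sum of squares is obtained by difference, exactly as the note above recommends for hand calculation.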
Graphical Illustration of One-Way Anovar

• Choose ANOVA and regression from the Statistics menu.

• Choose ONEWAY.

• Answer Rats to filename?. This loads the data of the example you have just worked through. Answer 1 to Number of response variable?. Answer 2 to Number of factor variable?. Answer strain 1 for level 1, strain 2 for level 2 and strain 3 for level 3.

• The data are displayed. You will see the data for the three strains displayed in different boxes, with their respective means shown as a horizontal yellow line.

• Now go through the previous section again in conjunction with the graphical demonstration. Just press Enter to start. Note that in this demonstration ms stands for mean square, an alternative term for variance.

• Press Escape followed by any key to exit.

4.2.2 Exercise

The irritant activity of 5 drugs, A, B, C, D and E, was tested by applying each drug in solution to the eye of a rabbit. The number of blinks occurring over the following minute was recorded. The table shows the number of blinks obtained in 6 different animals for each drug.

          Drug
   A    B    C    D    E
   3    8    2    6    7
   5    6    3    3    2
   5    7    4    5    4
   2    8    3    5    5
   4    9    3    2    6
   5    7    5    4    6

Carry out an analysis of variance on these results. Use a calculator to help, but carry out each step separately, and record your answer at each stage.

Total sum of squares

1. x̄ = Σx/N – find the grand mean for all the observations
2. (x̄ − x) – find the difference between each observation and the grand mean
3. (x̄ − x)² – square each of these differences
4. Σ(x̄ − x)² – add all the squared values

Group   x                (x̄ − x)   (x̄ − x)²
A       3 5 5 2 4 5
B       8 6 7 8 9 7
C       2 3 4 3 3 5
D       6 3 5 5 2 4
E       7 2 4 5 6 6

Total sum of squares Σ(x̄ − x)² =

Between sample sum of squares

1. x̄m – find the mean for each sample
2. (x̄ − x̄m) – find the difference between each sample mean and the grand mean
3. (x̄ − x̄m)² – square each of these differences
4.
N(x̄ − x̄m)² – multiply the squared difference by the number of observations in each sample

5. ΣN(x̄ − x̄m)² – add the values from each of the groups

                 A             B             C             D             E
                 3 5 5 2 4 5   8 6 7 8 9 7   2 3 4 3 3 5   6 3 5 5 2 4   7 2 4 5 6 6
x̄m           =
(x̄ − x̄m)     =
(x̄ − x̄m)²    =
N(x̄ − x̄m)²   =
ΣN(x̄ − x̄m)²  =

Summary of Analysis of Variance:

Source                   SS    df    Variance estimate    F
Between samples
Within sample (error)
Total

F = between samples variance / within sample variance =

df =

Is the Null hypothesis accepted or rejected?

4.3 Comparison of Means

You will see that analysis of variance tells us whether or not a difference exists, but not where the difference is. To locate the significant differences – if it is not obvious by eye – a further test must be applied, e.g.

• Fisher's Least Significant Difference (LSD)
• Tukey's Honestly Significant Difference (HSD)
• Dunnett's Multiple Range Test

4.3.1 Fisher's Least Significant Difference (LSD)

Fisher's LSD is equivalent to carrying out a t-test between each mean and every other mean, except that the error variance used in the calculations is taken from the ANOVAR and is therefore more accurate, since it is based on all the data, not just the data from two treatments. At the start of this chapter it was pointed out that there is a danger of identifying false significant differences if lots of t-tests are carried out. However, if the ANOVAR shows that there are at least some significant differences (i.e. the F-ratio is significant), Fisher's LSD can be used to identify those differences, provided that unclear conclusions are treated with caution.

Fisher's LSD is the standard error of the difference between two means multiplied by t. Any two means which differ by more than this can be considered significantly different. The variance of the difference between two means is twice the variance of the means (see Chapter 3). And the variance of any mean is s²/n, where s² is the residual variance and n is the sample size.
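Putting these pieces together, here is a hedged Python sketch of an LSD calculation for the strain 1 versus strain 2 comparison of the earlier rat example (the t value of 2.201 for 11 error degrees of freedom at p = 0.05, two-tailed, is taken from tables):

```python
import math

# Fisher's LSD for strain 1 (n = 4, mean 16.0) vs strain 2 (n = 5, mean 21.0)
# in the rat sleeping-time example.  Illustration only: the overall F-ratio in
# that example was not significant (p = 0.069), so the LSD would not properly
# be applied there.
s2 = 113.2 / 11        # error (within sample) variance from the ANOVAR table
t_crit = 2.201         # t, 11 df, p = 0.05 two-tailed, from tables

lsd = t_crit * math.sqrt(s2 * (1/4 + 1/5))   # unequal-n form
print(round(lsd, 2))   # 4.74
```

The strain means differ by 5.0, which exceeds this LSD; this is exactly the kind of unclear conclusion (a "significant" pairwise difference despite a non-significant overall F) that must be treated with caution.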
So Fisher's LSD is calculated as follows:

LSD = t √(2s²/n)

An alternative form, which allows for means based on different sized samples, is

LSD = t s √(1/ni + 1/nj)

4.3.2 Tukey's Honestly Significant Difference

Fisher's LSD is easy to understand and easy to apply, but some people feel that it is not conservative enough. A very conservative alternative is Tukey's Honestly Significant Difference. This test takes into account the number of possible comparisons you might make. The more groups you have, the wider the difference between any two means must be before it is considered significant. If you have many means and you are "trawling" for all and any significant differences, this test will guard against false positive conclusions.

Tukey's HSD replaces t with Q/√2. This gives the same result if there are only two means, but gives a larger value as the number of groups increases.

R = (Q/√2) s √(1/ni + 1/nj)

Where

• R = a minimum significant range
• Q = a preliminary factor obtained from tables, and related to the number of samples (a) and the number of degrees of freedom (f) for the error variance
• s = error standard deviation (within sample standard deviation)
• n = number of observations per sample

4.3.3 Dunnett's Multiple Range Test Using Minitab

The third multiple range test, Dunnett's Test, can be applied when the only comparisons to be made are between a number of treatments and a control. The change in blood pressure of volunteers was measured following the administration of new experimental drugs. Are any likely to be dangerous if given to patients with high blood pressure? Which might be beneficial?

drug 1 (placebo)   drug 2   drug 3   drug 4
        3            −5       20       3
       −2           −20       22       1
       −1           −17       18       2
        5           −12       25       0
        0           −21       19       5

Table 4.1: Change in blood pressure (mm Hg)

Enter Minitab and type the data into columns C1–C4

read c1-c4

Now stack C1–C4 into column C5 and create a column C6 which indicates which rows of C5 belong to which treatment.

stack c1-c4 c5;
subscripts c6.
First have a look at the data using the boxplot command.

boxp c5;
by c6.

A character boxplot of C5 is produced for each level of C6, with the C5 axis running from −20 to 30. It is clear from this that drug 2 has the effect of reducing blood pressure compared to the placebo. Drug 3 increases it and drug 4 has no effect. There is no need to carry out a full analysis of variance where the conclusions are as clear as this, but we will do it anyway.

Minitab has several Multiple Comparison Tests, including fisher, tukey and dunnett as subcommands of the oneway command. We are comparing the other drugs to drug 1, which is a placebo, hence Dunnett's Test is appropriate. The Minitab subcommand dunnett 1 indicates that level 1 of C6 (i.e. drug 1) is the control.

oneway c5 c6;
dunnett 1.

The dunnett subcommand gives the following output.

Dunnett's intervals for treatment mean minus control mean

Family error rate = 0.0500
Individual error rate = 0.0196
Critical value = 2.59
Control = level 1 of C6

Level     Lower      Center     Upper
2       −22.521    −16.000    −9.479
3        13.279     19.800    26.321
4        −5.321      1.200     7.721

Drug 2 differs from the placebo by −16 mmHg. The 95% confidence interval for this difference is −22.521 to −9.479, clearly a significant reduction in blood pressure. Similarly, drug 3 increases blood pressure. However, the interval for drug 4 includes zero, so drug 4 is not significantly different from the placebo.

Chapter 5

Correlation and Regression

5.1 Introduction

Correlation and regression analysis are both concerned with the relationship between variables, but they are used in different circumstances. Correlation is less useful but is simpler, and we will discuss it first.
Consider a group of students for each of whom we have a school exam mark and a university exam mark for a particular subject. Suppose we are interested in knowing whether there is any relationship between the two sets of grades. We can calculate a quantity called the correlation coefficient which measures the strength of the relationship between the two sets of results. The value of the correlation coefficient can range from 0, if there is no relationship between the two sets, through 1, if there is a perfect positive correlation between the two, to −1, if there is a perfect negative correlation; that is, the students with the best school grades have the worst university results. More often the calculated correlation coefficient lies somewhere between −1 and +1, and we have to decide whether the observed value is just a matter of chance or, in statistical jargon, whether the correlation coefficient is statistically significant.

Here is a second example of a situation where we are interested in the relationship between two variables. We wish to calibrate an ultra-violet spectrometer. We have made up a set of solutions of known concentration and have measured the absorbance at each concentration. We would like to use this information to estimate the concentration of an unknown solution from its absorbance. There are important differences between this situation and the previous example. To start with, the information we want from the data is different. We are not just interested in whether there is a significant relationship between absorbance and concentration (we would hope there was). We want to know the form of this relationship and we would like to describe it mathematically. This would allow us to substitute the measured absorbance of an unknown sample into the formula to obtain a predicted concentration.

The other difference is the control we have over the values of one of the variables. We have made up the standard solutions to known concentrations.
We would probably choose concentrations at regular intervals over the range we are interested in, so that we could estimate the concentration of any unknown falling in that range with equal precision. In practice such a calibration might be more complicated than this, but the important point is that we have control over one variable and we can measure it accurately. This variable is known as the independent variable. The other variable (here absorbance) is dependent on concentration and is called the dependent variable. We do not know the value of our dependent variable until we do the experiment.

The mathematical technique used to quantify the relationship between a dependent variable and an independent variable is known as regression analysis. It results in a mathematical model of the relationship which, in linear regression, is the equation of a straight line. The analysis furnishes estimates of one or more parameters which characterise the equation. Other examples of regression are i) dose of drug and response, and ii) the extent of a chemical reaction with time.

Compare these examples with the exam result example. There we had a sample of people on which we had taken two measurements. We had no control over the values of either variable. Moreover, there would be considerable error associated with each exam result. If it were possible to give the same person the same exam many times, we would get a different result each time. This contrasts with regression, where one variable is usually under our control and is measured accurately.

It is important to be clear about the difference between correlation and regression, because it is usually invalid to calculate a correlation coefficient and quote its significance when regression is properly involved. This is because, as we will see, the calculation of a correlation coefficient involves the assumption that both variables are normally distributed. Where one variable is under our control this will not usually be the case.
It is also dangerous to use a regression method where the independent variable is subject to considerable error of measurement, although regression can often be used where the independent variable is not strictly under our control, if we are happy that it is measured accurately.

Summary

Correlation  The value of neither variable is under the control of the experimenter. Both variables are usually associated with error of measurement. Correlation analysis is used to assess whether the two variables are associated.

Regression  One variable, the independent variable, is measured accurately and its value is often chosen by the experimenter. The other variable, called the dependent variable, may be subject to experimental error, and its value depends on the value chosen for the independent variable. The line fitted to the points is called a regression line, and is described by a regression equation which contains one or more parameters. Regression analysis is used to predict one variable from another.

5.2 Some useful formulae

In this chapter the following abbreviations will be used.

ssx  Total sum of squares for x,  Σ(x − x̄)²
ssy  Total sum of squares for y,  Σ(y − ȳ)²
sxy  Sum of cross products,       Σ(x − x̄)(y − ȳ)

Each of these has an associated computational formula which provides a quicker and less error prone method of calculating these quantities if you are using a calculator. These are as follows:

ssx = Σx² − (Σx)²/n
ssy = Σy² − (Σy)²/n                    (5.1)
sxy = Σxy − (Σx Σy)/n

5.3 Product-Moment Correlation Coefficient

Suppose we have a sample of n individuals, each of which has been measured for two variables x and y. If the variables have been measured on an interval scale and both are normally distributed, then the degree of association between the two variables can be assessed by calculating the correlation coefficient (r).
The basic quantities we need to calculate to do this are the following:

n,  Σx,  Σx²,  Σy,  Σy²,  Σxy

Substituting these into equations 5.1 gives us the sums of squares for x and y and the sum of cross products. The correlation coefficient, r, is then calculated as:

r = sxy / √(ssx × ssy)

Fisher showed that r is related to Student's t as follows:

t = r √((n − 2)/(1 − r²))                    (5.2)

with n − 2 degrees of freedom. Here n is the number of data pairs. So the exact probability of a value of r can be found by converting to t and looking up the probability of t. Alternatively, tables of critical values of r for a range of values of degrees of freedom are published.

5.3.1 Correlation coefficient: Example

The data in table 5.1 were collected in a medical study on the blood concentrations of different proteins in a group of males aged between 40 and 60. We are interested in whether there is a link between the levels of these proteins. The existence of a link would help to identify particular biochemical reactions which may be taking place in these patients. All measurements were made in mg ml−1 and then converted to a loge scale.

Testosterone   SHBG
5.85           3.50
5.91           3.81
6.20           3.89
6.39           3.14
6.63           3.14
6.63           3.09
6.32           2.64
6.30           3.37
6.20           3.40
6.41           3.26
6.40           2.94
5.89           3.30
6.43           3.00
6.48           3.00
5.83           3.81
6.12           3.47
6.23           3.58
6.36           3.53
6.20           3.33
6.49           3.56
5.96           3.64

Table 5.1: Protein Data

• Enter MINITAB.

• Retrieve 'TESTOST'. The data from table 5.1 have been entered in columns C1–C2.

• Plot C1 against C2. This scatter plot is very useful for assessing the presence of a relationship between two variables visually. Any gross departure from normality, in either variable, will also be obvious from this plot. In this case the scatter plot suggests that there might be some negative correlation between the two variables, but this is partially obscured by random variation.

• Calculate the correlation coefficient . . .

CORR C1 C2

• You should get a value of −0.591.
• Using equation 5.2 this can be converted to t . . .

t = −0.591 √(19/(1 − 0.591²)) = −3.19

• In MINITAB this can be calculated as follows . . .

LET K1 = -0.591
LET K2 = K1 * SQRT(19/(1 - K1**2))
PRINT K2

Giving a t value of −3.19 with 19 degrees of freedom, the probability of which can be found with

CDF 3.19 K1;
T 19.
LET K1=1-K1
PRINT K1

This shows that the probability of t > 3.19 is 0.0024. So the probability of t > 3.19 or t < −3.19 is 2 × 0.0024 = 0.0048. Strong evidence of a negative correlation between Testosterone and SHBG.

5.4 Spearman's Rank Correlation Coefficient - rs

This coefficient, like other methods based on ranks, does not depend on assumptions about normal distributions. As with Pearson's product-moment correlation, a value of +1 corresponds to perfect correlation between x and y, a value of 0 corresponds to no correlation and a value of −1 corresponds to a perfect negative correlation (y ↓ as x ↑). However, what is meant by "perfect correlation" is not the same for different coefficients. For ranking coefficients it means that the ranking of individuals is the same for both criteria, i.e. concordance of ranking. Spearman's Rank Correlation Coefficient may be used for ordinal or interval scale data.

5.4.1 Procedure

To compute rs, make a list of the n subjects. Next to each subject's entry, enter his rank for the x variable and his rank for the y variable. Determine the difference between the two ranks (di) for each subject. Square each di, then calculate Σdi².

rs = 1 − 6 Σdi² / (n(n² − 1))

Table 3 gives critical values of rs for n up to 12. For n > 12 use the following approximation to the t-distribution with n − 2 degrees of freedom. Do not forget to double the probability for a two-tailed test.
t = rs √((n − 2)/(1 − rs²))

Calculating Spearman's rs with MINITAB

READ C1 C2      Enter data in C1 and C2
RANK C1 C3
RANK C2 C4
CORR C3 C4

Rank Correlation Example

Rarity of doctors in the ith district (yi) and the number of days lost due to illness (xi) in that area.

H0: These two factors are not associated.
H1: The two factors are associated. Two-tailed test.

        Score         Rank
Area    xi    yi      xi   yi    di    di²
1       160   59      1    2     −1    1
2       165   54      2    1     +1    1
3       169   64      3    3      0    0
4       175   66      4    4      0    0
5       180   85      5    6     −1    1
6       186   78      6    5     +1    1
                                 Σdi² = 4

rs = 1 − (6 × 4)/(6(36 − 1)) = 0.886

From Table 3 the probability of rs = 0.886 (n = 6) is exactly 0.05. Conclude: reject H0 at p = 0.05. Lack of doctors and incidence of illness are connected.

5.5 Linear Regression Analysis

As discussed in the introduction, variables such as age, time, drug dose etc. are independent variables. By convention these independent variables are plotted on the horizontal, or x-axis, of a plot and are thus known as x variables. The dependent variables, those which may be related to the independent variables, are plotted on the vertical or y-axis, and are known as y variables.

In regression the x variable is not usually normally distributed. Regression analysis does, however, assume a normal distribution for y for any given value of x; moreover, the variance of y should be independent of x. For example, the scatter about the fitted line should not increase as x increases. This assumption, that the variance of the distribution of y for any given x does not change over the range of x, is frequently invalid, especially when y is a biochemical variable such as a hormone concentration.

You should always plot your data and look at it before doing a regression. This allows you to assess visually whether there may indeed be an association between the variables and, if so, whether it is in the form of a straight line or a curve. It also lets you check that the scatter of y is not related to x.
The computer will always give you an answer, but it may be meaningless! If the relationship is non-linear, or if the variance of y increases with x, a common situation, it is usually possible to transform one or both variables before carrying out regression analysis. An example of data transformation is given in Example 2.

After a model is fitted the observed points will not necessarily lie on the fitted line. The scatter about the line will depend on how good the model is. If we draw a vertical line from each observed point to the fitted line, square all these distances and add them up, we obtain the residual sum of squares. The expression "fitting a line to the observed points" means the process of finding estimates of the slope and intercept that result in a calculated line which fits the observations "best". By "best" we mean the estimates which minimise the residual sum of squares. Hence this method is called the method of least squares.

• Select program SCATTER from the ANOVA and regression menu.

• Press Enter to accept the default data set (Protein).

• A scatter plot of protein level against gestation period is shown.

• Press Enter to Display regression model. A horizontal red line represents the currently fitted regression model. The equation of this model is shown above the plot. Initially this is simply y = 0.5, i.e. y = 0.5 for all values of x.

• Press R. A vertical line is drawn from each data point to the fitted line. The sum of all these distances squared (ss) is shown next to the equation, and this represents the mismatch between the data and the model, that is, the residual sum of squares. The best possible model will be such that these residuals are as low as possible. By pressing the four arrow keys you can move and rotate the fitted line. At the same time the equation for the current line and the residual sum of squares will be shown. Try and find the optimum fit, that is, the fit with the lowest possible residual sum of squares.
• When you think you have done this, make a note of your fitted equation, then check your answer by pressing F for the best fit model calculated mathematically as below.

5.5.1 Regression by hand

To perform a regression analysis on paper, first calculate the quantities x̄, ȳ, ssx, ssy and sxy (equations 5.1). Calculate the slope (b) and the intercept (a):

b = sxy / ssx
a = ȳ − b x̄

If you need to test whether your model explains a significant proportion of the total variation in y, the best approach is to partition the sum of squares of y into two parts: the part which is explained by the model and the part which is not, i.e. the residual variation about the line. These sums of squares can be laid out in an analysis of variance table. An F-test between the residual and the regression variances will tell you whether the fitted model helps to account for a significant proportion of the variation in y. See the example below. In some cases you know beforehand that there is a good relationship between x and y, and all you need to know is the value of the slope and intercept. In this case an analysis of variance would not be appropriate.

The total sum of squares for y (ssy) has n − 1 degrees of freedom. The regression sum of squares is given by

rss = sxy² / ssx

with 1 degree of freedom. The residual sum of squares is best obtained by subtracting the regression sum of squares from the total. This has n − 2 degrees of freedom.

Confidence limits for the parameters

The standard error of b (seb) is:

seb = √(s²/ssx)

where s² is the residual variance. 95% confidence limits for b where n is large would be b ± 1.96 × seb. Where n is less than, say, 30, use t with n − 2 degrees of freedom rather than 1.96.

The confidence limits for y at any value of x can be calculated as follows:

standard error of Y (seY) = √( s² × (1/n + (X − x̄)²/Σ(x − x̄)²) )

where Y is the value of y at a particular X and s² is the residual variance. The 95% confidence limits for Y are then Y ± 1.96 × seY.
Again, where n < 30, use t with n − 2 degrees of freedom. Note that these limits will be narrowest at the mean value of x and will get wider the further X is from x̄. The confidence limits for the intercept (a) are the confidence limits for y when x = 0.

5.5.2 Assumptions involved in linear regression by least squares

There are five assumptions involved in fitting a linear regression model by the method of least squares. These are:

1. The relationship between x and y is linear.
2. The variability of the errors is constant for all values of x.
3. The errors are independent.
4. The errors are normally distributed.
5. The values of the explanatory variable (x) are measured without error.

These assumptions should all be considered before carrying out a regression, and the best way to assess the first four is by looking at a scatter plot of the data. Notice that three of the assumptions involve the errors (or residuals). A plot of the residuals against x (or often the fitted values for y) is known as a residual plot. This is a very useful way to spot patterns in the residuals which indicate violation of the above assumptions. Let's consider each of these assumptions in turn.

The relationship between x and y is linear

This is the most obvious assumption. Note, however, that we are only assuming that the relationship is linear over the range of our data. We have no information on the relationship outside this range, and it is therefore unwise to extrapolate to values outside the range of our data.

Enter MINITAB and retrieve the worksheet nonlin. Plot C1 against C2.

retr 'nonlin'
plot c1*c2

You will be able to see a certain amount of curvature in the scatter plot. This is made clearer if the residuals are plotted against C2. The regr command carries out a linear regression and stores the residuals (vertical deviations from the fitted line) in a column called resid. The fitted values are stored in 'fits'. resid is then plotted against C2.
name c3 'resid' c4 'fits'
regr c1 1 c2 'resid' 'fits'
plot 'resid'*c2

The residuals are predominantly negative for extreme values of x. The residual plot highlights this pattern by removing the trend due to the regression line. Clearly a linear model is not the best in this case. The solution might be to fit a more complex non-linear model, or to transform one or both of the variables by taking logs, square roots etc., which often has the effect of linearising the relationship. An example of transformation is given in Example 2.

The variability of the errors is constant for all values of x

Open the MINITAB worksheet 'noncon' and plot C1 against C2, as above. Notice how the spread of the data points increases as the value of x increases. This is a common phenomenon: large measurements tend to have more error. A log transformation of y will usually make the residuals more uniform with respect to x.

The errors are independent

This means that the value of a particular residual has no effect on the value of the next residual, or on any other residual. This assumption is frequently violated where a particular item is being monitored in time. If the measurement at time t has a particularly large value, then the value at time t + 1 will also tend to be large, because the value at time t is the starting point for the subsequent changes. Plot C1 against C2 from the MINITAB worksheet 'nonind'. Notice how the measurements seem to wander from one side of the fitted line to the other. The errors here are not independent. It has been shown that correlated errors, or autocorrelation as it is known, do not affect the estimates of the parameters of the linear model, but they do lead to underestimates of the residual error. This means that the calculation of confidence limits or statistical comparisons of lines will not be valid. See example 5.5.4 for more discussion of autocorrelation.

The errors are normally distributed
This assumption is not usually a problem.

The values of the explanatory variable (x) are measured without error

This assumption is not one that we can check by examining the data.

5.5.3 Regression Example 1

The table below summarises data on the level of a protein in expectant mothers throughout pregnancy.

n    Time into pregnancy   Protein level       x²      y²        xy
     (weeks) x             (mg ml−1) y
1    11                    0.38                121     0.1444     4.18
2    13                    0.51                169     0.2601     6.63
3    17                    0.58                289     0.3364     9.86
4    19                    0.84                361     0.7056    15.96
5    22                    0.78                484     0.6084    17.16
6    27                    0.65                729     0.4225    17.55
7    29                    0.83                841     0.6889    24.07
8    31                    0.84                961     0.7056    26.04
9    34                    0.92               1156     0.8464    31.28
10   36                    0.92               1296     0.8464    33.12
Σ    239                   7.25               6407     5.5647   185.85

A character plot of protein level against time shows the points rising steadily with time.

Hand calculations

x̄ = 239/10 = 23.9
ȳ = 7.25/10 = 0.725
ssx = Σx² − (Σx)²/n = 6407 − 239²/10 = 694.9
ssy = Σy² − (Σy)²/n = 5.5647 − 7.25²/10 = 0.30845
sxy = Σxy − (Σx Σy)/n = 185.85 − (239 × 7.25)/10 = 12.575
rss = sxy²/ssx = 12.575²/694.9 = 0.227559
residss = ssy − rss = 0.30845 − 0.22756 = 0.080891

Here rss is the regression sum of squares and residss is the residual sum of squares.

Analysis of variance

Source       df    SS         variance   F       Prob > F
Regression    1    0.227559   0.22756    22.51   0.0015
Residual      8    0.080891   0.01011
Total         9    0.30845

If time did nothing to explain protein levels, we would expect the regression variance and the residual variance to be estimates of the same quantity, and thus F to equal 1. A higher value may be a chance event, or it might indicate that time does help to predict protein levels. If we look up the critical value for F at the 1% significance level with 1 and 8 degrees of freedom in tables, we get a value of 11.26, smaller than our value. Using tables we can only say that the chance of getting an F-ratio as high as this is less than 0.01. Using a statistics package we are given the exact probability of the observed F-ratio, which is 0.0015.
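Before drawing conclusions, the hand calculations, including the standard error of the slope, can be checked with a short Python sketch (an illustration only; the t value of 2.306 for 8 degrees of freedom is taken from tables, as in the text):

```python
import math

# Checking the hand calculations of Regression Example 1 (protein level
# against week of pregnancy) using the computational formulae 5.1.
x = [11, 13, 17, 19, 22, 27, 29, 31, 34, 36]                      # weeks
y = [0.38, 0.51, 0.58, 0.84, 0.78, 0.65, 0.83, 0.84, 0.92, 0.92]  # mg/ml
n = len(x)

ssx = sum(v * v for v in x) - sum(x) ** 2 / n                      # 694.9
ssy = sum(v * v for v in y) - sum(y) ** 2 / n                      # 0.30845
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n   # 12.575

b = sxy / ssx                        # slope
a = sum(y) / n - b * sum(x) / n      # intercept

rss = sxy ** 2 / ssx                 # regression sum of squares
residss = ssy - rss                  # residual sum of squares
F = rss / (residss / (n - 2))        # F-ratio with 1 and n - 2 df

seb = math.sqrt((residss / (n - 2)) / ssx)   # standard error of the slope
half_width = 2.306 * seb             # 95% CI half-width, t for 8 df from tables

print(round(b, 4), round(a, 4), round(F, 1), round(half_width, 4))
```

This reproduces the slope 0.0181, the intercept 0.2925, the F-ratio of about 22.5 and the confidence half-width of about 0.0088 obtained by hand.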
We conclude that a significant proportion of the variation in protein level is explained by time into pregnancy. In fact, if we express the regression sum of squares as a percentage of the total we get a value of 100 x 0.227559/0.30845 = 74%. In other words, 74% of the sum of squares of protein level is accounted for by fitting the model. This is the coefficient of determination. As it happens, the coefficient of determination can also be obtained by squaring the correlation coefficient between x and y, and for that reason it is usually abbreviated to r2.

The parameters of the model, the slope (b) and the intercept (a), are obtained as follows:

b = sxy/ssx = 12.575/694.9 = 0.0181
a = y-bar - b x-bar = 0.725 - 0.0181 x 23.9 = 0.2925

Thus the equation describing the relationship between time (t) and protein level (P) is:

P = 0.2925 + 0.0181 t

This tells us that an increase of one week in pregnancy is associated with an average increase in protein level of 0.0181 mg/ml.

Confidence limits for b (the slope) are found as follows, where s2 is the residual variance from the analysis of variance table:

seb = sqrt(s2/ssx) = sqrt(0.01011/694.9) = 0.0038

n is less than 30, so we need to multiply seb by t with n - 2 degrees of freedom to get 95% confidence limits. From a table of "Student's t-distribution" you will see that the value of t in the 0.05 column corresponding to 8 degrees of freedom is 2.306. If we wanted, say, 99% confidence limits we would look in the 0.01 column and multiply by t = 3.355.

95% confidence limits: b = 0.0181 +/- (2.306 x 0.0038) = 0.0181 +/- 0.00876

Confidence limits for a (the intercept):

sea = sqrt(s2 x (1/n + x-bar^2/ssx)) = sqrt(0.01011 x (1/10 + 23.9^2/694.9)) = 0.0966

95% confidence limits: a = 0.2925 +/- (2.306 x 0.0966) = 0.2925 +/- 0.223

Regression with MINITAB

Enter MINITAB and retrieve the worksheet 'protein':

retr 'protein'

This worksheet contains the time into pregnancy and protein data discussed above in columns C1 and C2.
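Before continuing with the MINITAB session, the parameter estimates and confidence limits calculated by hand above can be checked with a few lines of Python (again an illustrative sketch, not part of the original worksheet):

```python
import math

# Quantities from the hand calculation above
ssx, sxy = 694.9, 12.575
xbar, ybar, n = 23.9, 0.725, 10
s2 = 0.0101114        # residual variance from the analysis of variance table
t8 = 2.306            # Student's t, two-tailed 5%, 8 degrees of freedom

b = sxy / ssx                                    # slope
a = ybar - b * xbar                              # intercept
se_b = math.sqrt(s2 / ssx)                       # standard error of the slope
se_a = math.sqrt(s2 * (1 / n + xbar ** 2 / ssx)) # standard error of the intercept

print(round(b, 4), round(a, 4))                  # 0.0181 0.2925
print(round(t8 * se_b, 4), round(t8 * se_a, 3))  # 95% half-widths: 0.0088 0.223
```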
Plot C2 against C1 and consider the five regression assumptions. No violations are obvious from the plot.

plot c2*c1

Model C2 by fitting the one variate C1.

name c3 'resid' c4 'fits'
regr c2 1 c1 'resid' 'fits'

The ANOVA table is shown, including Prob > F. Parameter estimates with their standard errors are also shown. To produce a high-resolution plot showing the data and the fitted line, use the %regplot macro:

%regplot c2 c1 'fits'

Type help regr and help plot for a full explanation of these commands.

5.5.4 Regression Example 2

Various compounds were run through a high-performance liquid chromatography (HPLC) system and the time taken to drain through the column ('capacity factor', K') was recorded. It is thought that K' might be related to the 'partition coefficient' (P) of the compound. The data are given below.

Compound                   Capacity factor K' (mins)   Partition coefficient P
p-aminophenol              1.59                        1.1
Anisole                    10.00                       128.8
Benzamide                  2.40                        4.4
Benzonitrile               7.08                        36.3
o-cresol                   6.17                        91.2
p-cresol                   6.17                        87.1
2,4-dimethylphenol         11.20                       199.5
Methyl-4-hydroxybenzoate   8.13                        91.2
Methylsalicylate           18.20                       288.4
Phenol                     3.47                        28.8
p-phenylphenol             38.02                       1585.0

• Enter MINITAB and open the worksheet 'hplc':

RETR 'hplc'

Column C1 contains the capacity factor (K'); C2 contains the partition coefficient (P).

• Plot K' against P. This plot looks rather odd, with one observation (p-phenylphenol) near the top right and all the others crammed into the bottom left. Not only are the values for p-phenylphenol much larger than the rest, but it does not seem to be consistent with the relationship between K' and P for the other compounds.

• Fit the line and plot it with

name c3 'res' c4 'fits'
regr c1 1 c2 'res' 'fits'
%regplot c1 c2 'fits'

You can see that the fitted line is very strongly influenced by the position of the 'outlier'.

• When data are skewed in this fashion the log transformation often provides a more useful scale of measurement.
Log transform both variables and replot the data.

let c1=log(c1)
let c2=log(c2)
plot c1*c2

The resulting plot is a well-behaved linear relationship between log10 K' and log10 P. On this scale of measurement p-phenylphenol is quite consistent with the other observations.

• Calculate the parameters of the transformed regression model and write down the model you have fitted in terms of K' and P.

5.5.5 Regression Example 3

• Enter MINITAB and open the 'nonind' worksheet.

• Fit a line and look at the plot.

name c3 'res' c4 'fits'
regr c2 1 c1 'res' 'fits'
%regplot c2 c1 'fits'

You have looked at this example before. It should be clear that the errors are not independent. MINITAB has a useful command for checking non-independence of residuals (autocorrelation). If the residuals are stored and the nth residual is plotted against the (n+1)th residual, you would expect a random scatter of points unless a particular error tends to be similar to the previous one, in which case most of the points will be in the upper right and lower left quadrants.

• The residuals are stored in column 'res'. Create two columns in which the residuals are out of phase by one, and plot these against each other.

lag 'res' c5
plot 'res'*c5

Notice how most of the residuals are in the top right or bottom left quadrant. This is because a high residual tends to be followed by another high residual and a low one tends to be followed by a low one. The autocorrelation function (ACF) measures the correlation between lagged residuals. In this case we have looked at the ACF for lag one. The MINITAB command ACF calculates the ACF for lags from 1 upwards and plots these against lag number.

• Make an autocorrelation plot (correlogram).

acf c3

In this case there is positive autocorrelation at lag 1 and negative autocorrelation at lag 8.
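What lag and acf are doing amounts to a simple calculation: correlate each residual with its successor. The Python sketch below illustrates this (the smooth sine series is a stand-in for a set of autocorrelated residuals; it is not the 'nonind' data):

```python
import math

def lag1_corr(res):
    """Correlation between each residual and the next one (lag-1 autocorrelation)."""
    x, y = res[:-1], res[1:]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    den = math.sqrt(sum((a - xbar) ** 2 for a in x) *
                    sum((b - ybar) ** 2 for b in y))
    return num / den

# A slowly wandering series: successive values are similar, so the
# lag-1 autocorrelation comes out strongly positive.
res = [math.sin(0.3 * i) for i in range(50)]
print(round(lag1_corr(res), 2))
```

A series that flips sign at every step would instead give a lag-1 correlation near -1; independent errors give a value near 0.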
As noted previously, the presence of autocorrelation does not mean that your parameter estimates will be biased, but it does mean that your estimates of the residual variance will be wrong. Hence any calculations, such as confidence limits, which depend on the residual variance cannot be carried out. The analysis of data sets with strong autocorrelation is outside the scope of this course, but you should at least be aware when your data show this pattern.

5.5.6 Exercise

The following data were obtained for the ability of liver slices from guinea pigs of different ages to conjugate phenolphthalein with glucuronic acid. Check the regression assumptions and calculate the equation of the regression line, quoting 95% confidence limits for each parameter.

age (days)         0     12    29    43    63    76    85    93
moles conjugated   8.98  8.14  6.67  6.08  5.83  4.68  4.20  3.72

Chapter 6

Analysis of Counts in Contingency Tables

6.1 Introduction

We are often interested to know whether there is any association between attributes which are categorical rather than continuous. For example, we might need to know whether there is an association between eye colour and hair colour in a sample of people. If we have four eye colours and four hair colours we can form a 4 x 4 table (16 cells) and count the number of people who fall into each cell. Such a table is known as a contingency table. If there were an association between hair colour and eye colour, subjects would tend to occur in particular cells of the table.
For example, more subjects than expected might occur in the fair hair-blue eye category. If there were no association we would expect the proportion of subjects of various eye colours to be the same for each hair colour, apart from chance differences of course. A statistic called chi-squared (chi^2), which measures the degree of association between the two categorical variables, can be derived from the contingency table.

6.2 Example

In a study, 290 people were selected at random and asked about their preferences amongst various types of tablet. Table 6.1 summarises part of the information and relates the age of the interviewees to their preferred colour of tablets from the range pink, orange and white. On the basis of these data, can it be concluded that there is a link between age and colour preference in the population as a whole?

The first step is to form the column and row totals and to use these to calculate the counts in the body of the table which we would expect if there were no association between age and colour preference. These expected values correspond to the null hypothesis.

Table 6.1: Contingency table. Relationship between age and tablet colour preference. Expected values in brackets.

Age group   pink        orange      white       Sum
18-35       26 (16.6)   40 (42.2)   32 (39.2)   98
36-60       14 (20.3)   57 (51.7)   49 (48.0)   120
>60          9 (12.2)   28 (31.0)   35 (28.8)   72
Sum         49          125         116         290

Consider the 49 subjects who preferred pink tablets (first column of table 6.1). How do we expect these to be distributed between the age groups under the null hypothesis? The totals for the three age groups are 98, 120 and 72, so the proportion in row one is 98/290. The total for column one is 49, so we would expect 49 x 98/290 = 16.6 of the 18-35 year olds to prefer pink tablets. The rule for finding the expected value for any cell is to multiply its row total by its column total and divide by the grand total. We can calculate expected values for all cells and add them to the contingency table.
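The row-total x column-total / grand-total rule, and the chi-squared statistic built from it in the next section, can be checked with a short Python script (an illustrative aside, not part of the original MINITAB material):

```python
observed = [[26, 40, 32],    # 18-35: pink, orange, white
            [14, 57, 49],    # 36-60
            [ 9, 28, 35]]    # >60

row_t = [sum(row) for row in observed]        # 98, 120, 72
col_t = [sum(col) for col in zip(*observed)]  # 49, 125, 116
grand = sum(row_t)                            # 290

# Expected count = row total x column total / grand total
expected = [[r * c / grand for c in col_t] for r in row_t]

# Chi-squared = sum over all cells of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(observed, expected)
           for o, e in zip(orow, erow))

print(round(expected[0][0], 1))   # 16.6 pink-preferring 18-35 year olds expected
print(round(chi2, 2))             # 11.78
```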
Chi-squared (chi^2) is then calculated using the following formula, where O is the observed count and E is the expected count:

chi^2 = Sum of (O - E)^2 / E over all cells

In the case of a contingency table the number of degrees of freedom is the product of (n rows - 1) and (n columns - 1). Thus table 6.1 has 3 rows and 3 columns, so there are 2 x 2 = 4 degrees of freedom. Another way to look at this is to consider the number of independent pieces of information in the body of the contingency table. Given that the row and column totals are fixed, how many counts can we fill in before the remainder are determined? In a 3 x 3 table we can only fill in four counts before the rest of the table is determined, hence four degrees of freedom.

The probability of a chi^2 value of 11.78 with 4 degrees of freedom is 0.019, so we can reject the null hypothesis.

In interpreting an analysis like this one, which gives a significant association, it is helpful to examine the individual contributions to chi^2 from each cell of the table to see which counts differ most from the expected. Values over 1 should be examined.

O    E      Contribution to chi^2
26   16.6   5.38
40   42.2   0.12
32   39.2   1.32
14   20.3   1.94
57   51.7   0.54
49   48.0   0.02
9    12.2   0.82
28   31.0   0.30
35   28.8   1.33
     Sum    11.78 = chi^2

In this case there are four cells with values over 1. Row 1, column 1 has a value of 5.38. This indicates that in the 18-35 age group more than expected preferred pink tablets. Similarly, fewer of this age group than expected preferred white (chi^2 = 1.32). The 36-60 age group showed the opposite effect: fewer than expected preferred pink (chi^2 = 1.94); and amongst the over-60s more than expected preferred white.

Note: chi^2-analysis must only be carried out on actual counts, not on percentages, mean values or other derived statistics.

6.3 Small expected values

Special care should be taken if one or more of the expected values is less than 5. The following precautions are recommended. See the next section for 2 x 2 tables.
For tables with df larger than 1, chi^2 can be used if fewer than 20% of the cells have expected frequencies less than 5 and if no cell has an expected frequency of less than 1. If these requirements are not met you must combine rows or columns to increase the expected frequencies.

6.4 2 x 2 Contingency Tables

It is common in pharmacy and pharmacology to investigate the effect of a drug by recording whether it produces a particular side effect in two groups of subjects, a drug-treated group and a control group. In this instance a 2 x 2 contingency table is employed.

                    Presence of side effect   Absence of side effect   Sum
Presence of drug    a                         b                        a+b
Absence of drug     c                         d                        c+d
Sum                 a+c                       b+d                      n

chi^2 should be calculated using the following formula:

chi^2 = n(|ad - bc| - n/2)^2 / [(a + b)(c + d)(b + d)(a + c)]        (6.1)

Note: |ad - bc| means the absolute value of ad - bc. This formula has the advantage that it contains a correction for continuity (Yates' correction), without which 2 x 2 tables can give values of chi^2 which are too large. Check that none of the expected values is less than 5. If they are, Fisher's exact test must be used (see a suitable text book).

6.4.1 Example of 2 x 2 analysis

           Gastric irritation   No gastric irritation   Sum
Control    5                    30                      35
Aspirin    15                   12                      27
Sum        20                   42                      62

Is aspirin associated with increased gastric irritation? The data suggest that this might be the case, but is it statistically significant?

chi^2 = 62(|60 - 450| - 31)^2 / (35 x 27 x 42 x 20) = 10.07, df = 1, p = 0.0015

Thus, aspirin is associated with increased gastric irritation.

6.5 Exercise 1

Calculate chi^2 for the following data on the effectiveness of inoculation against cholera. Does inoculation give protection against cholera? Use MINITAB (see box below).

               Attacked   Unattacked   Sum
Inoculated     11         89           100
Uninoculated   21         79           100
Sum            32         168          200

Analysis of a contingency table with MINITAB

For a contingency table with three columns and two rows, READ the data into C1-C3.
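As a cross-check on the MINITAB commands, equation 6.1 and the quoted p-value for the aspirin example are easily reproduced directly. The Python sketch below (illustrative only) uses the fact that for one degree of freedom the upper-tail chi-squared probability reduces to a normal-tail calculation via erfc:

```python
import math

def chi2_yates(a, b, c, d):
    """Chi-squared for a 2x2 table with Yates' continuity correction (eq. 6.1)."""
    n = a + b + c + d
    return n * (abs(a * d - b * c) - n / 2) ** 2 / (
        (a + b) * (c + d) * (b + d) * (a + c))

def p_value_df1(x2):
    """P(chi-squared with 1 df > x2) = P(|Z| > sqrt(x2)) = erfc(sqrt(x2/2))."""
    return math.erfc(math.sqrt(x2 / 2))

# Aspirin example: a=5, b=30 (control row), c=15, d=12 (aspirin row)
x2 = chi2_yates(5, 30, 15, 12)
print(round(x2, 2), round(p_value_df1(x2), 4))   # 10.07 0.0015
```

The same two functions can be used to check the cholera exercise above.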
C1 contains data for the first column of the contingency table, and so on.

READ C1 C2 C3
CHIS C1 C2 C3

MINITAB for Windows gives the calculated chi^2 and its probability. On the UNIX machines you can calculate the probability as follows, where n is the value of chi^2 and d is the degrees of freedom:

CDF n K1;
CHIS d.
LET K1=1-K1
PRINT K1

Chapter 7

Statistical Tables

7.1 Table 1. Critical values for the Wilcoxon signed-rank test

A dash means no critical value exists at that significance level.

      Two-tailed test    One-tailed test
N      5%     1%          5%     1%
5      -      -           0      -
6      0      -           2      -
7      2      -           3      0
8      3      0           5      1
9      5      1           8      3
10     8      3           10     5
11     10     5           13     7
12     13     7           17     9
13     17     9           21     12
14     21     12          25     15
15     25     15          30     19
16     29     19          35     23
17     34     23          41     27
18     40     27          47     32
19     46     32          53     37
20     52     37          60     43

7.2 Table 2a. Critical values for the Mann-Whitney U test. Two-tailed test, 5% significance level.

Each row lists the critical values of U for N1 running over the range shown; no critical value exists outside that range at this significance level.

N2=5    N1=3-5:   0  0  2
N2=6    N1=3-6:   1  1  3  5
N2=7    N1=3-7:   1  2  5  6  8
N2=8    N1=2-8:   0  2  3  6  8  10  13
N2=9    N1=2-9:   0  2  4  7  10  12  15  17
N2=10   N1=2-10:  0  3  4  8  11  14  17  20  23
N2=11   N1=2-11:  0  3  5  9  13  16  19  23  26  30
N2=12   N1=2-12:  1  4  6  11  14  18  22  26  29  33  37
N2=13   N1=2-13:  1  4  7  12  16  20  24  28  33  37  41  45
N2=14   N1=2-14:  1  5  9  13  17  22  26  31  36  40  45  50  55
N2=15   N1=2-15:  1  5  10  14  19  24  29  34  39  44  49  54  59  64
N2=16   N1=2-16:  1  6  11  15  21  26  31  37  42  47  53  59  64  70  75
N2=17   N1=2-17:  2  6  11  17  22  28  34  39  45  51  57  63  69  75  81  87
N2=18   N1=2-18:  2  7  12  18  24  30  36  42  48  55  61  67  74  80  86  93  99
N2=19   N1=2-19:  2  7  13  19  25  32  38  45  52  58  65  72  78  85  92  99  106  113
N2=20   N1=2-20:  2  8  14  20  27  34  41  48  55  62  69  76  83  90  98  105  112  119  127

7.3 Table 2b. Critical values for the Mann-Whitney U test. Two-tailed test, 1% significance level.

N2=5    N1=5:     0
N2=6    N1=4-6:   0  1  2
N2=7    N1=4-7:   0  1  3  4
N2=8    N1=4-8:   1  2  4  6  7
Table 2b (continued):

N2=9    N1=3-9:   0  1  3  5  7  9  11
N2=10   N1=3-10:  0  2  4  6  9  11  13  16
N2=11   N1=3-11:  0  2  5  7  10  13  16  18  21
N2=12   N1=3-12:  1  3  6  9  12  15  18  21  24  27
N2=13   N1=3-13:  1  3  7  10  13  17  20  24  27  31  34
N2=14   N1=3-14:  1  4  7  11  15  18  22  26  30  34  38  42
N2=15   N1=3-15:  2  5  8  12  16  20  24  29  33  37  42  46  51
N2=16   N1=3-16:  2  5  9  13  18  22  27  31  36  41  45  50  55  60
N2=17   N1=3-17:  2  6  10  15  19  24  29  34  39  44  49  54  60  65  70
N2=18   N1=3-18:  2  6  11  16  21  26  31  37  42  47  53  58  64  70  75  81
N2=19   N1=2-19:  0  3  7  12  17  22  28  33  39  45  51  57  63  69  74  81  87  93
N2=20   N1=2-20:  0  3  8  13  18  24  30  36  42  48  54  60  67  73  79  86  92  99  105

7.4 Table 2c. Critical values for the Mann-Whitney U test. One-tailed test, 5% significance level.

N2=3    N1=3:     0
N2=4    N1=3-4:   0  1
N2=5    N1=2-5:   0  1  2  4
N2=6    N1=2-6:   0  2  3  5  7
N2=7    N1=2-7:   0  2  4  6  8  11
N2=8    N1=2-8:   1  3  5  8  10  13  15
N2=9    N1=2-9:   1  4  6  9  12  15  18  21
N2=10   N1=2-10:  1  4  7  11  14  17  20  24  27
N2=11   N1=2-11:  1  5  8  12  16  19  23  27  31  34
N2=12   N1=2-12:  2  5  9  13  17  21  26  30  34  38  42
N2=13   N1=2-13:  2  6  10  15  19  24  28  33  37  42  47  51
N2=14   N1=2-14:  3  7  11  16  21  26  31  36  41  46  51  56  61
N2=15   N1=2-15:  3  7  12  18  23  28  33  39  44  50  55  61  66  72
N2=16   N1=2-16:  3  8  14  19  25  30  36  42  48  54  60  65  71  77  83
N2=17   N1=2-17:  3  9  15  20  26  33  39  45  51  57  64  70  77  83  89  96
N2=18   N1=2-18:  4  9  16  22  28  35  41  48  55  61  68  75  82  88  95  102  109
N2=19   N1=2-19:  4  10  17  23  30  37  44  51  58  65  72  80  87  94  101  109  116  123
N2=20   N1=2-20:  4  11  18  25  32  39  47  54  62  69  77  84  92  100  107  115  123  130  138

7.5 Table 2d. Critical values for the Mann-Whitney U test. One-tailed test, 1% significance level.

N2=5    N1=4-5:   0  1
N2=6    N1=4-6:   1  2  3
N2=7    N1=3-7:   0  1  3  4  6
N2=8    N1=3-8:   0  2  4  6  7  9
N2=9    N1=3-9:   1  3  5  7  9  11  14
N2=10   N1=3-10:  1  3  6  8  11  13  16  19
Table 2d (continued):

N2=11   N1=3-11:  1  4  7  9  12  15  18  22  25
N2=12   N1=3-12:  2  5  8  11  14  17  21  24  28  31
N2=13   N1=2-13:  0  2  5  9  12  16  20  23  27  31  35  39
N2=14   N1=2-14:  0  2  6  10  13  17  22  26  30  34  38  43  47
N2=15   N1=2-15:  0  3  7  11  15  19  24  28  33  37  42  47  51  56
N2=16   N1=2-16:  0  3  7  12  16  21  26  31  36  41  46  51  56  61  66
N2=17   N1=2-17:  0  4  8  13  18  23  28  33  38  44  49  55  60  66  71  77
N2=18   N1=2-18:  0  4  9  14  19  24  30  36  41  47  53  59  65  70  76  82  88
N2=19   N1=2-19:  1  4  9  15  20  26  32  38  44  50  56  63  69  75  82  88  94  101
N2=20   N1=2-20:  1  5  10  16  22  28  34  40  47  53  60  67  73  80  87  93  100  107  114

7.6 Table 3. Critical values for Spearman's rank correlation coefficient

A dash means no critical value exists at that significance level.

      Two-tailed test      One-tailed test
N      5%       1%          5%       1%
4      -        -           1.000    -
5      1.000    -           0.900    1.000
6      0.886    1.000       0.827    0.943
7      0.786    0.929       0.714    0.893
8      0.738    0.881       0.643    0.833
9      0.700    0.833       0.600    0.783
10     0.648    0.794       0.564    0.745
11     0.618    0.754       0.536    0.709
12     0.587    0.727       0.503    0.678