
STATISTICS IN MEDICINE, VOL. 1, 345-352 (1982)

THE ROLE OF RANDOMIZATION IN CLINICAL TRIALS

PETER ARMITAGE
Department of Biomathematics, Pusey Street, University of Oxford, Oxford OX1 2JZ, England

SUMMARY

Random assignment of treatments is an essential feature of experimental design in general and clinical trials in particular. It provides broad comparability of treatment groups and validates the use of statistical methods for the analysis of results. Various devices are available for improving the balance of prognostic factors across treatment groups. Several recent initiatives to diminish the role of randomization are seen as being potentially misleading. Randomization is entirely compatible with medical ethics in circumstances when the treatment of choice is not clearly identified.

KEY WORDS Austin Bradford Hill; Data-dependent allocation; Historical controls; Permuted blocks; Prognostic variables; Randomization; Medical ethics

1. INTRODUCTION

Clinical trials come in all shapes and sizes, but if they have one single necessary attribute, a sine qua non, it is surely the element of randomization which enters into the assignment of treatment to individual patients. Randomization, as a basic principle of experimental design, was developed by R. A. Fisher in the 1920s, although Stigler [1] has noted the important contribution made by the philosopher C. S. Peirce in the design of experiments in psychology. The successful implementation of randomized trials in medicine, in the 1940s, is largely due to the advocacy and example of Sir Austin Bradford Hill. Many of Hill’s expository papers, together with reports of the early trials which he and others conducted for the (British) Medical Research Council, are to be found in Reference 3 (see also p. 369 of this issue).
Of course, the realization that the evaluation of therapeutic and prophylactic measures requires carefully controlled studies has a much longer history [4]. There are a number of fascinating forerunners of randomization, in which investigators have sought to impose a deliberate, rather than a haphazard, system of treatment allocation. The idea of assigning two treatments to alternate patients has an obvious appeal. It was used by Fibiger [5] in a trial of therapeutic serum against diphtheria. It was advocated also by Pearson [6] for the assessment of typhoid immunization. This example is particularly interesting, in view of Karl Pearson’s apparent lack of interest in experimental design later in his career. Pearson had been asked to interpret non-experimental data on results of various forms of immunization used in the British Army. Not surprisingly, he found this a difficult task, and wrote:

If further experimental inoculations were made . . . . the greatest care ought to be taken to get homogeneous material, that is, men of like caution, subjected to the same environment. Assuming that the inoculation is not more than a temporary inconvenience, it would seem to be possible to call for volunteers, but while keeping a register of all men who volunteered, only to inoculate every second volunteer. In this way any spurious effect really resulting from a correlation between immunity and caution would be got rid of.

0277-6715/82/040345-08$01.00 © 1982 by John Wiley & Sons, Ltd. Received 25 April 1982

Another tantalizing approach was to form two or more ‘comparable’ groups of patients and then to allocate each group to one treatment by a random act such as the toss of a coin.
van Helmont [7], a medicinal chemist, in a challenge to the Schoolmen who advocated a purely theoretical rather than an empirical approach to therapeutics, wrote:

Let us take out of the hospitals, out of the Camps, or from elsewhere, 200, or 500 poor People, that have Fevers, Pleurisies, &c. Let us divide them into halfes, let us cast lots, that one half of them may fall to my share, and the others to yours; . . . we shall see how many funerals both of us shall have: But let the reward of the contention or wager, be 300 florens, deposited on both sides.

Amberson, McMahon and Pinner [8] followed this procedure in a trial of sodium gold thiosulphate in the treatment of pulmonary tuberculosis. Twenty-four patients were divided into two groups of twelve, members of the two groups being ‘individually matched’. The active treatment and a placebo were assigned to the two groups ‘by the flip of a coin’.

The procedure advocated by van Helmont and adopted by Amberson et al. suffers from two related defects. First, there is no guarantee that the two groups, however carefully matched, do not differ substantially in some important characteristics which were ignored in the matching. Secondly, there is no way of measuring the relevant random error, since we cannot tell by how much the responses in the two groups might have differed if the treatments had been identical. In modern jargon, there is inadequate ‘replication’. These problems would have been avoided if the design had permitted several independent acts of random allocation rather than a single one.

2. THE BENEFITS OF RANDOMIZATION

The precise way in which random assignment is carried out will depend on the broad design of the trial. In many trials, treatments are compared ‘between subjects’, with each subject receiving one of the rival treatments, the assignment being made by an independent random choice for each subject.
In crossover trials, each patient receives two or more treatments on different occasions, the order of administration being assigned at random for each subject. Other variants are discussed in Section 3. All these schemes lead to the following desirable consequences, none of which are likely to be fulfilled without randomization:

(i) The treatments are compared under broadly similar circumstances. In a between-subject trial, for example, the patients allotted to each treatment group will have similar distributions of prognostic factors. Of course, for any given baseline variable the distributions in different treatment groups will not be exactly the same, unless some deliberate act of balancing has been performed. Moreover, if a large number of baseline variables are studied, one or more may exhibit a marked lack of balance, purely by chance. But it is unlikely that the baseline variable that best predicts therapeutic response will be seriously unbalanced, and unlikely that the distributions of response will vary widely from group to group unless they are really affected by choice of treatment.

(ii) Random assignment permits the use of probability theory to express the extent to which any difference in response between treatment groups is likely to be due to chance. The italicized word ‘unlikely’ in the last paragraph can therefore be given a strictly quantitative interpretation. The probability theory underlying many of the most familiar statistical methods, such as t tests, requires certain assumptions (such as normality of distributions of responses). But these are not essential. The mere act of randomization is enough to support the use of distribution-free methods, which in practice are likely to give much the same results as those requiring distributional assumptions.
(iii) Random assignment permits, although it does not ensure, the various devices for masking the identity of treatments, including the possible use of a placebo, which are often essential for an unbiased assessment of efficacy. It is difficult to see how these important procedures could be introduced if treatments were to be assigned in a deterministic, non-random fashion.

It is essential that assignment should be made after entry of a patient into the trial, so that the decision whether or not to enter a patient is uninfluenced by a knowledge of the treatment to be used. It follows that assignments to be made in the future should not be made known until they are needed; open lists, for example, should be avoided. One well-used approach is to keep each assignment in a sealed envelope bearing the serial number of the patient on the outside. In many multicentre trials the assignment is made by telephone from the co-ordinating centre. In some drug trials the medicaments needed for each patient are prepared in advance by a pharmacist, so that each pack carries merely the patient’s serial number.

Points (i) and (ii) are closely related, and are in no way peculiar to clinical experimentation. In his first extended treatment of experimental design, Fisher [9] emphasized the role of randomization in providing a proper estimate of random error, and the consequent validity of the significance tests to be applied, which is essentially point (ii) above. Hill tended to emphasize (i). He also [10] stressed the objectivity of randomization:

. . . having used a random allocation, the sternest critic is unable to say when we eventually dash into print that quite probably the groups were differentially biased through our predilections or through our stupidity.

3. OTHER DEVICES

The simple randomization schemes envisaged above can be performed by the construction of an allocation list before the start of the trial, using random sampling numbers.
In many trials, variants of these simple methods are introduced.

3.1 Permuted blocks

Although simple randomization is likely to provide satisfactory balance between treatment groups in the distribution of important prognostic factors, imbalances will occur from time to time, and the investigator may wish to reduce the play of chance by ensuring a high degree of similarity between the groups. The device known now as ‘permuted blocks’ was, I believe, first described by Hill. Within each of a number of ‘strata’, or subgroups defined by prognostic variables, the allocation is such that the numbers allotted to different treatments are equalized within each ‘block’ of a certain size. For example, with two treatments, it might be arranged that each block of eight patients in any one stratum contains four allocations to one treatment and four to the other, the particular permutation being entirely random.

This device mimics the use of randomized blocks in agricultural experimentation, the agricultural ‘blocks’ being analogous to the clinical strata, rather than to the blocks which are permuted. There is the difference that in agriculture the blocks are of fixed size, whereas in clinical trials the strata are initially of unknown size: the balancing therefore has to be done in a sequential manner.

Three points should be noted:

(i) It is pointless to have separate lists for different strata unless these are balanced by permuted blocks or some equivalent device.

(ii) If permuted-block allocation is used, the comparison between treatments will be ‘within strata’ and therefore more precise than would otherwise be the case. The statistical analysis should take this increased precision into account.

(iii) Even if permuted blocks are not used, the extra precision referred to in (ii) can be largely achieved by the sort of analysis mentioned in (ii), allowing for the effects of the relevant prognostic variables.
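The block-of-eight example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `permuted_block_list`, the treatment labels "A" and "B", the stratum names and the seeds are all hypothetical, and a pseudo-random generator stands in for the random sampling numbers mentioned in the text.

```python
import random


def permuted_block_list(n_blocks, block_size=8, treatments=("A", "B"), seed=None):
    """Build an allocation list for ONE stratum from randomly permuted blocks.

    Every block holds each treatment equally often; only the order inside
    the block is random. (Hypothetical helper, not taken from the paper.)
    """
    if block_size % len(treatments) != 0:
        raise ValueError("block_size must be a multiple of the number of treatments")
    rng = random.Random(seed)
    per_arm = block_size // len(treatments)
    allocations = []
    for _ in range(n_blocks):
        # e.g. four allocations to A and four to B ...
        block = [t for t in treatments for _ in range(per_arm)]
        # ... arranged in an entirely random order
        rng.shuffle(block)
        allocations.extend(block)
    return allocations


# One separate list per stratum (stratum labels are made up for illustration).
strata = ["age<50", "age>=50"]
lists = {s: permuted_block_list(n_blocks=3, seed=i) for i, s in enumerate(strata)}
for stratum, allocation in lists.items():
    print(stratum, "".join(allocation))
```

Because a stratum's final size is unknown in advance, the list would simply be consumed in order as patients in that stratum enter, so balance is exact only at the end of each completed block.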
It is, in fact, a matter of some debate whether stratification by design, using permuted blocks, has any appreciable advantage over stratification in the analysis without use of permuted blocks. The near-equality of numbers achieved by permuted blocks is certainly advantageous, but the advantage is likely to be marginal except in quite small trials, and may be outweighed by the nuisance of keeping separate randomization lists and the consequently increased risk of errors of assignment. In multicentre trials, one natural system of strata is that of the centres themselves, and there may be a special case for using permuted blocks within each centre so that no centre ends the trial with highly disproportionate numbers on different treatments: a diplomatic rather than a statistical reason.

3.2 Data-dependent allocation

In recent years a number of proposals have been made for the use of dynamic systems of allocation, in which the assignment to be made for any one patient depends in some way on the previous course of events. A distinction should be made between (i) schemes aiming to provide a better balance of prognostic factors, and (ii) those designed to ensure that more patients are assigned to the more effective treatments.

(i) Balance of prognostic factors

This is, of course, the aim of permuted blocks, but this method can be criticized for being insufficiently random. If the block size is constant and known, the last assignment in a block must be determined by the earlier assignments, and in many blocks the determinacy will extend to earlier assignments. Efron [12] introduced an additional random element by allowing the probabilities of allocation to fluctuate adaptively, so that an under-represented treatment had a higher chance of being chosen at any stage. There are many variants of these so-called ‘biased-coin’ schemes, but they seem to have little advantage over permuted blocks if treatments are adequately masked.
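A biased-coin rule of this kind might be sketched as follows. The sketch assumes two treatments and a fixed bias `p` towards the under-represented arm; the default of 2/3, the function names and the seed are illustrative assumptions rather than details given in the text.

```python
import random


def biased_coin_assign(n_a, n_b, p=2 / 3, rng=random):
    """Choose the next treatment given the current group sizes n_a and n_b.

    The under-represented arm is chosen with probability p (> 1/2);
    when the groups are level, a fair coin is used.
    """
    if n_a == n_b:
        prob_a = 0.5
    elif n_a < n_b:
        prob_a = p          # A is behind, so favour A
    else:
        prob_a = 1 - p      # B is behind, so favour B
    return "A" if rng.random() < prob_a else "B"


def simulate(n_patients, p=2 / 3, seed=0):
    """Allocate n_patients sequentially and return the sequence and final counts."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    sequence = []
    for _ in range(n_patients):
        arm = biased_coin_assign(counts["A"], counts["B"], p, rng)
        counts[arm] += 1
        sequence.append(arm)
    return sequence, counts


sequence, counts = simulate(40)
print("".join(sequence), counts)
```

Unlike the last assignment in a permuted block, no assignment here is ever fully determined by its predecessors as long as `p` is strictly less than 1.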
When there are many prognostic variables to be balanced, permuted blocks and biased-coin methods are inconvenient because they require different randomization schemes for each of a large number of combinations of baseline factors. Taves [13] suggested an approach called ‘minimization’, whereby the treatment to be received by a particular patient is chosen so as to minimize some index of discrepancy between the characteristics of the treatment groups. Variants of this approach, introducing the sort of randomness characteristic of biased-coin designs, have been described by Pocock and Simon [14], Begg and Iglewicz [15] and Atkinson [16], among others. The two latter papers use methods which are theoretically complex but could be implemented by an appropriate use of a microcomputer.

(ii) Concentration on the most effective treatments

If the response to treatment of an individual patient becomes known with only a short time-lag after the start of treatment, the cumulative results can be updated so that the apparently more effective treatments can be identified. Many authors have argued that, ethically, higher proportions of patients should be placed on the apparently better treatments than on those which appear to be inferior. There are very many ways in which this adaptive allocation could be carried out. For binary responses (success/failure), Zelen [17] suggested the ‘play-the-winner’ rule, whereby treatment is changed each time a failure occurs. Many authors have studied this problem from the point of view of decision theory, the object being to minimize the number of patients, in the trial and perhaps in a larger group to be treated in future, who receive an inferior treatment. An optimal solution is likely to involve a gradual shift from 50 : 50 allocation between two treatments at the outset, to an overwhelming preponderance on the apparently better treatment at some later stage.
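The play-the-winner rule as described here (stay on the current treatment after a success, switch after a failure) can be simulated in a few lines. The sketch assumes that each response is known before the next patient is treated, that there are exactly two treatments, and that the first treatment is chosen at random; the success probabilities are invented purely for illustration.

```python
import random


def play_the_winner(n_patients, success_prob, seed=None):
    """Simulate the simple rule: change treatment each time a failure occurs.

    success_prob maps each of two treatment labels to a hypothetical
    probability of success. Returns a list of (treatment, success) pairs.
    """
    arms = list(success_prob)
    if len(arms) != 2:
        raise ValueError("this sketch handles exactly two treatments")
    rng = random.Random(seed)
    current = rng.choice(arms)              # first treatment chosen at random
    history = []
    for _ in range(n_patients):
        success = rng.random() < success_prob[current]
        history.append((current, success))
        if not success:                     # a failure triggers a switch
            current = arms[1] if current == arms[0] else arms[0]
    return history


history = play_the_winner(200, {"A": 0.7, "B": 0.4}, seed=1)
n_on_a = sum(1 for arm, _ in history if arm == "A")
print("patients on A:", n_on_a, "patients on B:", len(history) - n_on_a)
```

Runs of successes keep patients on the same treatment, so the treatment with the higher success rate tends to accumulate more patients, which is exactly what produces the unequal group sizes discussed next.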
I do not believe that schemes of this sort have been at all widely used. Although ethically attractive, they are in a sense statistically inefficient since widely discrepant group sizes are less efficient for the estimation of differences between groups than are equal group sizes. Perhaps more important, the characteristics of patients entered into a trial are likely to fluctuate during the intake period, particularly if allocation proportions are changing. It will then be very difficult to carry out a valid statistical analysis. Finally, the ethical argument is far from straightforward: is it ethical to place 1 patient in 20 on a clearly inferior treatment? Some of these issues are discussed further by Simon [18].

3.3 Group allocation

We have assumed so far that a random assignment is made either for an individual patient or for a particular course of treatment to be given to the patient. Occasionally, there will be a special advantage in arranging that a group of patients receive the same treatment. If medical care is delivered in stressful circumstances, as perhaps in a casualty department, it may be impracticable to arrange for individual randomization, but quite feasible to use the same regime for a fixed period of time during which many patients will be treated. Again, it may be politic to ensure that all subjects in a group receive the same treatment. In general practice trials, each practitioner may wish to have uniformity of treatment within his group of patients; features of medical care in hospitals, such as nursing routine, may have to be applied uniformly within a ward; in a dental caries trial the same toothpaste may have to be used by all members of the household.

The essential point to remember here is that the units which are randomly assigned to treatments are now the groups, rather than the individual patients, and (as noted in connection with the trial by Amberson et al. in Section 1) considerations of replication require that the number of groups should not be too small, even when the groups each contain a large number of patients. The statistical analysis of studies of this sort is discussed by Simon [19].

4. ARGUMENTS AGAINST RANDOMIZATION

Although the case for randomization was presented forcefully and persuasively by Hill and others in the 1940s, 1950s and 1960s, the argument still rumbles on. Certainly the practice of clinical trials is much less firmly established in some countries than in others, and no doubt the ethical issues (discussed in Section 5) present themselves in different lights when viewed against different traditions of medical practice. Some doubts about randomization have been revived during the last decade, and one or two different strands in the debate need to be distinguished.

Gehan and Freireich [20], writing particularly about cancer trials, argue for a greater reliance on historical controls. They are concerned partly with the element of artificiality introduced into medical practice in a controlled trial. They also argue that a large set of historical controls can improve precision: if 50 patients are available for testing a new treatment against a widely-used standard, a comparison of all 50 on the new treatment against an effectively infinite number on the standard will have less random error than a comparison of 25 against 25. They admit that retrospective comparisons may introduce bias, but argue that disparities in baseline characteristics can be allowed for by appropriate statistical techniques.

I believe that this view seriously underestimates the danger of relying on historical controls. Adjustments for baseline differences may well allow properly for discrepancies in the chosen variables, but they provide no safeguard against possible disparities in other respects. Not only may the comparisons be biased, but there is no way of measuring the extent of the bias.
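The precision argument, and the way an unmeasured bias can undo it, can be made explicit with a short calculation. This is a sketch under simplifying assumptions that are not in the original: independent responses with a common variance $\sigma^2$, a historical mean known without sampling error, and a hypothetical systematic bias $b$ in the historical comparison.

```latex
% Randomized comparison, 25 patients on each treatment:
\operatorname{Var}\left(\bar{y}_{N} - \bar{y}_{S}\right)
  = \frac{\sigma^{2}}{25} + \frac{\sigma^{2}}{25}
  = \frac{4\sigma^{2}}{50}

% All 50 patients on the new treatment, compared with an
% effectively infinite historical series on the standard:
\operatorname{Var}\left(\bar{y}_{N} - \mu_{S}^{\mathrm{hist}}\right)
  = \frac{\sigma^{2}}{50}

% If the historical comparison carries a bias b, its mean square error is
\mathrm{MSE} = \frac{\sigma^{2}}{50} + b^{2},
\qquad
\frac{\sigma^{2}}{50} + b^{2} < \frac{4\sigma^{2}}{50}
\iff |b| < \sigma\sqrt{\tfrac{3}{50}} \approx 0.245\,\sigma
```

On these assumptions the historical comparison has a quarter of the variance, and half the standard error, of the randomized one, but the advantage disappears once the bias exceeds about a quarter of a standard deviation. Since $b$ cannot be estimated from the data, there is no way of checking which side of that threshold a given study falls on.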
A special case can perhaps be made for non-randomized designs in Phase II studies, particularly in the early investigation of chemotherapeutic agents against cancer. These studies are necessarily small, since the aim is to select for further study only the small proportion of agents which show initially promising results. Since the sampling error is necessarily large, it may be worthwhile to reduce this by using historical rather than simultaneous controls, even at the risk of introducing bias. It must be remembered that Phase II trials are screening procedures, rather than comparative studies with the authority of Phase III trials.

A different approach has been advocated, particularly by workers in medical computing: the use of large databases recording the baseline characteristics, treatment received and responses observed for large numbers of patients [21]. If a new treatment is to be evaluated, the proposal is that it should be used on a group of patients each of whom would be matched for relevant characteristics with a patient who previously received a standard treatment, and whose data are on file. The dangers of this approach have been expounded by Dambrosia and Ellenberg [22] and Byar [23]. All the reservations previously expressed about historical controls apply here, and there is additional concern about the difficulty of maintaining uniform and reliable standards of data recording in data collected routinely from several sources over a long period of time.

Aspden, Jackson and Whitehouse [24] advocate the use of mathematical models describing the transition of cancer patients from one clinical state to another, so that treatments can be assessed by comparing the proportions of patients who, at various times, are currently in specific states. Underlying their approach is a reliance on historical comparisons between non-randomized groups, and there seems no reason to believe that this is any more well-founded than in the other situations described here.

5. ETHICAL ISSUES

‘The first step in . . . a trial is to decide precisely what it is hoped to prove, and secondly to consider whether these aims can be ethically fulfilled. It need hardly be said that the latter consideration is paramount and must never, on any scientific grounds whatever, be lost sight of. If a treatment cannot ethically be withheld then clearly no controlled trial can be instituted.’

The principles expressed in these words by Hill have been reiterated since by many writers, and there can scarcely be a clinical trial conducted today in which the participants do not thoroughly satisfy themselves about the ethical propriety of the study. Of course, ethical judgements are subjective, and it is not uncommon, in the planning of a multicentre trial, to find disagreement among the investigators on ethical issues. If several treatments are to be compared a particular investigator may be willing to use some but not all treatments, different selections being preferred by different participants. Perceptions of precisely what randomized comparisons would be ethical are likely to vary from one country to another, and will certainly change with the passage of time as more information becomes available.

Hill [10] quotes the following passage from an anonymous editorial in the British Medical Journal (‘A comment that expresses what I feel and could not myself, I am sure, have put more clearly’):

‘In treating patients with unproved remedies we are, whether we like it or not, experimenting on human beings, and a good experiment well reported may be more ethical and entail less shirking of duty than a poor one.’

He remarks elsewhere that ‘It may well be unethical. . . . not to institute a proper trial.’ Again, he argues that ‘a trial should be begun at the earliest opportunity, before there is inconclusive though suggestive evidence of the value of treatment.
Not infrequently, however, clinical workers publish favourable results on three or four cases and conclude their article by suggesting that this is the method of choice, or that what is now required is a trial on an adequate scale. They do not seem to realize that by their very publication they have vastly increased the difficulties of that trial or, indeed, made it impossible.’ For this reason, T. C. Chalmers [25] has vigorously argued in favour of randomization of the first patient, a view with which I agree except perhaps for some of the early Phase II trials referred to in Section 4.

If a randomized trial has been started, and the responses of individual patients are analysed as they accumulate, it may become more and more obvious that one treatment is better than another. The investigators may then become convinced that continued randomization is unethical. It is impossible to lay down rules by which such decisions should be arrived at. Many considerations will be relevant: more than one response variable; side effects as well as therapeutic responses; long-term as well as short-term benefit; ease of administration and perhaps costs. A formal sequential analysis of the results [26, 27] is likely to be useful in allowing for the effect of repeated analysis of the data, but should be regarded as a guideline rather than an overriding stopping-rule.

It has sometimes been suggested that randomization of treatment for a given patient is justified only if the doctor’s views about the preferable treatment for that patient are exactly balanced, a situation which of course would never arise in practice. On this view, a single observation for one patient would tip the scales one way or the other, and further randomization would be impossible. This is, perhaps, the extreme ‘individual’ view in the distinction drawn by Lellouch and Schwartz [28] between ‘individual’ and ‘collective’ ethics.
In practice there will be a wide range of situations in which the doctor will feel quite justified in giving any of a number of treatments, because the information about all the relevant issues is so scanty, and it will require a good deal of evidence to make him feel that the treatment of choice is clearly identified. The doctor, in fact, will adopt an attitude of collective ethics, permitting the collection of reliable data for the benefit of future patients, provided that there is no clear indication that the interests of his present patients are damaged.

Patients will normally be entered into a randomized trial only after their informed consent has been sought and obtained. The extent to which this practice is a legal requirement, and the precise nature of the informed consent, vary from one country to another. The numbers of patients available for a trial will therefore be depleted by those patients unwilling to take part, and the relative efficacy of treatments may differ between the consenters and the non-consenters.

In an effort to diminish these disadvantages, Zelen [29] has suggested an alternative approach for a trial in which a new treatment, N, is to be compared with a standard, S. The patients are randomized into two groups: I, who will receive S, with no request for consent; and II, who will be asked whether they are prepared to receive N. In Group II the consenters receive N and the rest receive S. A perfectly valid comparison is possible between Groups I and II, since they have been formed by random assignment. A difference between Groups I and II may safely be ascribed to the difference between N and S, but if the proportion of consenters is low the treatment effect may be so diluted by the non-consenters as to be undetectable. The treatment effect amongst the consenters can be estimated fairly, although with impaired efficiency if the proportion of consenters is low. It is too early to judge whether this device will be widely used in practice.
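The dilution described above can be illustrated numerically. The sketch below rests on assumptions not spelled out in the text: non-consenters in Group II receive S and so respond, on average, as comparable patients in Group I do, which makes the Group II minus Group I difference equal to the consent fraction times the effect among consenters. The function names and the numbers are hypothetical.

```python
def zelen_consenter_effect(mean_group_1, mean_group_2, consent_fraction):
    """Return (diluted difference, estimated effect among consenters).

    mean_group_1:     mean response in Group I (all on S)
    mean_group_2:     mean response in Group II (consenters on N, the rest on S)
    consent_fraction: proportion of Group II who agreed to receive N
    """
    if not 0 < consent_fraction <= 1:
        raise ValueError("consent_fraction must lie in (0, 1]")
    diluted = mean_group_2 - mean_group_1
    # Undo the dilution: only a fraction of Group II actually received N.
    return diluted, diluted / consent_fraction


def sample_size_inflation(consent_fraction):
    """Approximate factor by which the trial must grow to detect the diluted
    difference with unchanged power (the difference shrinks by the consent
    fraction, so the required numbers grow by its inverse square)."""
    return 1 / consent_fraction ** 2


diluted, among_consenters = zelen_consenter_effect(0.40, 0.46, consent_fraction=0.5)
print(round(diluted, 3), round(among_consenters, 3), sample_size_inflation(0.5))
```

With half the patients consenting, the observable difference is halved and roughly four times as many patients would be needed, which is the impaired efficiency referred to in the text.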
REFERENCES

1. Stigler, S. M. ‘Peirce, Charles Sanders’, in Kruskal, W. H. and Tanur, J. M. (eds), International Encyclopedia of Statistics, Vol. 2, Free Press, New York, 1978, pp. 698-702.
2. Peirce, C. S. and Jastrow, J. ‘On small differences of sensation’, National Academy of Sciences Memoirs, 3 (1), 75-83 (1884).
3. Hill, A. B. Statistical Methods in Clinical and Preventive Medicine, Livingstone, Edinburgh and London, 1962.
4. Bull, J. P. ‘The historical development of clinical therapeutic trials’, Journal of Chronic Diseases, 10, 218-248 (1959).
5. Fibiger, J. ‘Om Serumbehandlung af Difteri’, Hospitalstidende, 6, 309-325 and 337-350 (1898).
6. Pearson, K. ‘Report on certain enteric fever inoculation statistics’, British Medical Journal, 2, 1243-1246 (1904).
7. van Helmont, J. B. Oriatrike or Physick Refined (translated by J. Chandler), Lodowick Loyd, London, 1662, quoted on p. 27 of Debus, A. G. The Chemical Dream of the Renaissance, Heffer, Cambridge, 1968.
8. Amberson, J. B. Jr., McMahon, B. I. and Pinner, M. ‘A clinical trial of sanocrysin in pulmonary tuberculosis’, American Review of Tuberculosis, 24, 401-435 (1931).
9. Fisher, R. A. ‘The arrangement of field experiments’, Journal of the Ministry of Agriculture of Great Britain, 33, 503-513 (1926).
10. Hill, A. B. ‘The clinical trial’, New England Journal of Medicine, 247, 113-119 (1952).
11. Hill, A. B. ‘The clinical trial’, British Medical Bulletin, 7, 278-282 (1951).
12. Efron, B. ‘Forcing a sequential experiment to be balanced’, Biometrika, 58, 403-417 (1971).
13. Taves, D. R. ‘Minimization: a new method of assigning patients to treatment and control groups’, Clinical Pharmacology and Therapeutics, 15, 443-453 (1974).
14. Pocock, S. J. and Simon, R. ‘Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial’, Biometrics, 31, 103-115 (1975).
15. Begg, C. B. and Iglewicz, B. ‘A treatment allocation procedure for clinical trials’, Biometrics, 36, 81-90 (1980).
16. Atkinson, A. C. ‘Optimum biased coin designs for sequential clinical trials with prognostic factors’, Biometrika, 69, 61-67 (1982).
17. Zelen, M. ‘Play the winner rule and the controlled clinical trial’, Journal of the American Statistical Association, 64, 131-146 (1969).
18. Simon, R. ‘Adaptive treatment assignment methods and clinical trials’, Biometrics, 33, 743-749 (1977).
19. Simon, R. ‘Composite randomization designs for clinical trials’, Biometrics, 37, 723-731 (1981).
20. Gehan, E. A. and Freireich, E. J. ‘Non-randomized controls in cancer clinical trials’, New England Journal of Medicine, 290, 198-203 (1974).
21. Starmer, C. F., Lee, K. L., Harrell, F. E. and Rosati, R. A. ‘On the complexity of investigating chronic illness’, Biometrics, 36, 333-335 (1980).
22. Dambrosia, J. M. and Ellenberg, J. H. ‘Statistical considerations for a medical data base’, Biometrics, 36, 323-332 (1980).
23. Byar, D. P. ‘Why data bases should not replace randomized clinical trials’, Biometrics, 36, 337-342 (1980).
24. Aspden, P., Jackson, R. R. P. and Whitehouse, J. M. A. ‘A systems approach to the evaluation of clinical trials in a specialist oncology centre’, in Coblentz, A. M. and Walker, J. R. (eds), Systems Science in Health Care, Taylor and Francis, London, 1977, pp. 145-152.
25. Chalmers, T. C. ‘Randomization and coronary artery surgery’, Annals of Thoracic Surgery, 14, 323-327 (1972).
26. Armitage, P. Sequential Medical Trials, 2nd edn, Blackwell, Oxford, 1975.
27. McPherson, K. ‘Sequential analysis of clinical trials’, in Johnson, F. N. and Johnson, S. (eds), Clinical Trials, Blackwell, Oxford, 1977, pp. 108-128.
28. Lellouch, J. and Schwartz, D. ‘L’essai therapeutique: ethique individuelle ou ethique collective?’, Revue de l’Institut International de Statistique, 39, 127-136 (1971).
29. Zelen, M. ‘A new design for randomized clinical trials’, New England Journal of Medicine, 300, 1242-1245 (1979).
