A GENERAL METHOD FOR PRODUCING RANDOM VARIABLES IN A COMPUTER

George Marsaglia
Boeing Scientific Research Laboratories
Seattle, Washington

SUMMARY

Many random variables can be approximated quite closely by c(M + U1 + U2 + U3), where c is constant, M is a discrete random variable, and the U's are uniform random variables. Such a representation appears attractive as a method for generating variates in a computer, since M + U1 + U2 + U3 can be quickly and simply generated. A typical application of this idea will have M taking from 4 to 7 values; the required X will be produced in the form c(M + U1 + U2 + U3) perhaps 95-99% of the time, and occasionally by the rejection technique, to make the resulting distribution come out right. This paper describes the method and gives examples of how to generate beta, normal, and chi-square variates.

INTRODUCTION

Given a sequence of independent uniform random variables U1, U2, ..., we are concerned with methods for representing an arbitrary variate X as a function of the U's.
Such representations lead to programs for generating X's in a computer. There are any number of ways to represent X as a function of the U's; the problem is to find those which lead to fast, accurate, easy-to-code programs which occupy little space in the computer. Programs which are very fast and accurate may be based on representing the density of X as a mixture of "easy" and "difficult" densities, the latter being called for infrequently. See Ref. 2 for a general discussion and Refs. 1 and 3 for applications to normal and exponential variables. These programs are very fast, but at the same time they are rather complicated and require hundreds of stored constants. In this paper, we will try to develop a general method that is simpler, but still very fast. The procedure will be explained by way of three examples (a beta variate, the normal distribution, and a chi-square variate), from which the suitability of the method and its general applicability can be inferred. These points are discussed in the final section.

Throughout this article, we will use f(x) to represent the density of U1 + U2 + U3, with the U's independent, uniform over [0,1]. This density is a piecewise parabola, once differentiable, or, if you prefer, a parabolic spline curve:

f(x) = .5x^2,                                0 <= x <= 1
f(x) = .5x^2 - 1.5(x-1)^2,                   1 <= x <= 2
f(x) = .5x^2 - 1.5(x-1)^2 + 1.5(x-2)^2,      2 <= x <= 3
f(x) = 0,                                    x < 0 or x > 3.

PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1966
From the collection of the Computer History Museum (www.computerhistory.org)

A BETA VARIATE

We want a method for generating a variate X with a beta density, say

b(x) = 105x^4(1-x)^2,  0 < x < 1.

Our aim is to generate X in the form c(M + U1 + U2 + U3) most of the time, occasionally generating X by the rejection technique in order that the resulting mixture be correct. A good choice for c in this case is .1. We will generate X in the form .1(M + U1 + U2 + U3) with as high a frequency as possible. That is, we will put X = .1(0 + U1 + U2 + U3) with probability p0, put X = .1(1 + U1 + U2 + U3) with probability p1, put X = .1(2 + U1 + U2 + U3) with probability p2, ..., and put X = .1(7 + U1 + U2 + U3) with probability p7. It turns out that we must put p0 = 0, since b(x) varies as x^4, while f(x) varies as x^2, at the origin. We can, however, fit b(x) very closely with a mixture of the densities of .1(1 + U1 + U2 + U3), .1(2 + U1 + U2 + U3), ..., .1(7 + U1 + U2 + U3). We may formulate the problem of finding the best set of p's as follows:

Choose p1, p2, ..., p7 so as to maximize p1 + p2 + ... + p7 subject to the condition that pi >= 0 and

p1[10f(10x-1)] + p2[10f(10x-2)] + ... + p7[10f(10x-7)] <= 105x^4(1-x)^2  for 0 <= x <= 1.   (1)

This optimization problem is similar to those of linear programming; in fact, if we specify condition (1) for a suitably fine mesh of x values, we have an ordinary linear programming problem. We find that we can get p1 + ... + p7 very close to 1, in fact sum(pi) = .9915, and still maintain condition (1).
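The spline density f(x) defined above translates directly into code. The following is an illustrative sketch in Python; the function name follows the paper's f, but the coding is ours, not the paper's:

```python
def f(x):
    """Density of U1 + U2 + U3, the sum of three independent uniforms on
    [0,1]: a once-differentiable piecewise parabola supported on [0, 3]."""
    if x < 0.0 or x > 3.0:
        return 0.0
    return (0.5 * x**2
            - 1.5 * max(x - 1.0, 0.0)**2
            + 1.5 * max(x - 2.0, 0.0)**2)
```

Writing the three pieces as one clipped expression keeps the spline structure visible: each knot at 1 and 2 contributes a parabolic correction that switches on past the knot. The density peaks at f(1.5) = .75 and is symmetric, f(x) = f(3 - x).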
Thus we write

105x^4(1-x)^2 = sum(i=1..7) pi[10f(10x-i)] + .0085h(x),  0 <= x <= 1,

where the p's are, in order, .0199, .0633, .1297, .1978, .2369, .2155, and .1284, and sum(pi) = .9915. The residual function g(x) = .0085h(x) is drawn in Fig. 1. We may generate X with density h(x) by the rejection technique. We summarize with this outline:

To generate a beta variate X, density 105x^4(1-x)^2, 0 < x < 1,

1. with probability p5 = .2369 put X = .1[5 + U1 + U2 + U3]
2. with probability p6 = .2155 put X = .1[6 + U1 + U2 + U3]
3. with probability p4 = .1978 put X = .1[4 + U1 + U2 + U3]
4. with probability p3 = .1297 put X = .1[3 + U1 + U2 + U3]
5. with probability p7 = .1284 put X = .1[7 + U1 + U2 + U3]
6. with probability p2 = .0633 put X = .1[2 + U1 + U2 + U3]
7. with probability p1 = .0199 put X = .1[1 + U1 + U2 + U3]
8. with probability .0085 generate X with density h(x) by the rejection technique outlined in Fig. 1.

Referring to Fig. 1, we see that the residual function g(x) = b(x) - sum(pi)[10f(10x-i)] has a peak on the right end which makes the rejection technique too inefficient; this difficulty may be overcome in several ways, for example, by adding one more step to the procedure, as described in the figure. This will add another step to the outline:

7a. with probability .0044, put X = .05(17 + U1 + U2 + U3),

and the probability in step 8 will be changed from .0085 to .0041.

Figure 1. Method for generating a variate from the residual portion of the beta distribution, by the rejection technique.

g(x) = 105x^4(1-x)^2 - sum(i=1..7) pi[10f(10x-i)], with p1 = .0199, p2 = .0633, p3 = .1297, p4 = .1978, p5 = .2369, p6 = .2155, p7 = .1284. To generate a random variable X with density g(x)/.0085, generate pairs x = U1, y = .072U2 until y < g(x), then put X = x. The efficiency of this rejection technique is .0085/.072, or 12%. This may be raised to .0041/.014, or 29%, by subtracting another term from g(x); the resulting curve is dotted. In this case, generate pairs x = U1, y = .014U2 until y < g(x) - .0044[20f(20x-17)], then put X = x.
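The full beta outline (steps 1-7, 7a, and 8 with the improved rejection) can be sketched in a few lines. This is an illustrative Python transcription; the names (beta_variate, s3, WEIGHTS, OFFSETS) and the use of Python's random module are our assumptions, while the constants are the paper's:

```python
import random

rng = random.Random(12345)

def f(x):
    # density of U1 + U2 + U3 on [0, 3]
    if x < 0.0 or x > 3.0:
        return 0.0
    return 0.5*x**2 - 1.5*max(x-1.0, 0.0)**2 + 1.5*max(x-2.0, 0.0)**2

# mixture weights and offsets M, in the order of steps 1-7 of the outline
WEIGHTS = [.2369, .2155, .1978, .1297, .1284, .0633, .0199]
OFFSETS = [5, 6, 4, 3, 7, 2, 1]

def g(x):
    """Residual of b(x) = 105 x^4 (1-x)^2 after removing the mixture terms
    and the extra .0044 * 20 f(20x - 17) term of step 7a; bounded by .014."""
    s = sum(p * 10.0 * f(10.0*x - m) for p, m in zip(WEIGHTS, OFFSETS))
    return 105.0 * x**4 * (1.0 - x)**2 - s - .0044 * 20.0 * f(20.0*x - 17.0)

def s3():
    return rng.random() + rng.random() + rng.random()

def beta_variate():
    u, acc = rng.random(), 0.0
    for p, m in zip(WEIGHTS, OFFSETS):
        acc += p
        if u < acc:
            return 0.1 * (m + s3())      # steps 1-7: fast path, ~99.15%
    if u < acc + .0044:
        return 0.05 * (17.0 + s3())      # step 7a
    while True:                          # step 8: rejection, ~29% efficient
        x, y = rng.random(), .014 * rng.random()
        if y < g(x):
            return x
```

Because the mixture decomposition is exact, the sample mean should sit near the beta(5,3) mean of 5/8, with the slow rejection branch entered only about .41% of the time.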
THE NORMAL DISTRIBUTION

We want a method for generating a standard normal variate X, density (2*pi)^(-.5) e^(-.5x^2). We temporarily discard the tails, |x| > 3.5, and divide the interval -3.5 < x < 3.5 into 10 parts. We will generate X in the form .7(M + U1 + U2 + U3), where M takes values -5, -4, -3, -2, -1, 0, 1, 2 with probabilities p1, p2, ..., p8. We choose the p's so as to maximize the frequency of the representation X = .7(M + U1 + U2 + U3), as follows:

Choose p1, p2, ..., p8 so as to maximize p1 + p2 + ... + p8 subject to the condition that pi >= 0 and

p1[(10/7)f((10/7)x+5)] + p2[(10/7)f((10/7)x+4)] + ... + p8[(10/7)f((10/7)x-2)] <= (2*pi)^(-.5) e^(-.5x^2)  for -3.5 <= x <= 3.5.   (2)

The solution is p1 = p8 = .0092, p2 = p7 = .0517, p3 = p6 = .1576, and p4 = p5 = .2767, with p1 + ... + p8 = .9904. Thus we write

(2*pi)^(-.5) e^(-.5x^2) = sum(i=1..8) pi[(10/7)f((10/7)x+6-i)] + .0091347418h(x) + .0004652582t(x),

where h(x) is the residual density on -3.5 <= x <= 3.5, and t(x) is the tail, i.e., the density of X, conditioned by |X| > 3.5. We generate X with density h by the rejection technique (Fig. 2), and from the tail by the method described in Ref. 4. These are steps 9 and 10 in the following outline:

To generate a standard normal variate X, density (2*pi)^(-.5) e^(-.5x^2),

1. with probability .2767, put X = .7(U1 + U2 + U3 - 1)
2. with probability .2767, put X = .7(U1 + U2 + U3 - 2)
3. with probability .1576, put X = .7(U1 + U2 + U3 - 0)
4. with probability .1576, put X = .7(U1 + U2 + U3 - 3)
5. with probability .0517, put X = .7(U1 + U2 + U3 - 4)
6. with probability .0517, put X = .7(U1 + U2 + U3 + 1)
7. with probability .0092, put X = .7(U1 + U2 + U3 - 5)
8. with probability .0092, put X = .7(U1 + U2 + U3 + 2)
9. with probability .0091347418, generate (x,y) uniformly from the rectangle of Fig. 2 until y < g(x), then put X = x.
10. with probability .0004652582, generate pairs x = 2U1 - 1, y = U2 until y < 3.5(12.25 - 2 ln|x|)^(-.5), then put X = (12.25 - 2 ln|x|)^.5 if x < 0, X = -(12.25 - 2 ln|x|)^.5 if x > 0.
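The ten steps above can be sketched as follows. This is an illustrative Python rendering, not the paper's machine-language program: the helper names are ours, step 9 draws from the Fig. 2 rectangle (7 wide, .00382 high), and step 10 implements the tail method of Ref. 4, attaching the sign by symmetry:

```python
import math
import random

rng = random.Random(314159)

def f(x):
    # density of U1 + U2 + U3 on [0, 3]
    if x < 0.0 or x > 3.0:
        return 0.0
    return 0.5*x**2 - 1.5*max(x-1.0, 0.0)**2 + 1.5*max(x-2.0, 0.0)**2

# probabilities and offsets M, in the order of steps 1-8 of the outline
P = [.2767, .2767, .1576, .1576, .0517, .0517, .0092, .0092]
M = [-1, -2, 0, -3, -4, 1, -5, 2]

def g(x):
    # residual: phi(x) minus the fitted mixture, for -3.5 <= x <= 3.5
    phi = math.exp(-0.5*x*x) / math.sqrt(2.0*math.pi)
    return phi - sum(p * (10.0/7.0) * f((10.0/7.0)*x - m) for p, m in zip(P, M))

def s3():
    return rng.random() + rng.random() + rng.random()

def normal_variate():
    u, acc = rng.random(), 0.0
    for p, m in zip(P, M):
        acc += p
        if u < acc:
            return 0.7 * (m + s3())           # steps 1-8: fast path, ~99.04%
    if u < acc + .0091347418:                 # step 9: residual, via Fig. 2
        while True:
            x, y = 7.0*(rng.random() - 0.5), .00382*rng.random()
            if y < g(x):
                return x
    while True:                               # step 10: tail, |X| > 3.5
        x = 2.0*rng.random() - 1.0
        if x == 0.0:
            continue
        t = math.sqrt(12.25 - 2.0*math.log(abs(x)))
        if rng.random() < 3.5 / t:            # accept with probability 3.5/t
            return math.copysign(t, x)
```

The tail step works because t = (12.25 - 2 ln|x|)^.5 from a uniform |x| has density proportional to t e^(-t^2/2) on (3.5, inf), so accepting with probability 3.5/t leaves the normal tail density.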
Figure 2. Method for generating a variate from the residual portion of the normal distribution.

To generate a variate X with density g(x)/.0091347418, generate pairs x = 7(U1 - .5), y = .00382U2 until y < g(x), then put X = x. The efficiency is .00913/.02674, or 34%. Here p1 = p8 = .0092, p2 = p7 = .0517, p3 = p6 = .1576, p4 = p5 = .2767.

A CHI-SQUARE VARIATE

We will develop a procedure for generating a chi-square-8 variate X, density x^3 e^(-.5x)/96, x > 0, along the same lines as the two examples above. We choose the interval .4 <= x <= 20.4 for fitting our mixture, dividing it into 10 parts. We will generate X in the form 2(M + U1 + U2 + U3), where M takes values .2, 1.2, 2.2, ..., 7.2 with probabilities p1, p2, ..., p8. The best choice of the p's comes from solving this problem:

Choose p1, p2, ..., p8 so as to maximize p1 + p2 + ... + p8 subject to the condition that pi >= 0 and

p1[.5f(.5x-.2)] + p2[.5f(.5x-1.2)] + ... + p8[.5f(.5x-7.2)] <= x^3 e^(-.5x)/96  for 0 < x < 20.4.

Figure 3. Method for generating a variate from the residual portion of the chi-square-8 distribution.

g(x) = x^3 e^(-.5x)/96 - sum(i=1..8) pi[.5f(.5x-i+.8)], with p1 = .1608, p2 = .2313, p3 = .2128, p4 = .1571, p5 = .1013, p6 = .0599, p7 = .0318, p8 = .0182. To generate a variate X with density g(x)/.0178758526, choose (x,y) uniformly from the U-shaped region until y < g(x), then put X = x. The efficiency is 46%. Generate (x,y) uniformly over the region bounded by heavy lines by putting

(x,y) = (3U1, .00505U2)           with probability .39355760
(x,y) = (3 + 14.5U1, .0006U2)     with probability .22600338
(x,y) = (17.5 + 2.9U1, .00505U2)  with probability .38043902.
The solution p1, p2, ..., p8 of this problem is given in the outline below and in Fig. 3. The sum of the p's is .9732. We generate X from the residual density by the rejection technique as described in Fig. 3. We generate X from the tail, i.e., conditioned by X > 20.4, by transforming the tail to the unit interval and using the rejection technique (see Fig. 4). All of the steps combine to form this outline:

To generate a chi-square-8 variate X, density x^3 e^(-.5x)/96, x > 0,

1. with probability p2 = .2313 put X = 2(1.2 + U1 + U2 + U3)
2. with probability p3 = .2128 put X = 2(2.2 + U1 + U2 + U3)
3. with probability p1 = .1608 put X = 2(.2 + U1 + U2 + U3)
4. with probability p4 = .1571 put X = 2(3.2 + U1 + U2 + U3)
5. with probability p5 = .1013 put X = 2(4.2 + U1 + U2 + U3)
6. with probability p6 = .0599 put X = 2(5.2 + U1 + U2 + U3)
7. with probability p7 = .0318 put X = 2(6.2 + U1 + U2 + U3)
8. with probability p8 = .0182 put X = 2(7.2 + U1 + U2 + U3)
9. with probability .0178758526 generate X from the residual density drawn in Fig. 3, by the rejection technique.
10. with probability .0089241474 generate X from the tail of the chi-square-8 distribution, by the rejection technique described in Fig. 4.

Figure 4. Method for generating a variate from the tail of the chi-square-8 distribution.

To generate a chi-square-8 variate X, conditioned by X > 20.4, choose (x,y) uniformly from the quadrilateral OABC until y < x^(-5) e^(10.2 - 10.2/x), then put X = 20.4/x. The efficiency is 70%. Generate (x,y) uniformly from OABC by putting

(x,y) = (.7 - .7M + m, .1 - .1M)  with probability 1/4
(x,y) = (.7 + .3M, .1 + .9M - m)  with probability 3/4,

where m and M are the smaller and larger of two independent uniform variates.
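The chi-square outline, including the U-shaped-region rejection of Fig. 3 and the tail transformation of Fig. 4, can be sketched as below. This is an illustrative Python rendering under our reading of the figures (in particular, the quadrilateral sampling uses m and M, the smaller and larger of two uniforms); the constants are the paper's, the names are ours:

```python
import math
import random

rng = random.Random(8)

def f(x):
    # density of U1 + U2 + U3 on [0, 3]
    if x < 0.0 or x > 3.0:
        return 0.0
    return 0.5*x**2 - 1.5*max(x-1.0, 0.0)**2 + 1.5*max(x-2.0, 0.0)**2

# probabilities and offsets M, in the order of steps 1-8 of the outline
P = [.2313, .2128, .1608, .1571, .1013, .0599, .0318, .0182]
M = [1.2, 2.2, 0.2, 3.2, 4.2, 5.2, 6.2, 7.2]

def g(x):
    # residual of x^3 e^{-x/2}/96 after removing the fitted mixture
    q = x**3 * math.exp(-0.5*x) / 96.0
    return q - sum(p * 0.5 * f(0.5*x - m) for p, m in zip(P, M))

def s3():
    return rng.random() + rng.random() + rng.random()

def chisq8_variate():
    u, acc = rng.random(), 0.0
    for p, m in zip(P, M):
        acc += p
        if u < acc:
            return 2.0 * (m + s3())          # steps 1-8: fast path, ~97.32%
    if u < acc + .0178758526:                # step 9: U-shaped region, Fig. 3
        while True:
            v = rng.random()
            if v < .39355760:
                x, y = 3.0*rng.random(), .00505*rng.random()
            elif v < .39355760 + .22600338:
                x, y = 3.0 + 14.5*rng.random(), .0006*rng.random()
            else:
                x, y = 17.5 + 2.9*rng.random(), .00505*rng.random()
            if y < g(x):
                return x
    while True:                              # step 10: tail, X > 20.4, Fig. 4
        m_, M_ = sorted((rng.random(), rng.random()))
        if rng.random() < 0.25:              # triangle OAC of the quadrilateral
            x, y = 0.7 - 0.7*M_ + m_, 0.1 - 0.1*M_
        else:                                # triangle ABC
            x, y = 0.7 + 0.3*M_, 0.1 + 0.9*M_ - m_
        # accept if y < x^{-5} e^{10.2 - 10.2/x}, rearranged to avoid overflow
        if x > 0.0 and y * x**5 < math.exp(10.2 - 10.2/x):
            return 20.4 / x
```

The barycentric trick in step 10 works because (m, M - m, 1 - M) are exchangeable weights summing to 1, giving uniform points in each triangle; the 1/4 : 3/4 split matches the triangles' areas.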
GENERAL REMARKS

The examples above suggest the following general procedure for dealing with a density q(x). The only requirement is that q be close to the x-axis at its extremities. An interval containing most of the density is chosen, say a < x < b, then divided into n equal parts; n = 10 is usually a good choice. If h = (b-a)/10, then this linear-programming-type problem is solved: choose p1, ..., p8 so as to maximize p1 + p2 + ... + p8, subject to the condition that pi >= 0 and

p1 f((x-a)/h) + p2 f((x-a-h)/h) + ... + p8 f((x-a-7h)/h) <= h q(x).

Then X may be generated by putting X = a + h(M + U1 + U2 + U3), where M takes values 0, 1, 2, ..., 7 with probabilities p1, ..., p8, or by choosing X from the residual density by the rejection technique, or from the tail, in a manner suggested by the above examples. The sum of the p's will usually be quite close to 1; it was .9915, .9904, and .9732 in the three examples, and thus the resulting programs will be very fast. Few constants are needed, and the programs should be easy to code; they vary little from one density to the next, only the constants and some details from the residues or tails changing. In fact, the fast parts of the programs, generating c(M + U1 + U2 + U3), are so consistent from one density to the next that a basic program for this part of the outline can be written in machine language, with the constants inserted for the particular density under consideration. The slow parts of the program (the residual density and the tail) can be handled by subroutines in FORTRAN or some such convenient language.
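The "specify condition (1) on a fine mesh" recipe maps directly onto a modern LP solver. The following sketch reproduces the beta example (a = 0, h = .1) using NumPy and SciPy's linprog; this tooling choice is obviously ours, not the paper's:

```python
import numpy as np
from scipy.optimize import linprog

def f(x):
    # density of U1 + U2 + U3 on [0, 3], vectorized
    x = np.asarray(x, dtype=float)
    y = (0.5*x**2
         - 1.5*np.clip(x - 1.0, 0.0, None)**2
         + 1.5*np.clip(x - 2.0, 0.0, None)**2)
    return np.where((x >= 0.0) & (x <= 3.0), y, 0.0)

# Beta example: q(x) = 105 x^4 (1-x)^2, mixture terms 10 f(10x - i), i = 1..7.
# Maximize p1 + ... + p7 subject to the mixture staying below q on a fine mesh.
xs = np.linspace(0.0, 1.0, 2001)
A_ub = np.column_stack([10.0 * f(10.0*xs - i) for i in range(1, 8)])
b_ub = 105.0 * xs**4 * (1.0 - xs)**2
res = linprog(c=-np.ones(7), A_ub=A_ub, b_ub=b_ub,
              bounds=(0, None), method="highs")
p = res.x   # mesh solution; the paper reports sum(p) = .9915
```

The mesh optimum can only overshoot the continuous one slightly, since the constraint is enforced at 2001 points, so sum(p) should land near the paper's .9915; swapping in a different q(x) and scale h regenerates the constants for any of the other densities.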
The procedure for a normal variate outlined above is almost as fast as the super program in Ref. 3, yet it is much simpler and requires very little computer space.

REFERENCES

1. M. D. MacLaren, G. Marsaglia and T. A. Bray, "A Fast Procedure for Generating Exponential Variables," Communications of the Association for Computing Machinery, vol. 7, no. 5 (1964).
2. G. Marsaglia, "Expressing a Random Variable in Terms of Uniform Random Variables," Annals of Mathematical Statistics, vol. 32, pp. 894-98 (1961).
3. G. Marsaglia, M. D. MacLaren and T. A. Bray, "A Fast Procedure for Generating Normal Variables," Communications of the Association for Computing Machinery, vol. 7, no. 1 (1964).
4. G. Marsaglia, "Generating a Variable from the Tail of the Normal Distribution," Technometrics, vol. 6, no. 1, pp. 101-2 (1964).