Generating Random Variables from the Inverse Gaussian and First-Passage-Two-Barrier Distributions

Maia Lesosky and Julie Horrocks
Department of Mathematics and Statistics, University of Guelph
email: [email protected]

Abstract

We investigate a naive method for generating pseudo-random variables from two distributions, the inverse Gaussian and the First-Passage-Two-Barrier distribution. These are the first passage time distributions of a Wiener process with non-zero drift to one and two barriers respectively. The method consists of simulating a path (a realization of the Wiener process) by constructing a step function which jumps by a normally distributed amount at successive time intervals. The time at which the path breaches a barrier (the first passage time) is taken as a realization of the desired random variable. We show that this method is unsatisfactory, at least for some combinations of parameter values and time intervals between jumps, and suggest possible areas for future work.

1 Introduction

In this report, we consider a Wiener process, {W(t), t > 0}, with drift µ and volatility σ². The increments, W(t) − W(s), are independent and normally distributed with mean µ(t − s) and variance σ²(t − s) for any 0 ≤ s < t. The sample paths of the process are almost surely continuous. The Wiener process, sometimes referred to as Brownian motion, arises as the continuous limit (in a certain sense) of a random walk (see [3]). It has been used recently in financial applications, for instance to model stock prices. Given some constraints on the parameters (see below), the first passage time of this process to a single barrier has an inverse Gaussian (IG) distribution. The time at which the process first hits one of two barriers has a distribution we call the First-Passage-Two-Barrier (FP2B) distribution. We briefly discuss these distributions in Sections 2 and 3. Then we outline a naive method for generating random variables from these distributions.
The performance of this method is tested in Section 5 using chi-square goodness-of-fit tests. Finally, in Section 6 we give some conclusions and suggestions for future work.

2 Inverse Gaussian Distribution

Consider a Wiener process {W(t), t > 0} with positive drift µ and volatility σ² that starts at position x0 at time t = 0 and unfolds in the presence of a barrier at u > x0, as shown in Figure 1.

[Figure 1: Single Barrier Model]

The first passage time, i.e. the time at which the process first hits the barrier, has an inverse Gaussian (IG) distribution with density

    f(t; u, \mu, \sigma, x_0) = \frac{u - x_0}{\sqrt{2\pi\sigma^2 t^3}} \exp\left( -\frac{(u - x_0 - \mu t)^2}{2\sigma^2 t} \right), \quad t > 0.   (1)

The IG density is most often seen with an alternative parametrization given by

    f(t; \nu, \lambda) = \sqrt{\frac{\lambda}{2\pi t^3}} \exp\left( -\frac{\lambda (t - \nu)^2}{2\nu^2 t} \right), \quad t > 0;   (2)

see for instance [2]. In this alternative formulation, the distribution has expectation ν and variance ν³/λ. The reparametrization from Equation 2 to Equation 1 is outlined in Appendix A. We assume throughout this report that the starting position of the process is 0, that is, x0 = 0. Thus we use a slightly simplified form of the density:

    f(t; u, \mu, \sigma) = \frac{u}{\sqrt{2\pi\sigma^2 t^3}} \exp\left( -\frac{(u - \mu t)^2}{2\sigma^2 t} \right), \quad t > 0,   (3)

where u > 0 and µ > 0.

[Figure 2: Inverse Gaussian Density]

Figure 2 shows the IG density plotted for various values of the parameters. The same density is obtained if the drift is negative and the barrier u is less than 0. If the drift and barrier have opposite signs, the distribution is improper, in that there is positive probability that the first passage time is infinite. If the drift is zero, then the first passage time follows a stable law with index 1/2 (see for instance [3]), in which case the expected time to hitting the barrier is infinite. In this report, we confine our attention to the case of positive drift and positive barrier.
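As a quick numerical sanity check of Equation 3, the density can be integrated on a fine grid: the total mass should be close to 1 and the mean close to ν = u/µ. The following Python sketch is illustrative only (it is not part of the report's Fortran program), and the parameter values µ = 0.5, u = 2, σ = 1 are chosen here for convenience.

```python
import math

def ig_density(t, u, mu, sigma):
    """IG density of Equation 3 (starting point x0 = 0)."""
    return (u / math.sqrt(2.0 * math.pi * sigma**2 * t**3)
            * math.exp(-(u - mu * t) ** 2 / (2.0 * sigma**2 * t)))

# Midpoint-rule integration on (0, 200); illustrative parameter values.
mu, u, sigma = 0.5, 2.0, 1.0
dt = 0.001
ts = [dt * (i + 0.5) for i in range(200000)]
mass = sum(ig_density(t, u, mu, sigma) * dt for t in ts)
mean = sum(t * ig_density(t, u, mu, sigma) * dt for t in ts)
# mass should be near 1, and mean near u/mu = 4
```

The truncation point 200 is safe for these values because the density decays exponentially in t.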
The cumulative distribution function (cdf) of the IG distribution is

    F(t; u, \mu, \sigma) = \Phi\left( \frac{-u + \mu t}{\sigma\sqrt{t}} \right) + e^{2u\mu/\sigma^2} \, \Phi\left( \frac{-u - \mu t}{\sigma\sqrt{t}} \right), \quad t > 0,   (4)

where Φ(x) is the standard normal cdf evaluated at x.

3 First-Passage-Two-Barrier Distribution

Again consider a Wiener process starting at 0, with non-zero drift µ and volatility σ², that unfolds in the presence of two barriers, u > 0 and −ℓ < 0. The distribution of the time at which the process first arrives at one of the barriers has been well studied (see for instance [3]), but has no generally agreed-upon name. Following Horrocks and Thompson [5], we refer to it as the First-Passage-Two-Barrier (FP2B) distribution. The density is only expressible in terms of infinite sums.

Let g(t, u, ℓ, µ, σ) represent the function

    g(t, u, \ell, \mu, \sigma) = \frac{1}{\sigma\sqrt{t^3}} \, e^{-(\mu t + 2\ell)\mu/(2\sigma^2)} \sum_{k=0}^{\infty} \left\{ (s_k - u)\,\phi\left( \frac{s_k - u}{\sigma\sqrt{t}} \right) - (s_k + u)\,\phi\left( \frac{s_k + u}{\sigma\sqrt{t}} \right) \right\}   (5)

where s_k = −(2k + 1)(u + ℓ) and φ(x) is the standard normal density evaluated at x. Then the lower subdensity f_{−ℓ}(t), corresponding to the event that the process hits the lower barrier before the upper barrier and does so in (t, t + dt), equals g(t, u, ℓ, µ, σ) dt. The upper subdensity f_u(t), corresponding to the event that the upper barrier is hit before the lower barrier in (t, t + dt), is equal to g(t, ℓ, u, −µ, σ) dt. Note that the upper subdensity can be found by replacing µ with −µ and interchanging u and ℓ in the expression for the lower subdensity. The subdensities each integrate to something less than one, and the density f(t; u, ℓ, µ, σ) is the sum of the two subdensities, i.e. f(t; u, ℓ, µ, σ) = f_u(t) + f_{−ℓ}(t).

In biological data analysis it is often convenient to talk about the survivor function, which gives the probability that the event of interest occurs after time t. Here the event of interest is breach of either the upper or the lower barrier. The survivor function F̄(t) is equal to 1 − F(t), where F(t) is the cdf.
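Equation 5 and the subdensity relations can be checked numerically: the lower and upper subdensities together should integrate to 1. The sketch below is illustrative (it truncates the infinite sum at k = 20 and uses the arbitrary values µ = 0.5, u = ℓ = 2, σ = 1; neither choice comes from the report).

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def g(t, u, ell, mu, sigma, kmax=20):
    """Equation 5 (the lower subdensity), truncated at k = kmax."""
    rt = sigma * math.sqrt(t)
    total = 0.0
    for k in range(kmax + 1):
        s = -(2 * k + 1) * (u + ell)
        total += (s - u) * phi((s - u) / rt) - (s + u) * phi((s + u) / rt)
    pref = (math.exp(-(mu * t + 2.0 * ell) * mu / (2.0 * sigma**2))
            / (sigma * math.sqrt(t**3)))
    return pref * total

# The two subdensities should together integrate to 1 (midpoint rule
# on (0, 60), which covers essentially all the mass for these values).
mu, u, ell, sigma = 0.5, 2.0, 2.0, 1.0
dt = 0.002
ts = [dt * (i + 0.5) for i in range(30000)]
mass_lower = sum(g(t, u, ell, mu, sigma) * dt for t in ts)   # f_{-l}
mass_upper = sum(g(t, ell, u, -mu, sigma) * dt for t in ts)  # f_u
```

Each sub-mass is the probability of hitting that barrier first, so each lies strictly between 0 and 1.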
The survivor function of the FP2B distribution, F̄(t; u, ℓ, µ, σ), is given by

    \bar{F}(t; u, \ell, \mu, \sigma) = \sum_{k=-\infty}^{\infty} \left\{ e^{-\mu c_k/\sigma^2} \left[ \Phi\left( \frac{c_k + u - \mu t}{\sigma\sqrt{t}} \right) - \Phi\left( \frac{c_k - \ell - \mu t}{\sigma\sqrt{t}} \right) \right] - e^{\mu d_k/\sigma^2} \left[ \Phi\left( \frac{-d_k + u - \mu t}{\sigma\sqrt{t}} \right) - \Phi\left( \frac{-d_k - \ell - \mu t}{\sigma\sqrt{t}} \right) \right] \right\}   (6)

where c_k = 2k(u + ℓ) and d_k = 2k(u + ℓ) + 2u. The probability that the upper barrier is breached before the lower barrier, and that this event occurs after time t, is given by the subsurvivor function

    \bar{F}_u(t; u, \ell, \mu, \sigma) = \operatorname{sign}(\mu) \sum_{k=-\infty}^{\infty} \left\{ e^{-\mu c_k/\sigma^2} \, \Phi\left( \operatorname{sign}(\mu) \frac{c_k + u - \mu t}{\sigma\sqrt{t}} \right) - e^{\mu d_k/\sigma^2} \, \Phi\left( \operatorname{sign}(\mu) \frac{-d_k + u - \mu t}{\sigma\sqrt{t}} \right) \right\}.   (7)

Similarly, the lower subsurvivor function is

    \bar{F}_{-\ell}(t; u, \ell, \mu, \sigma) = \operatorname{sign}(\mu) \sum_{k=-\infty}^{\infty} \left\{ e^{\mu d_k/\sigma^2} \, \Phi\left( \operatorname{sign}(\mu) \frac{-d_k - \ell - \mu t}{\sigma\sqrt{t}} \right) - e^{-\mu c_k/\sigma^2} \, \Phi\left( \operatorname{sign}(\mu) \frac{c_k - \ell - \mu t}{\sigma\sqrt{t}} \right) \right\}.   (8)

The presence of sign(µ) is necessary to improve numerical stability. Equations 5 through 8 are from [5].

[Figure 3: Wiener process, Two Barriers]

The probability that the process will reach the upper barrier before the lower barrier is given by

    P(\text{upper}) = \frac{1 - e^{2\mu\ell/\sigma^2}}{e^{-2\mu u/\sigma^2} - e^{2\mu\ell/\sigma^2}}   (9)

(see for instance [10]). When the drift is equal to zero, this probability becomes ℓ/(u + ℓ). The probability that the lower barrier is crossed first is one minus the probability that the upper barrier is crossed first.

4 Random Number Generation

We generated pseudo-random numbers using a naive method based on the definition of the IG and FP2B distributions as the first passage times of the Wiener process with drift µ and volatility σ² to one and two barriers respectively. Since the Wiener process has independent normal increments we can, in theory, simulate it by summing independent normal variables.

[Figure 4: Wiener process approximated by a step function]

To this end let X_i, i = 1, 2, . . ., be independent random variables from the N(µ, σ²) distribution, and let S_0 = 0 and S_n = Σ_{i=1}^{n} X_i. The sequence {S_i ; i = 0, 1, 2, . .
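Equations 7 and 9 can be checked against each other: as t → 0, the event "upper barrier breached first and after time t" reduces to "upper barrier breached first", so F̄_u(t) should approach P(upper). The Python sketch below is illustrative; the truncation |k| ≤ 30 and the parameter values µ = 0.1, u = ℓ = 2, σ = 1 are assumptions made here, not choices from the report.

```python
import math

def phi_cdf(x):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def upper_subsurvivor(t, u, ell, mu, sigma, kmax=30):
    """Equation 7, with the doubly infinite sum truncated at |k| <= kmax."""
    sgn = 1.0 if mu > 0 else -1.0   # non-zero drift assumed
    rt = sigma * math.sqrt(t)
    total = 0.0
    for k in range(-kmax, kmax + 1):
        c = 2 * k * (u + ell)
        d = 2 * k * (u + ell) + 2 * u
        total += (math.exp(-mu * c / sigma**2)
                  * phi_cdf(sgn * (c + u - mu * t) / rt)
                  - math.exp(mu * d / sigma**2)
                  * phi_cdf(sgn * (-d + u - mu * t) / rt))
    return sgn * total

def p_upper(mu, u, ell, sigma=1.0):
    """Equation 9: probability of breaching the upper barrier first."""
    s2 = sigma**2
    return (1.0 - math.exp(2.0 * mu * ell / s2)) / (
        math.exp(-2.0 * mu * u / s2) - math.exp(2.0 * mu * ell / s2))

# As t -> 0+, the subsurvivor function tends to P(upper).
val = upper_subsurvivor(0.01, 2.0, 2.0, 0.1, 1.0)
# val should be very close to p_upper(0.1, 2.0, 2.0)
```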
.} is a discrete stochastic process that takes a step at each time unit, where the size of the step is random and normally distributed. We can also think of {S_i ; i = 0, 1, 2, . . .} as a discretely observed realization of a Wiener process {W(t); t > 0}, with discrete observation times t = 0, 1, 2, . . .. In this sense we use {S_i ; i = 0, 1, 2, . . .} to approximate the Wiener process {W(t); t > 0}, as shown in Figure 4. Intuitively, the approximation will improve as the time interval between jumps gets small.

We now outline the method in detail for the one-barrier situation, where the first passage time to the barrier at u has the IG distribution. Let N = min{j ≥ 1 : S_j ≤ u, S_{j+1} > u}. Clearly the arrival time to the barrier is somewhere in the interval [N, N + 1). In this report, we use linear interpolation to give a non-integer arrival time. However, other schemes could be considered, such as using either N or N + 1, or a random time in the interval [N, N + 1).

A computer program (Appendix B) was written in Fortran 90, compiled using the Compaq f90 compiler, and run on SharcNet, a Compaq Alpha ES40 cluster at the University of Guelph. The algorithm is summarized here:

1. Generate two uniform random numbers.
2. Use the two uniforms to produce a potential standard normal variate via the ratio of uniforms method. This is essentially a rejection method. If the number is rejected, return to Step 1.
3. Transform the standard normal variate to a normal variate with mean µ and variance σ².
4. Repeat Steps 1-3 while the cumulative sum of the transformed normal variates, S_n, is less than the barrier at u. If S_n > u, then continue to Step 5.
5. Estimate the arrival time to the barrier by linear interpolation.

This sequence of steps should in theory produce a single realization from the IG distribution. Each step is discussed in more detail below.
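The five steps above can be sketched as follows. This is an illustrative Python reimplementation, not the Fortran program of Appendix B: it substitutes Python's built-in normal generator for Steps 1-3 (in place of the ratio of uniforms method), and the parameter values are arbitrary.

```python
import random

def ig_first_passage(mu, sigma, u, rng):
    """One draw from the naive IG generator: sum N(mu, sigma^2) steps
    until the cumulative sum first exceeds the barrier u, then use
    linear interpolation to produce a non-integer arrival time."""
    s_prev, s, n = 0.0, 0.0, 0
    while s <= u:                    # Step 4: repeat until S_n > u
        s_prev = s
        s += rng.gauss(mu, sigma)    # Steps 1-3 collapsed into one call
        n += 1
    beta = s - s_prev                # slope of the interpolating segment
    return n - (s - u) / beta        # Step 5: linear interpolation

rng = random.Random(1)
sample = [ig_first_passage(0.5, 1.0, 2.0, rng) for _ in range(2000)]
```

Since the loop exits with s_prev ≤ u < s, the slope beta is always positive and the interpolated time falls in [N, N + 1).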
Generation of random variables from the FP2B distribution is detailed in Section 4.4.

4.1 Uniform Distribution

We used a linear congruential generator to produce uniformly distributed pseudo-random numbers u_i on the interval (0, 1). This method uses the recursion formula

    x_{i+1} = a x_i \pmod{m}, \quad i = 0, 1, \ldots, n,

where a and m are integer constants chosen to maximize desirable properties such as independence and cycle length. The initial value or seed, x_0, is also an integer. The uniform random numbers are then calculated as u_i = x_i / m. The specific generator used for this simulation is the one described by Park and Miller [8] as the "minimal standard". It has m = 2^31 − 1 and a = 16807. This generator has been well tested for its statistical properties, including independence, and was deemed sufficient for our purposes. The seed values for this simulation were obtained by calls to the system time clock, as shown in Appendix B. This method of producing seeds does not allow the user to repeat a given sequence of random numbers; however, this feature could easily be modified. Our Fortran program generates 10,000 uniforms in succession and stores them in an array. If these are used up, the program is able to construct another array.

4.2 Normal Distribution

To generate standard normal variables, we used the ratio of uniforms method with quadratic bounding curves [7]. The ratio of uniforms is a general method, developed by Kinderman and Monahan [6], based on the following idea. Suppose we want to generate a random variable X with density f(x). If the points (u, v) are uniformly spread over the region A = {(u, v) : 0 ≤ u ≤ f^{1/2}(v/u)}, then the ratio v/u has the desired distribution, as we now show. Let the area of the region A be 1/a. Then the joint density of (U, V) is g(u, v) = a for 0 ≤ u ≤ f^{1/2}(v/u), and 0 otherwise. Note that f(x) may be 0 for some values of x, and this may in part determine the region A. Now consider the transformation X = V/U, Y = U.
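The recursion above is small enough to sketch directly. In Python, arbitrary-precision integers make the factorization trick that a fixed-precision implementation would need to avoid overflow unnecessary; this sketch is not the Fortran routine of Appendix B. The final assertion is Park and Miller's published correctness check: starting from seed 1, the 10,000th value of the recursion is 1043618065.

```python
M = 2**31 - 1   # modulus, a Mersenne prime
A = 16807       # multiplier of the "minimal standard" generator

def lcg_uniforms(seed, n):
    """n uniforms on (0, 1) via x_{i+1} = a*x_i mod m, u_i = x_i/m."""
    x = seed
    out = []
    for _ in range(n):
        x = (A * x) % M
        out.append(x / M)
    return out

# Park and Miller's check value for the minimal standard generator.
x = 1
for _ in range(10000):
    x = (A * x) % M
assert x == 1043618065
```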
This is a one-to-one transformation with U = Y, V = XY. The Jacobian is

    J = \begin{vmatrix} \partial u/\partial x & \partial u/\partial y \\ \partial v/\partial x & \partial v/\partial y \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ y & x \end{vmatrix} = -y.

Then the joint density of (X, Y) is

    g(x, y) = g(u, v)\,|J| = a y, \quad 0 \le y \le f^{1/2}(x),

and the marginal density of X is

    f^*(x) = \int_0^{f^{1/2}(x)} g(x, y)\,dy = \int_0^{f^{1/2}(x)} a y\,dy = \frac{a}{2} f(x).

Since both f*(x) and f(x) are densities, we must have f*(x) = f(x) and a = 2.

In fact, we need only specify f(x) up to a constant. Suppose h(x) = K f(x), where h(x) is a non-negative function and ∫_{−∞}^{∞} h(x) dx = K < ∞. Let A = {(u, v) : 0 ≤ u ≤ h^{1/2}(v/u)}, with area 1/a. Then g(u, v) = a for 0 ≤ u ≤ h^{1/2}(v/u), and g(x, y) = ay for 0 ≤ y ≤ h^{1/2}(x), and the marginal density of X is

    f^*(x) = \int_0^{h^{1/2}(x)} a y\,dy = \frac{a}{2} h(x).

But since f*(x) must integrate to 1, we have a = 2/K, i.e. the area of the region A must be K/2. To implement this idea, we generate points (u, v) uniformly over the region A, which is easily done using rejection methods. We then take the ratio v/u as a realization of a random variable with the required density f(x). Note that there is no need to evaluate K; it is only necessary to generate points uniformly spread over the region A.

Suppose in particular that we want to generate from a standard normal distribution with kernel h(x) = e^{−x²/2} for −∞ < x < ∞. We need to generate points uniformly over the region A = {(u, v) : 0 ≤ u ≤ e^{−v²/(4u²)}}, which has boundaries v = ±2u√(−ln u), 0 ≤ u ≤ 1. This is easily done by rejection, since A is contained in the rectangle 0 ≤ u ≤ 1, −√(2/e) ≤ v ≤ √(2/e). Thus we can use the following algorithm:

1. Generate u ∼ U(0, 1) and w ∼ U(0, 1), independent.
2. Let v = 2√(2/e) (w − 1/2).
3. If v² ≤ −4u² ln(u), then return v/u as a standard normal deviate. Otherwise go to Step 1.

Note that in Step 2, (u, v) is uniformly distributed over the shaded rectangle shown in Figure 5. Points that satisfy the inequality in Step 3 (i.e.
points that are accepted) are uniformly distributed over the region A, which is shaded with dots in Figure 5. If the inequality is not satisfied, the point (u, v) is rejected, and we go back to Step 1.

[Figure 5: Ratio of Uniforms Method]

The evaluation of ln(u) for many random values u is computationally expensive. The algorithm can be much improved by constructing quadratic boundary curves around A. Figure 6 is a magnified view showing the boundary of region A as a solid line, an outside quadratic bound as a dotted line, and an inside quadratic bound as a dashed line. A quick and cheap assessment of whether (u, v) falls outside A can be made by determining whether (u, v) falls above the dotted line. Similarly, a quick and cheap assessment of whether (u, v) falls inside A can be made by assessing whether (u, v) falls below the dashed line. Only in a very small number of cases is it necessary to use the expensive assessment in Step 3 above. A very fast algorithm developed by Leva [7] was used in this project.

[Figure 6: Quadratic Bounds on Region A]

It has been noted by Hörmann [4] that the use of the ratio of uniforms method in conjunction with the linear congruential method leads to a gap in the support of the distribution, in which no pseudo-random numbers will be generated. The probability of this gap is of order 1/√m, which for our linear congruential generator equals 2.156E-05. This gap does not seem to affect the goodness-of-fit tests for the normal distribution (see Section 5.2). Nevertheless, the combination of linear congruential and ratio of uniforms methods should be avoided in the future.

We next transformed the standard normal variates into normal variates with mean µ and standard deviation σ. This is easily done since if Z ∼ N(0, 1), then X = σZ + µ ∼ N(µ, σ²). Thus we generate standard normal variables, multiply them by σ, and add µ.
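The basic algorithm (Steps 1-3 above, without Leva's quadratic bounds) can be sketched in Python as follows; the sample size and seed are arbitrary choices made here for illustration.

```python
import math
import random

TWO_ROOT_2_OVER_E = 2.0 * math.sqrt(2.0 / math.e)  # rectangle half-height x2

def ratio_of_uniforms_normal(rng):
    """Standard normal deviate via the Kinderman-Monahan ratio of
    uniforms method, in its basic rejection form."""
    while True:
        u = rng.random()                               # Step 1
        v = TWO_ROOT_2_OVER_E * (rng.random() - 0.5)   # Step 2
        # Step 3: accept if (u, v) lies inside region A (guard u > 0).
        if u > 0.0 and v * v <= -4.0 * u * u * math.log(u):
            return v / u

rng = random.Random(42)
z = [ratio_of_uniforms_normal(rng) for _ in range(10000)]
mean = sum(z) / len(z)
var = sum((x - mean) ** 2 for x in z) / len(z)
```

The sample mean and variance should be close to 0 and 1 respectively.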
4.3 Inverse Gaussian Distribution

The transformed normals X_i are then summed until the sum S_n = Σ_{i=1}^{n} X_i exceeds the barrier at u. The arrival time to the barrier is estimated using linear interpolation, as follows. Let N equal the minimum j such that S_j ≤ u and S_{j+1} > u. The line segment connecting the points (j, S_j) and (j + 1, S_{j+1}) has slope β = S_{j+1} − S_j. Let (x, u) be the point of intersection of the line segment and the barrier at u. Then

    x = j + 1 - \frac{S_{j+1} - u}{\beta}.

4.4 First-Passage-Two-Barrier Distribution

The algorithm for generating pseudo-random numbers from the FP2B distribution is identical to the algorithm for generating from the IG distribution, except that Step 4 becomes:

• Repeat Steps 1-3 while the cumulative sum of transformed normals is less than the upper barrier and greater than the lower barrier.

Arrival times to either the upper or lower barrier were obtained by linear interpolation. We recorded both the time of crossing and which barrier was breached.

5 Testing of Random Numbers

Numbers generated from the uniform, normal and IG distributions were tested for goodness of fit against the theoretical distributions using the chi-square test. For the FP2B distribution, the frequencies of the observed arrival times to the upper and lower barriers were tested separately for goodness of fit against the theoretical subsurvivor functions given in Equations 7 and 8. We again used a chi-square test.

To perform the chi-square test, the support of the distribution is divided into I bins, i = 1, 2, . . . , I. The test statistic is given by

    X^2 = \sum_{i=1}^{I} \frac{(O_i - E_i)^2}{E_i},

where O_i is the observed frequency in the ith bin and E_i is the expected frequency. If the null hypothesis is true, i.e. if the observations do come from the distribution with expected frequencies E_i, then the statistic X² has asymptotically a χ² distribution with I − 1 degrees of freedom (df). (See for instance [1].) All statistical testing was done in SAS Version 8.2.
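The test statistic is easy to compute directly; a small Python sketch with hand-checkable counts (the bin counts below are made up for illustration, and the comparison to a χ² critical value or p-value, done in SAS in the report, is omitted):

```python
def chi_square_stat(observed, expected):
    """X^2 = sum over bins of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Identical observed and expected counts give X^2 = 0.
assert chi_square_stat([10, 10, 10, 10, 10], [10.0] * 5) == 0.0

# A worked case: (12-10)^2/10 + (8-10)^2/10 + (10-10)^2/10 = 0.8.
assert abs(chi_square_stat([12, 8, 10], [10.0, 10.0, 10.0]) - 0.8) < 1e-12
```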
The code used to do some of the following tests can be found in Appendix C.

5.1 Uniform Distribution

We tested the implementation of the uniform random number generator for correctness using the method described in [8]. This test involves starting the recursion with a specified seed value and checking that the 100th random uniform number generated is equal to a given constant. To test for goodness of fit, we generated 9998 random uniform numbers on the interval (0, 1). The interval (0, 1) was then divided into five bins of equal length. The test produced a statistic of 3.9144 on 4 df and a p-value of 0.4177, indicating no evidence of lack of fit.

5.2 Normal Distribution

We generated 1904 pseudo-random numbers from a standard normal distribution, and tested them for goodness of fit using the chi-square test with 12 bins. (While we aimed for around 2000 normal variates, some numbers were of course rejected in the ratio of uniforms algorithm, resulting in fewer than 2000 normals. This is an idiosyncrasy of the program that could easily be altered.) The test statistic was 11.6518 on 11 df, giving a p-value of 0.3904. Again, there is no evidence of lack of fit.

5.3 Inverse Gaussian with σ² = 1

We generated samples of various sizes from the IG distribution with σ² = 1 and two different combinations of drift and barrier values, as shown in Table 1. There was no limit on the time until the barrier was breached. The calculation of expected values for the goodness-of-fit test is outlined in Appendix C. An interesting pattern appears in the results shown in Table 1. Looking across rows of the table, we note that for fixed values of the parameters, the p-value decreases as the sample size increases. For sample sizes of 400 or greater, the null hypothesis is rejected, indicating that there is evidence that the generated numbers do not follow the IG distribution.
  Parameters: Drift=0.38, Barrier=10.0
    Sample size:   100      200      400      600
    Chi-Square:    3.7109   8.5423   23.484   66.3329
    DF:            5        5        5        5
    Pr>ChiSq:      0.5002   0.3632   0.0343   0.0006

  Parameters: Drift=0.15, Barrier=5
    Sample size:   100      200
    Chi-Square:    9.142    11.680
    DF:            5        5
    Pr>ChiSq:      0.3699   0.0825

Table 1: Goodness of Fit Test for IG distribution with σ² = 1

  Parameters: Drift=0.15, Barrier=5.0
    Sample size:   100      100      100      100
    σ²:            0.0625   0.25     2.25     25.0
    Chi-Square:    3.6997   6.2134   8.2538   24.4358
    DF:            5        5        5        5
    Pr>ChiSq:      0.4482   0.1838   0.1428   0.0004

Table 2: Goodness of Fit Test for IG distribution with varying σ²

5.4 Inverse Gaussian with varying σ²

We generated samples of size 100 from an IG distribution with fixed drift and barrier, but varying volatility, σ². The results, summarized in Table 2, show that as the volatility increases, the p-value decreases. For very large values of the volatility parameter, there is strong evidence that the generated variables do not follow an IG distribution.

5.5 First-Passage-Two-Barrier Distribution

We generated samples from the FP2B distribution with σ² = 1 using the method described in Section 4. The computer program ran until a barrier was crossed, with no limit on the time. For testing purposes, we isolated the numbers that hit the upper barrier and used a chi-square statistic to quantify goodness of fit with the upper subsurvivor function. More details are given in Appendix C. The required values of the subsurvivor function were calculated by summing terms in Equation 7 until the next term added changed the sum by less than 0.0001.
As Table 3 shows, the same trend noticed with the IG numbers is present here.

  Parameters: Drift=0.15, Upper=5.0, Lower=-5.0
    Sample size:   83       171      338      660
    Chi-Square:    8.9365   6.8382   11.5810  42.2480
    DF:            5        5        5        5
    Pr>ChiSq:      0.1116   0.2330   0.0410   <0.0001

  Parameters: Drift=0.15, Upper=2.0, Lower=-2.0
    Sample size:   64       140      278      560
    Chi-Square:    25.2060  32.9380  93.2124  216.9342
    DF:            5        5        5        5
    Pr>ChiSq:      <0.0001  <0.0001  <0.0001  <0.0001

Table 3: Goodness of Fit Test for FP2B Upper Subsurvivor Function

As the sample size increases, the p-values become small, giving evidence that the generated numbers do not come from the specified distribution. Even though the sample sizes are not equal in the two rows of the table, there is a suggestion that as the distance between the barriers decreases, the p-value tends to decrease as well.

We also tested the arrival times to the lower barrier against the lower subsurvivor function. However, for all choices of parameters shown here, the number of paths which hit the lower barrier was very small. Although the null hypothesis was not rejected, the test has low power with small sample sizes. As the test is unreliable, the results are not shown. Other parameter values were tested (not shown), namely (µ=0.38, u=2.59, ℓ=-2.34, σ²=1) and (µ=0.38, u=3.21, ℓ=-1.70, σ²=1), but in all cases the p-values were < 0.0001 unless the sample size was less than 20. Sample sizes lower than 20 are suspect, as the power of the test to detect a discrepancy from the nominal distribution is low.

5.5.1 Probability of Crossing the Upper Barrier

As a further test of the random number generator, we compared the observed proportion of paths which breached the upper barrier before the lower one to the theoretical probability from Equation 9. The observed proportion was calculated as the number of paths which reached the upper barrier divided by the total number of paths in the simulation.
All of the observed proportions in Table 4 are from simulations of 10,000 paths with σ² = 1. The column labelled Difference is calculated by subtracting the expected value from the observed value. For a sample of n paths, we can easily calculate the variance of the observed proportion of paths that hit the upper barrier. In effect we have n binomial trials, with constant probability of success p = P(upper) as given in Equation 9. Then the observed number of paths that hit the upper barrier is a binomial random variable, X. The variance of the observed proportion X/n is then pq/n, where q = 1 − p. The standard deviation of X/n is shown in column 5 of Table 4.

  Upper=2.0, Lower=-2.0
    Drift   Observed   Expected   Difference   Standard Deviation
    0.1     0.6356     0.5987     0.0369       0.0049
    0.2     0.7328     0.6899     0.0429       0.0046
    0.3     0.8284     0.7685     0.0599       0.0042
    0.4     0.8882     0.8320     0.0562       0.0037
    0.5     0.9308     0.8808     0.0500       0.0032

  Upper=3.0, Lower=-3.0
    0.1     0.6711     0.6456     0.0255       0.0048
    0.2     0.8074     0.7685     0.0389       0.0042
    0.3     0.8941     0.8581     0.0360       0.0035
    0.4     0.9429     0.9168     0.0261       0.0028
    0.5     0.9752     0.9526     0.0226       0.0021

  Upper=4.0, Lower=-4.0
    0.1     0.7087     0.6899     0.0188       0.0046
    0.2     0.8665     0.8320     0.0345       0.0037
    0.3     0.9347     0.9168     0.0179       0.0028
    0.4     0.9731     0.9608     0.0123       0.0019
    0.5     0.9901     0.9820     0.0081       0.0013

  Upper=5.0, Lower=-5.0
    0.1     0.7289     0.7311     -0.0022      0.0044
    0.2     0.9021     0.8808     0.0213       0.0032
    0.3     0.9588     0.9525     0.0063       0.0021
    0.4     0.9876     0.9820     0.0056       0.0013
    0.5     0.9933     0.9967     -0.0034      0.0006

Table 4: Proportion of Upper Barrier Crossings

Several interesting patterns can be discerned in Table 4. First, the observed proportion of paths that hit the upper barrier is almost always farther than two standard deviations away from the expected proportion. Second, the observed proportion is almost always strictly greater than the expected proportion. Overall, too many paths end at the upper barrier. These facts indicate a systematic lack of fit.
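The Expected and Standard Deviation columns of the first row of Table 4 can be reproduced from Equation 9 and the binomial variance formula; a short Python sketch:

```python
import math

def p_upper(mu, u, ell, sigma=1.0):
    """Equation 9: probability of breaching the upper barrier first."""
    s2 = sigma ** 2
    return (1.0 - math.exp(2.0 * mu * ell / s2)) / (
        math.exp(-2.0 * mu * u / s2) - math.exp(2.0 * mu * ell / s2))

# First row of Table 4: drift 0.1, barriers at +2 and -2, 10,000 paths.
p = p_upper(0.1, 2.0, 2.0)
sd = math.sqrt(p * (1.0 - p) / 10000)
# p rounds to 0.5987 and sd to 0.0049, matching the table.
```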
For fixed barriers, increasing drift does not seem to have any systematic effect on the difference between observed and expected proportions. However, as the distance of the barriers from the origin increases, the difference between the observed and expected values decreases.

6 Conclusions

It is well known that for natural populations, as the sample size increases, the p-value from a goodness-of-fit test tends to decrease. As many applied statisticians will attest, a goodness-of-fit test is almost guaranteed to reject the null hypothesis for a large enough sample size. This is because a theoretical distribution is only a mathematical model for a natural population, and is never an exact description. However, one would hope that the same would not be true for pseudo-random numbers generated by computer. The algorithms that we used to generate uniform and normal random variables are well used and exhaustively tested, and goodness-of-fit tests showed no evidence that the generated numbers did not come from the nominal distributions, even for sample sizes as large as 10,000. For the IG and FP2B distributions, however, there was strong evidence of lack of fit, even for quite modest sample sizes. The method was tested independently in R and the results were confirmed, indicating that the fault does not lie with our Fortran program. In this report we have shown results for interpolated crossing times, but using the last time before the crossing or the first time after the crossing gave essentially the same results. Thus it seems that the method of generating IG and FP2B random variables as the approximate time at which a cumulative sum of normals crosses a barrier is unsatisfactory, at least for the values of the parameters and time intervals investigated here.

Why does the method fail?
As we noted in Table 4, for the FP2B distribution, more paths than expected end at the upper barrier, suggesting that we are somehow missing breaches of the lower barrier.

[Figure 7: Wiener process approximated by a step function]

A possible explanation is that there is a high probability that, between two observation times, a path breaches the lower barrier and then immediately rises above the barrier, as shown in Figure 7. Thus this breach of the lower barrier is unobserved. Later we observe the path crossing the upper barrier, and falsely record the time of this event as the first crossing time. An obvious solution is to decrease the time interval between steps so that the step function more closely approximates a Wiener process.

A similar phenomenon could explain the lack of fit for the IG distribution. Figure 8 shows the theoretical density as a smooth curve, plotted over a histogram of 10,000 random numbers generated from the IG distribution with µ = 0.15, u = 5 and σ = 1. Clearly the theoretical probability of hitting the barrier before time 15 is much higher than the observed proportion of paths which do so. Conversely, the theoretical probability of hitting the barrier at a very late time is less than that observed. (The histogram has a much longer tail than is shown in the figure; three out of 10,000 observations were greater than 500.) This seems to indicate that the first passage time of some of the paths is being missed, and we instead are observing the second (or later) hitting time. Recall that in all our simulations, the process takes one step per unit time interval. Again, it appears that shorter intervals between steps should ameliorate the systematic lack of fit.
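The effect of the time interval can be illustrated with a small simulation. This Python sketch is purely illustrative: the parameters µ = 0.5, u = 2, σ = 1 and the step sizes Δt = 1 and Δt = 0.01 are choices made here, not values used in the report, and the crossing is recorded at the first grid time at which the path exceeds the barrier (without interpolation). The fine grid should come closer to the theoretical IG mean u/µ = 4 than the coarse grid does.

```python
import random

def first_passage_time(mu, sigma, u, dt, rng):
    """First passage of a discretized Wiener path to the barrier u,
    taking one N(mu*dt, sigma^2*dt) step per interval of length dt."""
    t, s = 0.0, 0.0
    while s <= u:
        s += rng.gauss(mu * dt, sigma * dt ** 0.5)
        t += dt
    return t

rng = random.Random(7)
coarse = [first_passage_time(0.5, 1.0, 2.0, 1.0, rng) for _ in range(2000)]
fine = [first_passage_time(0.5, 1.0, 2.0, 0.01, rng) for _ in range(2000)]
mean_coarse = sum(coarse) / len(coarse)
mean_fine = sum(fine) / len(fine)
# The coarse grid misses within-interval excursions above the barrier,
# so its recorded first passage times are systematically too late.
```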
[Figure 8: Histogram of generated IG random variates versus theoretical density; µ = 0.15, u = 5.0, σ = 1]

There is some evidence that the fit improves as the distance from the origin to the barrier increases. Of course, the process must take more steps to reach a relatively distant barrier. Thus there is evidence that the method improves as the average number of steps needed to reach the barrier increases. Decreasing the time interval between steps would also result in an increased number of steps needed to reach the barrier. Thus we have further support for the notion that decreasing the time between steps would improve the performance of this method of random number generation.

Faster and more reliable methods for generating random variables from the IG distribution exist (see for instance [9]). While there are other methods of generating realizations of a Wiener process (see for instance [10]), we know of no other way to generate variates from the FP2B distribution, except as the time at which a simulated Wiener process breaches one of two barriers. Further work will focus on testing different time intervals between steps and developing rules of thumb for the time interval necessary to achieve satisfactory fit with the nominal distributions. More importantly, better and more efficient methods for generating random numbers from the FP2B distribution are needed.

Acknowledgements

This project was funded by the University of Guelph and the Natural Sciences and Engineering Research Council. We would like to thank the Department of Mathematics and Statistics at the University of Guelph for providing space, administrative and computer support, and for making opportunities like this available to students.

References

[1] Bickel, P.J. and Doksum, K.A. Mathematical Statistics. Holden-Day, Oakland, California, 1977.

[2] Chhikara, R.S. and Folks, J.L.
The Inverse Gaussian Distribution: Theory, Methodology and Application. New York, 1989.

[3] Feller, W. An Introduction to Probability Theory and its Applications, Volume 1. Wiley, New York, 1986.

[4] Hörmann, W. A Note on the Quality of Random Variates Generated by the Ratio of Uniforms Method. ACM Transactions on Modeling and Computer Simulation, Vol. 4, No. 1, 1994, pp. 96-106.

[5] Horrocks, J. and Thompson, M.E. Modelling Event Times with Multiple Outcomes Using a Wiener Process with Drift. Accepted, Lifetime Data Analysis.

[6] Kinderman, A.J. and Monahan, J.F. Computer Generation of Random Variables Using the Ratio of Uniform Deviates. ACM Transactions on Mathematical Software, Vol. 3, No. 3, September 1977, pp. 257-260.

[7] Leva, J.L. A Fast Normal Random Number Generator. ACM Transactions on Mathematical Software, Vol. 18, No. 4, December 1992, pp. 449-453.

[8] Park, S.K. and Miller, K.W. Random Number Generators: Good Ones are Hard to Find. Communications of the ACM, Vol. 31, No. 10, 1988, pp. 1192-1201.

[9] Seshadri, V. The Inverse Gaussian Distribution. Clarendon Press, Oxford, 1993.

[10] Taylor, H.M. and Karlin, S. An Introduction to Stochastic Modeling, 3rd Edition. Academic Press, California, 1998.

A Parametrization of the IG Density

In this appendix, we show that if λ = (u − x_0)²/σ² and ν = (u − x_0)/µ, then

    f(t; u, \mu, \sigma, x_0) = \frac{u - x_0}{\sqrt{2\pi\sigma^2 t^3}} \exp\left( -\frac{(u - x_0 - \mu t)^2}{2\sigma^2 t} \right)   (10)

is equivalent to

    f(t; \nu, \lambda) = \sqrt{\frac{\lambda}{2\pi t^3}} \exp\left( -\frac{\lambda (t - \nu)^2}{2\nu^2 t} \right).   (11)

Start with Equation 11 and let δ = 1/ν and ρ = 1/λ. Then

    f(t; \delta, \rho) = \sqrt{\frac{1}{2\pi\rho t^3}} \exp\left( -\frac{(t - 1/\delta)^2}{2\rho t/\delta^2} \right),   (12)

which simplifies to

    f(t; \delta, \rho) = \sqrt{\frac{1}{2\pi\rho t^3}} \exp\left( -\frac{(1 - \delta t)^2}{2\rho t} \right).   (13)

Then, substituting δ = µ/(u − x_0) and ρ = σ²/(u − x_0)², gives

    f(t; u, \mu, \sigma, x_0) = \frac{1}{\sqrt{2\pi \frac{\sigma^2 t^3}{(u - x_0)^2}}} \exp\left( -\frac{\left(1 - \frac{\mu t}{u - x_0}\right)^2 (u - x_0)^2}{2\sigma^2 t} \right),   (14)

which simplifies to

    f(t; u, \mu, \sigma, x_0) = \frac{u - x_0}{\sqrt{2\pi\sigma^2 t^3}} \exp\left( -\frac{(u - x_0 - \mu t)^2}{2\sigma^2 t} \right),   (15)

as required.

B Fortran Code

The simulation code in its entirety is reproduced here for the double barrier model.
The code for the single barrier model involves very little modification, just removal of the lines that stop the run when the lower barrier is breached.

PROGRAM MAIN
IMPLICIT NONE
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! This program simulates a Wiener process.
! Author: Maia Lesosky, 2003, University of Guelph
!
! This program will generate maxN variates from the FP2B
! distribution.
! This program requires 1 parameter file to run, called para2.txt.
! The parameter file must have the following format:
! &nlist
! drift=0.15
! topbarr=5.0
! botbarr=-5.0
! step=5
! maxN=100
! /
! where the parameters are defined as follows:
!
! DRIFT   :: drift parameter for the Wiener process
! TOPBARR :: value of the top barrier u>0
! BOTBARR :: value of the bottom barrier l<0
! STEP    :: value of variance of Wiener process
! MAXN    :: total number of barrier crossings wanted
!
! This program can write to any number of files. Currently it is set
! up to write to a single file called arr.txt that has the run number,
! the interpolated arrival time and an indicator variable which has the
! value 1 if the upper barrier was crossed and 0 if the lower barrier
! was crossed.
!
! A list of all the variables and definitions follows:
!
! unfm1,unfm2   :: The two uniform random numbers used to generate the
!                  normal random number
! S()           :: Matrix that holds the generated uniform random numbers
! nrml          :: Unadjusted (standard normal) random normal variate
! nrml_adj      :: Adjusted (N(drift,step)) random variate
! arr_tme       :: Interpolated arrival time
! sum_nrml      :: running sum of adjusted normal variates
! p,t,a,b,r1,r2 :: parameters for the ratio of uniforms method
! i,j,k,l,m,n   :: counters
! maxI
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NAMELIST/nlist/ drift,topbarr,botbarr,step,maxN
REAL :: drift,topbarr,botbarr,step
REAL :: unfm1,unfm2
REAL :: p = 0.449871, t = -0.386595, a = 0.19600, b = 0.25472, &
        r1 = 0.27597, r2 = 0.27846
REAL*8 :: S(10000)
REAL*8 :: nrml,nrml_adj,arr_tme,iseed,sum_nrml
REAL*8 :: u,v,x,y,q,rm
INTEGER :: maxI=1000,maxR=10000,maxS=10000,maxN
INTEGER :: rindex=1,scount=1,flag=1,numN=0,numAt,numAb
INTEGER :: i,j,k,l,m,n,numtop
INTEGER :: Date_time(8)
INTEGER :: dte1,dte2,dte3,tme1,tme2,tme3
INTEGER*8 :: mm,aa,seed
CHARACTER :: filename*15,ext*4,file5*15,file6*15
CHARACTER :: file1*15,file2*15,file3*15,file4*15
CHARACTER :: file7*15
CHARACTER(len=12) :: real_clock(3)

aa=16807
mm=2147483647

file1='rndunf'//'.txt'
file2='rndnrl'//'.txt'
file4='para2'//'.txt'
file5='arr'//'.txt'
file6='bot'//'.txt'
file7='path'//'.txt'

OPEN(unit=20,file=file4,status='old')
READ(20,NML=nlist)
CLOSE(20)

step=sqrt(step)

OPEN(unit=21,file=file5,status='old')
WRITE(21,*)'upper=',topbarr,'lower=',botbarr
WRITE(21,*)'drift=',drift,'std. dev.=',step

numAt=0
numAb=0

DO i=1,maxN
   numN=0
   flag=1
   sum_nrml=0
   DO j=1,maxR
      S(j)=0
   END DO
!
! Next section produces an array of seeds used by the random number
! generator by making calls to the system time/date clock
! tme1=hour, tme2=min, tme3=sec
! dte1=month, dte2=day, dte3=year

   CALL DATE_AND_TIME (real_clock(1),real_clock(2), &
                       real_clock(3), date_time)
   CALL IDATE (dte1,dte2,dte3)
   tme1=date_time(5)
   tme2=date_time(6)
   tme3=date_time(7)

   scount = 1
   seed = tme1*tme2*tme3+dte1+dte2+dte3

   DO j=1, maxS
      seed = aa*seed
      seed = mod(seed,mm)
      rm=real(mm)
      S(j)=(seed/rm)
   END DO

!This section checks and produces random normals
   DO m=1,maxI
      unfm1=S(rindex)      !obtaining 2 random numbers to check
      rindex=rindex+1
      unfm2=S(rindex)
      rindex=rindex+1

      IF(unfm1==0.OR.unfm2==0.OR.rindex>=maxS)THEN
         !Reset uniform number array
         CALL DATE_AND_TIME (real_clock(1),real_clock(2), &
                             real_clock(3), date_time)
         CALL IDATE (dte1,dte2,dte3)
         tme1=date_time(5)
         tme2=date_time(6)
         tme3=date_time(7)
         scount = 1
         seed = tme1*tme2*tme3+dte1+dte2+dte3
         !The seed is composed of the hour multiplied by the minute and
         !second then added to the sum of the day, month and year.
         DO j=1, maxS
            seed = aa*seed
            seed = mod(seed,mm)
            rm=real(mm)
            S(j)=(seed/rm)
         END DO
         rindex=1
         CYCLE
      END IF

      unfm2 = 1.7156 * (unfm2 - 0.5)

      ! Evaluate the quadratic form
      x = unfm1 - p
      y = ABS(unfm2) - t
      q = x**2 + y*(a*y - b*x)

      IF (q < r1) THEN
         ! Accept P if inside inner ellipse
         flag=1
      ELSE IF(q > r2) THEN
         ! Reject P if outside outer ellipse
         flag=0
      ELSE IF (unfm2**2 < -4.0*LOG(unfm1)*unfm1**2) THEN
         ! Accept P if inside exact acceptance region
         flag=1
      ELSE
         ! Reject P if outside acceptance region
         flag=0
      END IF

      IF(flag==0)THEN
         !If the uniforms have been rejected return to top
         !of loop and get two new random uniforms to try
         CYCLE
      ELSE
         !the uniforms have been accepted
         numN=numN+1
         nrml = unfm2/unfm1   !random normal with mean 0 and variance 1
         !adjust the normal 0, 1 to be normal drift,step
         nrml_adj=step*nrml+drift
         sum_nrml=sum_nrml+nrml_adj   !
! sum of independent random normals

         !this section stops the iterations if the barrier has been reached
         IF(sum_nrml>=topbarr.OR.sum_nrml<=botbarr)THEN
            numAt=numAt+1
            IF(sum_nrml>=topbarr)THEN
               arr_tme=(topbarr-sum_nrml+nrml_adj*numN)/nrml_adj
               WRITE(21,*)numAt,arr_tme,1
               EXIT
            ELSE
               arr_tme=(botbarr-sum_nrml+nrml_adj*numN)/nrml_adj
               WRITE(21,*)numAt,arr_tme,0
               EXIT
            END IF
         ELSE
            CONTINUE
         END IF
      END IF
   END DO
END DO

CLOSE(21)

END PROGRAM MAIN

C SAS Testing Code

The code reproduced here is a sample of the SAS code used to test the goodness of fit of the generated random numbers.

1. Code for testing the uniform random numbers

data unfmbin;
set unfmtest;
if 0 le rnd le .2 then xd=1;
if .2 le rnd le .4 then xd=2;
if .4 le rnd le .6 then xd=3;
if .6 le rnd le .8 then xd=4;
if .8 le rnd le 1.0 then xd=5;
run;

proc freq data=unfmbin;
tables xd/chisq;
run;

By default, SAS assumes that the probability of falling in each bin is a constant, p, and calculates the expected values as np, where n is the total sample size.

2. Code for testing the Inverse Gaussian distribution, for parameters drift=0.15, upper barrier=5.0

data topbin;
set topbarr;
if 0 le rnd le 20 then xd=1;
if 20 le rnd le 30 then xd=2;
if 30 le rnd le 40 then xd=3;
if 40 le rnd le 50 then xd=4;
if 50 le rnd le 60 then xd=5;
if rnd ge 60 then xd=6;
run;

proc freq data=topbin;
tables xd/ testp=(0.4924, 0.1569, 0.0973, 0.0644, 0.0447, 0.1444);
run;

Since the probability of falling in each bin is not constant, the user must supply a list of probabilities in the testp statement. These probabilities are calculated as follows. Suppose that X has the IG distribution, with cdf F(x). First we divide the positive real line into bins: (x_0, x_1), (x_1, x_2), ..., (x_{n-1}, x_n), where x_0 = 0 and x_n = ∞. Let p_i equal the probability of falling in the ith bin, in other words, p_i = P(X ∈ (x_{i-1}, x_i)) for i = 1, 2, ..., n. Then p_i is found as F(x_i) − F(x_{i-1}) for i = 1, 2, ..., n.

3.
For the FP2B distribution, we tested the observed frequencies of observations that hit the upper barrier in various time intervals against probabilities derived from the upper sub-survivor function conditional on D = 1, where D is an indicator variable that takes the value 1 if the path crosses the upper barrier before the lower one. The required probabilities were calculated as follows. Suppose that X has the FP2B distribution, with upper sub-survivor function F_u(x). First we divide the positive real line into bins: (x_0, x_1), (x_1, x_2), ..., (x_{n-1}, x_n), where x_0 = 0 and x_n = ∞. Let p_i = P(X ∈ (x_{i-1}, x_i) | D = 1) for i = 1, 2, ..., n. Then p_i = (F_u(x_{i-1}) − F_u(x_i))/P(upper). Note that F_u(∞) = 0 and F_u(0) = P(upper), as given in equation 9.

Sample Calculation

We calculate the probabilities associated with the upper barrier, for the parameters µ = 0.15, u = 10.0 and ℓ = 10.0, with bins 0-10, 10-20, 20-30, 30-40, 40-50, 50-60 and 60+.

(a) Calculate P(upper):

$$P(\mathrm{upper}) = \frac{1 - e^{2\mu\ell}}{e^{-2\mu u} - e^{2\mu\ell}} = \frac{1 - e^{2(0.15)(10.0)}}{e^{-2(0.15)(10.0)} - e^{2(0.15)(10.0)}} = 0.9526.$$

(b) Using Equation 7, the following values are calculated: F_u(10) = 0.9462, F_u(20) = 0.8572, F_u(30) = 0.7134, F_u(40) = 0.5744, F_u(50) = 0.4560, F_u(60) = 0.3619.

(c) The differences d_i = F_u(x_{i-1}) − F_u(x_i) are calculated: d_1 = P(upper) − F_u(10) = 0.00636, d_2 = F_u(10) − F_u(20) = 0.0891, d_3 = F_u(20) − F_u(30) = 0.1437, d_4 = F_u(30) − F_u(40) = 0.1390, d_5 = F_u(40) − F_u(50) = 0.1184, d_6 = F_u(50) − F_u(60) = 0.0942, d_7 = F_u(60) = 0.3619.

(d) Finally, p_i = d_i/P(upper). The testp statement corresponding to this example is

testp=(0.0067, 0.0935, 0.1509, 0.1459, 0.1243, 0.0988, 0.3798);
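The sample calculation above is easy to check mechanically. The following Python sketch (our illustration, not part of the report's Fortran or SAS code) recomputes P(upper) from step (a) and the testp probabilities from the tabulated F_u values of step (b):

```python
import math

mu, u, ell = 0.15, 10.0, 10.0

# Step (a): probability of hitting the upper barrier first
p_upper = (1 - math.exp(2 * mu * ell)) / \
          (math.exp(-2 * mu * u) - math.exp(2 * mu * ell))

# Step (b): F_u(10), ..., F_u(60) as tabulated (from equation 7),
# prepended with F_u(0) = P(upper); recall F_u(infinity) = 0
F = [p_upper, 0.9462, 0.8572, 0.7134, 0.5744, 0.4560, 0.3619]

# Steps (c) and (d): successive differences, normalized by P(upper);
# the last bin (60, infinity) contributes F_u(60) - 0
d = [F[i] - F[i + 1] for i in range(6)] + [F[6]]
testp = [di / p_upper for di in d]

print(round(p_upper, 4))           # 0.9526
print([round(p, 4) for p in testp])
```

By construction the probabilities sum to one, which is a useful sanity check on the testp list before passing it to PROC FREQ.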