Generating Random Variables from the Inverse Gaussian
and First-Passage-Two-Barrier Distributions
Maia Lesosky and Julie Horrocks
Department of Mathematics and Statistics
University of Guelph
email: [email protected]
Abstract
We investigate a naive method for generating pseudo-random variables from two distributions, the inverse Gaussian and the First-Passage-Two-Barrier distribution. These are the first passage time distributions
of a Wiener process with non-zero drift to one and two barriers respectively. The method consists of simulating a path (a realization of the
Wiener process) by constructing a step function which jumps by a
normally distributed amount at successive time intervals. The time at
which the path breaches a barrier (the first passage time) is taken as a
realization of the desired random variable. We show that this method
is unsatisfactory, at least for some combinations of parameter values
and time intervals between jumps, and suggest possible areas for future
work.
1 Introduction
In this report, we consider a Wiener process, {W(t), t > 0}, with drift μ and volatility σ². The increments, W(t) − W(s), are independent and normally distributed with mean μ(t − s) and variance σ²(t − s) for any 0 ≤ s < t.
The sample paths of the process are almost surely continuous. The Wiener
process, sometimes referred to as Brownian motion, arises as the continuous
limit (in a certain sense) of a random walk (see [3]). It has been used recently
in financial applications, for instance to model stock prices.
Given some constraints on the parameters (see below), the first passage
time of this process to a single barrier has an inverse Gaussian (IG) distribution. The distribution of the time that the process first hits one of two
barriers has a distribution we call the First-Passage-Two-Barrier (FP2B)
distribution.
We briefly discuss the distributions in Sections 2 and 3. Then we outline
a naive method for generating random variables from these distributions.
The performance of this method is tested in Section 5 using chi-square goodness-of-fit tests. Finally in Section 6 we give some conclusions and suggestions for future work.

Figure 1: Single Barrier Model
2 Inverse Gaussian Distribution
Consider a Wiener process {W(t), t > 0} with positive drift μ and volatility σ² that starts at position x0 at time t = 0 and unfolds in the presence of a barrier at u > x0, as shown in Figure 1. The first passage time, i.e. the time that the process first hits the barrier, has an inverse Gaussian (IG) distribution with density

$$f(t; u, \mu, \sigma, x_0) = \frac{u - x_0}{\sqrt{2\pi\sigma^2 t^3}} \exp\!\left(\frac{-(u - x_0 - \mu t)^2}{2\sigma^2 t}\right), \qquad t > 0. \tag{1}$$
The IG density is most often seen with an alternate parametrization given by

$$f(t; \nu, \lambda) = \sqrt{\frac{\lambda}{2\pi t^3}} \exp\!\left(\frac{-\lambda(t - \nu)^2}{2\nu^2 t}\right), \qquad t > 0; \tag{2}$$

see for instance [2]. In this alternate formulation, the distribution has expectation ν and variance ν³/λ. The reparametrization from Equation 2 to Equation 1 is outlined in Appendix A.
We assume throughout this report that the starting position of the process is 0, that is, x0 = 0. Thus we use a slightly simplified form of the density:

$$f(t; u, \mu, \sigma) = \frac{u}{\sqrt{2\pi\sigma^2 t^3}} \exp\!\left(\frac{-(u - \mu t)^2}{2\sigma^2 t}\right), \qquad t > 0, \tag{3}$$

where u > 0 and μ > 0. Figure 2 shows the IG density plotted for various values of the parameters.

Figure 2: Inverse Gaussian Density
The same density is obtained if the drift is negative and the barrier u is
less than 0. If the drift and barrier have opposite signs the distribution is
improper, in that there is positive probability that the first passage time is
infinite. If the drift is zero, then the distribution of the first passage time
follows a stable law with index 1/2 (see for instance [3]), in which case the
expected time to hitting the barrier is infinite. In this report, we confine
our attention to the case of positive drift and positive barrier.
The cumulative distribution function (cdf) of the IG distribution is

$$F(t; u, \mu, \sigma) = \Phi\!\left(\frac{-u + t\mu}{\sigma\sqrt{t}}\right) + e^{2u\mu/\sigma^2}\, \Phi\!\left(\frac{-u - t\mu}{\sigma\sqrt{t}}\right), \qquad t > 0, \tag{4}$$

where Φ(x) is the standard normal cdf evaluated at x.
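As a quick numerical check that Equations 3 and 4 are mutually consistent, the density can be integrated and compared with the cdf. The sketch below is an illustration in Python (not part of the authors' Fortran program); the function names are ours, Φ is built from math.erf, the integral uses a composite Simpson rule, and the parameter values Drift = 0.38, Barrier = 10 are borrowed from Table 1.

```python
import math

def ig_pdf(t, u, mu, sigma):
    """IG density of Equation 3 (first passage to a single barrier u)."""
    return (u / math.sqrt(2.0 * math.pi * sigma**2 * t**3)
            * math.exp(-(u - mu * t)**2 / (2.0 * sigma**2 * t)))

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ig_cdf(t, u, mu, sigma):
    """IG cdf of Equation 4."""
    st = sigma * math.sqrt(t)
    return (norm_cdf((-u + t * mu) / st)
            + math.exp(2.0 * u * mu / sigma**2) * norm_cdf((-u - t * mu) / st))

def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

# Integrating the density from (almost) 0 to T should reproduce the cdf at T.
u, mu, sigma, T = 10.0, 0.38, 1.0, 100.0
area = simpson(lambda t: ig_pdf(t, u, mu, sigma), 1e-9, T, 4000)
cdf_T = ig_cdf(T, u, mu, sigma)
```

The two quantities agree to numerical-integration accuracy, which gives some confidence that the two formulas were transcribed consistently.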
3 First-Passage-Two-Barrier Distribution
Again consider a Wiener process starting at 0, with non-zero drift μ and volatility σ², that unfolds in the presence of two barriers, u > 0 and −ℓ < 0. The distribution of the time that the process first arrives at one of the barriers has been well studied (see for instance [3]), but has no generally agreed-upon name. Following Horrocks and Thompson ([5]) we refer to it as the First-Passage-Two-Barrier (FP2B) distribution. The density is only expressible in terms of infinite sums.
Let g(t, u, ℓ, μ, σ) represent the function

$$\frac{1}{\sigma\sqrt{t^3}}\, e^{-(\mu t + 2\ell)\mu/(2\sigma^2)} \sum_{k=0}^{\infty} \left\{ (s_k - u)\, \phi\!\left(\frac{s_k - u}{\sigma\sqrt{t}}\right) - (s_k + u)\, \phi\!\left(\frac{s_k + u}{\sigma\sqrt{t}}\right) \right\} \tag{5}$$

where s_k = −(2k + 1)(u + ℓ) and φ(x) is the standard normal density evaluated at x. Then the lower subdensity f_{−ℓ}(t), corresponding to the event that the process hits the lower barrier before the upper barrier and does so in (t, t + dt), equals g(t, u, ℓ, μ, σ) dt. The upper subdensity f_u(t), corresponding to the event that the upper barrier is hit before the lower barrier in (t, t + dt), is equal to g(t, ℓ, u, −μ, σ) dt. Note that the upper subdensity can be found by replacing μ with −μ and interchanging u and ℓ in the expression for the lower subdensity. The subdensities each integrate to something less than one, and the density f(t; u, ℓ, μ, σ) is the sum of the two subdensities, i.e. f(t; u, ℓ, μ, σ) = f_u(t) + f_{−ℓ}(t).
In biological data analysis it is often convenient to talk about the survivor
function, which gives the probability that the event of interest occurs after
time t. Here the event of interest is breach of either the upper or the lower
barrier. The survivor function $\bar F(t)$ is equal to 1 − F(t), where F(t) is the cdf. The survivor function of the FP2B distribution, $\bar F(t; u, \ell, \mu, \sigma)$, is given by

$$\sum_{k=-\infty}^{\infty} \left\{ e^{-\mu c_k/\sigma^2} \left[ \Phi\!\left(\frac{c_k + u - \mu t}{\sigma\sqrt{t}}\right) - \Phi\!\left(\frac{c_k - \ell - \mu t}{\sigma\sqrt{t}}\right) \right] - e^{\mu d_k/\sigma^2} \left[ \Phi\!\left(\frac{-d_k + u - \mu t}{\sigma\sqrt{t}}\right) - \Phi\!\left(\frac{-d_k - \ell - \mu t}{\sigma\sqrt{t}}\right) \right] \right\} \tag{6}$$

where c_k = 2k(u + ℓ) and d_k = 2k(u + ℓ) + 2u. The probability that the
upper barrier is breached before the lower barrier, and that this event occurs
after time t is given by the subsurvivor function
$$\bar F_u(t; u, \ell, \mu, \sigma) = \sum_{k=-\infty}^{\infty} \operatorname{sign}(\mu) \left\{ e^{-\mu c_k/\sigma^2}\, \Phi\!\left(\operatorname{sign}(\mu)\, \frac{c_k + u - \mu t}{\sigma\sqrt{t}}\right) - e^{\mu d_k/\sigma^2}\, \Phi\!\left(\operatorname{sign}(\mu)\, \frac{-d_k + u - \mu t}{\sigma\sqrt{t}}\right) \right\}. \tag{7}$$
Figure 3: Wiener process, Two Barriers
Similarly the lower subsurvivor function is

$$\bar F_{-\ell}(t; u, \ell, \mu, \sigma) = \sum_{k=-\infty}^{\infty} \operatorname{sign}(\mu) \left\{ e^{\mu d_k/\sigma^2}\, \Phi\!\left(\operatorname{sign}(\mu)\, \frac{-d_k - \ell - \mu t}{\sigma\sqrt{t}}\right) - e^{-\mu c_k/\sigma^2}\, \Phi\!\left(\operatorname{sign}(\mu)\, \frac{c_k - \ell - \mu t}{\sigma\sqrt{t}}\right) \right\}. \tag{8}$$
The presence of sign(µ) is necessary to improve numerical stability. Equations 5 through 8 are from [5].
The probability that the process will reach the upper barrier before the lower barrier is given by

$$P(\text{upper}) = \frac{1 - e^{2\mu\ell/\sigma^2}}{e^{-2\mu u/\sigma^2} - e^{2\mu\ell/\sigma^2}} \tag{9}$$

(see for instance [10]). When the drift is equal to zero, this probability becomes ℓ/(u + ℓ). The probability that the lower barrier is crossed first is one minus the probability that the upper barrier is crossed first.
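Equations 7 and 9 can be cross-checked numerically: as t → 0+ the upper subsurvivor function must approach P(upper), since the upper barrier is almost surely not yet breached at time 0. The sketch below is our Python illustration (function names are ours, not from the report), truncating the doubly infinite sum to |k| ≤ 20, which we assume is adequate for moderate parameter values.

```python
import math

def norm_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_upper(u, ell, mu, sigma):
    """Equation 9: probability of hitting u before -ell (non-zero drift)."""
    s2 = sigma ** 2
    return (1.0 - math.exp(2.0 * mu * ell / s2)) / (
        math.exp(-2.0 * mu * u / s2) - math.exp(2.0 * mu * ell / s2))

def upper_subsurvivor(t, u, ell, mu, sigma, kmax=20):
    """Equation 7, truncated to |k| <= kmax (a very large kmax could
    overflow math.exp, so the truncation level is an assumption)."""
    s2 = sigma ** 2
    st = sigma * math.sqrt(t)
    sgn = 1.0 if mu > 0 else -1.0
    total = 0.0
    for k in range(-kmax, kmax + 1):
        ck = 2.0 * k * (u + ell)            # c_k = 2k(u + l)
        dk = ck + 2.0 * u                   # d_k = 2k(u + l) + 2u
        total += sgn * (
            math.exp(-mu * ck / s2) * norm_cdf(sgn * (ck + u - mu * t) / st)
            - math.exp(mu * dk / s2) * norm_cdf(sgn * (-dk + u - mu * t) / st))
    return total

p0 = p_upper(5.0, 5.0, 0.15, 1.0)                     # theoretical P(upper)
sub0 = upper_subsurvivor(1e-4, 5.0, 5.0, 0.15, 1.0)   # Equation 7 near t = 0
```

For the report's parameters (μ = 0.15, u = ℓ = 5, σ = 1) both routes give approximately 0.8176, and the zero-drift limit of Equation 9 reduces to ℓ/(u + ℓ) as stated.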
4 Random Number Generation
We generated pseudo-random numbers using a naive method based on the
definition of the IG and FP2B distributions as the first passage times of
the Wiener process with drift μ and volatility σ² to one and two barriers
respectively. Since the Wiener process has independent normal increments
we can, in theory, simulate it by summing independent normal random variables.

Figure 4: Wiener process approximated by a step function

To this end let Xi, i = 1, 2, . . . be independent random variables from the N(μ, σ²) distribution, and let S0 = 0 and Sn = X1 + · · · + Xn. The
sequence {Si ; i = 0, 1, 2, . . .} is a discrete stochastic process that takes a
step at each time unit, where the size of the step is random and normally
distributed. We can also think of {Si ; i = 0, 1, 2, . . .} as a discretely observed
realization of a Wiener process {W (t); t > 0}, with discrete observation
times t = 0, 1, 2, . . .. In this sense we use {Si ; i = 0, 1, 2, . . .} to approximate
the Wiener process {W (t); t > 0}, as shown in Figure 4. Intuitively, the
approximation will improve as the time interval between jumps gets small.
We now outline the method in detail for the one-barrier situation, where
the first passage time to the barrier at u has the IG distribution. Let
N = min{j ≥ 1 : Sj ≤ u, Sj+1 > u}.
Clearly the arrival time to the barrier is somewhere in the interval [N, N +1).
In this report, we use linear interpolation to give a non-integer arrival time.
However other schemes could be considered, such as using either N or N + 1
or a random time in the interval [N, N + 1).
A computer program (Appendix B) was written in Fortran 90, compiled
using the Compaq f90 compiler and run on SharcNet, a Compaq Alpha ES40
cluster at the University of Guelph. The algorithm is summarized here:
1. Generate two uniform random numbers.
2. Use the two uniforms to produce a potential standard normal variate via the ratio of uniforms method. This is essentially a rejection
method. If the number is rejected, return to Step 1.
3. Transform the standard normal variate to a normal variate with mean
μ and variance σ².
4. Repeat Steps 1-3 while the cumulative sum of the transformed normal
variates, Sn , is less than the barrier at u. If Sn > u then continue to
Step 5.
5. Estimate the arrival time to the barrier by linear interpolation.
This sequence of steps should in theory produce a single realization from the
IG distribution. Each step is discussed in more detail below. Generation of
random variables from the FP2B distribution is detailed in Section 4.4.
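The five steps above can be sketched compactly. The following is an illustrative Python translation, not the authors' Fortran program: Python's random.gauss stands in for Steps 1-3 (the linear congruential and ratio-of-uniforms generators described below), while Steps 4-5 are as stated; the function name and seed are ours.

```python
import random

def naive_ig_variate(u, mu, sigma, rng):
    """One realization: sum N(mu, sigma^2) steps until the barrier u is
    first exceeded, then linearly interpolate the crossing time (Step 5)."""
    s_prev, s, n = 0.0, 0.0, 0
    while s <= u:
        s_prev = s
        s += rng.gauss(mu, sigma)   # stand-in for Steps 1-3
        n += 1
    beta = s - s_prev               # slope of the final segment
    return n - (s - u) / beta       # = (n - 1) + 1 - (S_n - u)/beta

rng = random.Random(12345)
times = [naive_ig_variate(5.0, 1.0, 1.0, rng) for _ in range(3000)]
mean_time = sum(times) / len(times)
```

With unit time steps the sample mean sits near, but (as Section 5 documents) systematically above, the theoretical IG mean ν = u/μ = 5.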
4.1 Uniform Distribution
We used a linear congruential generator to produce uniformly distributed
pseudo-random numbers ui on the interval (0, 1). This method uses the recursion formula xi+1 = axi (mod m), i = 0, 1, ..., n where a and m are integer
constants chosen to maximize desirable properties such as independence and
cycle length. The initial value or seed, x0 , is also an integer. The uniform
random numbers are then calculated as ui = xi /m. The specific generator
used for this simulation is the one described by Park and Miller [8] as the
“minimal standard”. It has m = 2³¹ − 1 and a = 16807. This generator
has been well tested for its statistical properties including independence and
was deemed sufficient for our purposes. The seed values for this simulation
were obtained by calls to the system time clock as shown in Appendix B.
This method of producing seeds does not allow the user to repeat a given
sequence of random numbers. However this feature could easily be modified.
Our Fortran program generates 10,000 uniforms in succession and stores
them in an array. If these are used up, the program is able to construct
another array.
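The recursion is a few lines in any language; here is a minimal sketch in Python for illustration (the project's implementation is in Fortran, Appendix B). Park and Miller's published correctness check in [8] is that, starting from seed 1, the integer state after 10,000 steps equals 1,043,618,065.

```python
M = 2**31 - 1   # 2147483647, a Mersenne prime
A = 16807       # 7**5, the "minimal standard" multiplier

def lcg_stream(seed):
    """Yield integer states x_{i+1} = A * x_i mod M; dividing by M
    gives uniform pseudo-random numbers on (0, 1)."""
    x = seed
    while True:
        x = (A * x) % M
        yield x

gen = lcg_stream(1)
state = 0
for _ in range(10000):
    state = next(gen)
uniform = state / M   # the corresponding U(0, 1) variate
```

Python's arbitrary-precision integers sidestep the 32-bit overflow tricks (Schrage's method) that a Fortran implementation needs.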
4.2 Normal Distribution
To generate standard normal variables, we used the ratio of uniforms method
with quadratic bounding curves [7]. The ratio of uniforms is a general
method, developed by Kinderman and Monahan [6] and based on the following idea. Suppose we want to generate a random variable X with
density f (x). If the points (u, v) are uniformly spread over the region
A = {(u, v) : 0 ≤ u ≤ f 1/2 (v/u)}, then the ratio v/u has the desired
distribution, as we now show.
Let the area of the region A be 1/a. Then the joint density of (U, V) is

$$g(u, v) = a, \qquad 0 \le u \le f^{1/2}(v/u),$$
and 0 otherwise. Note that f (x) may be 0 for some values of x, and this may
in part determine the region A. Now consider the transformation X = V /U ,
Y = U . This is a one-to-one transformation with U = Y , V = XY . The
Jacobian J is

$$J = \begin{vmatrix} \partial u/\partial x & \partial u/\partial y \\ \partial v/\partial x & \partial v/\partial y \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ y & x \end{vmatrix} = -y.$$
Then the joint density of (X, Y) is

$$g(x, y) = g(u, v)\,|J| = a y, \qquad 0 \le y \le f^{1/2}(x),$$

and the marginal density of X is

$$f^*(x) = \int_0^{f^{1/2}(x)} g(x, y)\, dy = \int_0^{f^{1/2}(x)} a y\, dy = \frac{a}{2} f(x).$$
Since both f ∗ (x) and f (x) are densities, we must have f ∗ (x) = f (x) and
a = 2.
In fact, we only need specify f(x) up to a constant. Suppose h(x) = K f(x), where h(x) is a non-negative function and ∫_{−∞}^{∞} h(x) dx = K < ∞. Let A = {(u, v) : 0 ≤ u ≤ h^{1/2}(v/u)}, with area 1/a. Then g(u, v) = a for 0 ≤ u ≤ h^{1/2}(v/u) and g(x, y) = ay for 0 ≤ y ≤ h^{1/2}(x), and the marginal density of X is

$$f^*(x) = \int_0^{h^{1/2}(x)} g(x, y)\, dy = \int_0^{h^{1/2}(x)} a y\, dy = \frac{a}{2} h(x).$$

But since f^*(x) must integrate to 1, we have that a = 2/K, i.e. the area of the region A must be K/2.
To implement this idea, we generate points (u, v) uniformly over the
region A, which is easily done using rejection methods. We then take the
ratio v/u as a realization of a random variable with the required density
f (x). Note that there is no need to evaluate K; it is only necessary to
generate points uniformly spread over the region A.
Suppose that in particular we want to generate from a standard normal distribution with kernel h(x) = e^{−x²/2} for −∞ < x < ∞. We need to generate points uniformly over the region A = {(u, v) : 0 ≤ u ≤ e^{−v²/(4u²)}}, which has boundaries v = ±2u√(−ln u), 0 ≤ u ≤ 1. This is easily done by rejection, since A is contained in the rectangle 0 ≤ u ≤ 1, −√(2/e) ≤ v ≤ √(2/e). Thus we can use the following algorithm:

1. Generate u ∼ U(0, 1) and w ∼ U(0, 1), independent.

2. Let v = 2√(2/e)(w − 1/2).

3. If v² ≤ −4u² ln(u) then return v/u as a standard normal deviate. Otherwise go to 1.
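The three steps translate directly into code. A Python sketch for illustration (function name ours; it uses the exact logarithm test of Step 3 rather than the quadratic bounds of Leva [7] discussed below):

```python
import math
import random

SQRT_2_OVER_E = math.sqrt(2.0 / math.e)   # half-width of the bounding rectangle

def rou_normal(rng):
    """Kinderman-Monahan ratio-of-uniforms sampler for the standard normal:
    accept (u, v) uniform on the rectangle whenever it falls inside A."""
    while True:
        u = 1.0 - rng.random()                        # u in (0, 1], avoids log(0)
        v = 2.0 * SQRT_2_OVER_E * (rng.random() - 0.5)
        if v * v <= -4.0 * u * u * math.log(u):       # Step 3: (u, v) lies in A
            return v / u

rng = random.Random(2003)
sample = [rou_normal(rng) for _ in range(20000)]
```

The accepted ratios v/u should have sample mean near 0 and variance near 1.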
Note that in Step 2, (u, v) is uniformly distributed over the shaded rectangle shown in Figure 5. Points that satisfy the inequality in Step 3 (i.e. points that are accepted) are uniformly distributed over the region A, which is shaded with dots in Figure 5. If the inequality is not satisfied, the point (u, v) is rejected, and we go back to Step 1.

Figure 5: Ratio of Uniforms Method
The evaluation of ln(u) for many random values u is computationally
expensive. The algorithm can be much improved by constructing quadratic
boundary curves around A. Figure 6 is a magnified view showing the boundary of region A as a solid line, an outside quadratic bound as a dotted line, and an inside quadratic bound as a dashed line. A quick and cheap assessment of whether (u, v) falls outside A can be made by determining whether (u, v) falls above the dotted line. Similarly a quick and cheap assessment of whether (u, v) falls inside of A can be made by assessing whether (u, v) falls below the dashed line. Only in a very small number of cases will it be necessary to use the expensive assessment in Step 3 above. A very fast algorithm developed by Leva [7] was used in this project.

Figure 6: Quadratic Bounds on Region A
It has been noted by Hörmann [4] that the use of the ratio of uniforms method in conjunction with the linear congruential method leads to a gap in the support of the distribution, in which no pseudo-random numbers will be generated. The probability of this gap is of order 1/√m, which for our linear congruential generator equals 2.156E-05. This gap does not seem to
affect the goodness-of-fit tests for the normal distribution (see Section 5.2).
Nevertheless, the combination of linear congruential and ratio of uniforms
methods should be avoided in the future.
We next transformed the standard normal variates into normal variables with mean μ and standard deviation σ. This is easily done since if Z ∼ N(0, 1), then X = σZ + μ ∼ N(μ, σ²). Thus we generate standard normal variables, multiply them by σ and add μ.
4.3 Inverse Gaussian Distribution

The transformed normals Xi are then summed until the sum Sn = X1 + · · · + Xn exceeds the barrier at u. The arrival time to the barrier is estimated using linear interpolation, as follows. Let N equal the minimum j such that Sj ≤ u and Sj+1 > u. The line segment connecting the points (j, Sj) and (j + 1, Sj+1) has slope β = Sj+1 − Sj. Let (x, u) be the point of intersection of the line segment and the barrier at u. Then

$$x = j + 1 - \frac{S_{j+1} - u}{\beta}.$$
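The interpolation is a one-liner; here is a small worked illustration with hypothetical values not taken from the simulation: with S_3 = 4.2, S_4 = 5.6 and barrier u = 5, the slope is β = 1.4 and the crossing is placed at x = 4 − 0.6/1.4 ≈ 3.571.

```python
def interpolate_crossing(j, s_j, s_j1, u):
    """Linear interpolation of the barrier-crossing time between the
    observation times j and j+1, where S_j <= u < S_{j+1}."""
    beta = s_j1 - s_j              # slope of the connecting segment
    return j + 1 - (s_j1 - u) / beta

x = interpolate_crossing(3, 4.2, 5.6, 5.0)   # hypothetical path values
```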
4.4 First-Passage-Two-Barrier Distribution

The algorithm for generating pseudo-random numbers from the FP2B distribution is identical to the algorithm for generating from the IG distribution,
except that Step 4 becomes:
• Repeat Steps 1-3 while the cumulative sum of transformed normals is
less than the upper barrier and greater than the lower barrier.
Arrival times to either the upper or lower barrier were obtained by linear
interpolation. We recorded both time of crossing and which barrier was
breached.
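A Python sketch of the modified loop (again with random.gauss standing in for Steps 1-3, and with our own function name). With drift 0.3 and barriers ±2 it should reproduce the overshoot documented in Table 4, where the observed proportion of upper crossings (0.8284) exceeds the theoretical 0.7685:

```python
import random

def naive_fp2b_variate(u, lower, mu, sigma, rng):
    """Return (interpolated crossing time, indicator = 1 if the upper
    barrier u was breached, 0 if the lower barrier was breached)."""
    s_prev, s, n = 0.0, 0.0, 0
    while lower < s < u:
        s_prev = s
        s += rng.gauss(mu, sigma)
        n += 1
    barrier = u if s >= u else lower
    beta = s - s_prev
    t = n - (s - barrier) / beta    # linear interpolation to the barrier hit
    return t, 1 if s >= u else 0

rng = random.Random(42)
results = [naive_fp2b_variate(2.0, -2.0, 0.3, 1.0, rng) for _ in range(3000)]
prop_upper = sum(ind for _, ind in results) / len(results)
```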
5 Testing of Random Numbers
Numbers generated from the uniform, normal and IG distributions were tested for goodness of fit against the theoretical distributions using the chi-square test. For the FP2B distribution, the frequencies of the observed arrival times to the upper and lower barriers were tested separately for goodness of fit against the theoretical subsurvivor functions given in Equations 7 and 8. We again used a chi-square test.
To perform the chi-square test, the support of the distribution is divided into I bins, i = 1, 2, . . . , I. The test statistic is given by

$$X^2 = \sum_{i=1}^{I} \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed frequency in the ith bin and E_i is the expected frequency. If the null hypothesis is true, i.e. if the observations do come from the distribution with expected frequencies E_i, then the statistic X² has asymptotically a χ² distribution with I − 1 degrees of freedom (df). (See for instance [1].)
All statistical testing was done in SAS Version 8.2. The code used to do
some of the following tests can be found in Appendix C.
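The chi-square computation is easy to reproduce outside SAS. The sketch below (our Python illustration, not the SAS code of Appendix C) uses the closed-form upper tail of the χ² distribution available for even df; applied to the statistic reported in Section 5.1 (X² = 3.9144 on 4 df) it recovers the quoted p-value of about 0.4177.

```python
import math

def chi_square_stat(observed, expected):
    """X^2 = sum over the I bins of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x); closed form for even df:
    exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    assert df % 2 == 0, "closed form implemented only for even df"
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (x / 2.0) / j
        total += term
    return math.exp(-x / 2.0) * total

# The uniform goodness-of-fit test of Section 5.1: statistic 3.9144 on 4 df.
p = chi2_sf(3.9144, 4)

# A toy statistic on hypothetical bin counts (expected 10 per bin).
toy = chi_square_stat([8, 12, 10, 10], [10.0, 10.0, 10.0, 10.0])
```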
5.1 Uniform Distribution
We tested the implementation of the uniform random number generator for
correctness using the method described in [8]. This test involves starting the
recursion with a specified seed value and checking that the 100th random
uniform number generated is equal to a given constant.
To test for goodness of fit, we generated 9998 random uniform numbers
on the interval (0, 1). The interval (0, 1) was then divided into five bins of
equal length. The test produced a statistic of 3.9144 on 4 df and a p-value
of 0.4177, indicating no evidence of lack of fit.
5.2 Normal Distribution
We generated 1904 pseudo-random numbers from a standard normal distribution, and tested them for goodness-of-fit using the chi-square test with 12
bins. (While we aimed for around 2000 normal variates, some numbers were
of course rejected in the ratio of uniforms algorithm, resulting in fewer than
2000 normals. This is an idiosyncrasy of the program that could easily be
altered.) The test statistic was 11.6518 on 11 df, giving a p-value of 0.3904.
Again, there is no evidence of lack of fit.
5.3 Inverse Gaussian with σ² = 1
We generated samples of various sizes from the IG distribution with σ² =
1 and two different combinations of drift and barrier values, as shown in
Table 1. There was no limit on the time until the barrier was breached.
The calculation of expected values for the goodness of fit test is outlined in
Appendix C.
An interesting pattern appears in the results shown in Table 1. Looking
across rows of the table, we note that for fixed values of the parameters,
the p-value decreases as the sample size increases. For sample sizes of 400
or greater, the null hypothesis is rejected, indicating that there is evidence
that the generated numbers do not follow the IG distribution.
Parameters: Drift=0.38, Barrier=10.0

Sample size  Chi-Square  DF  Pr>ChiSq
100          3.7109      5   0.5002
200          8.5423      5   0.3632
400          23.484      5   0.0343
600          66.3329     5   0.0006

Parameters: Drift=0.15, Barrier=5

Sample size  Chi-Square  DF  Pr>ChiSq
100          9.142       5   0.3699
200          11.680      5   0.0825

Table 1: Goodness of Fit Test for IG distribution with σ² = 1
Parameters: Drift=0.15, Barrier=5.0

Sample size  σ²      Chi-Square  DF  Pr>ChiSq
100          0.0625  3.6997      5   0.4482
100          0.25    6.2134      5   0.1838
100          2.25    8.2538      5   0.1428
100          25.0    24.4358     5   0.0004

Table 2: Goodness of Fit Test for IG distribution with varying σ²
5.4 Inverse Gaussian with varying σ²
We generated samples of size 100 from an IG distribution with fixed drift and barrier, but varying volatility σ². The results, summarized in Table 2, show that as the volatility increases, the p-value decreases. For very large
values of the volatility parameter, there is strong evidence that the generated
variables do not follow an IG distribution.
5.5 First-Passage-Two-Barrier Distribution
We generated samples from the FP2B distribution with σ² = 1 using the
method described in Section 4. The computer program ran until a barrier
was crossed, with no limit on the time. For testing purposes, we isolated
the numbers that hit the upper barrier and used a chi-square statistic to
quantify goodness of fit with the upper subsurvivor function. More details
are given in Appendix C. The required values of the subsurvivor function
were calculated by summing terms in Equation 7 until the next term added
changed the sum by less than 0.0001.
As Table 3 shows, the same trend noticed with the IG numbers is present
Parameters: Drift=0.15, Upper=5.0, Lower=-5.0

Sample size  Chi-Square  DF  Pr>ChiSq
83           8.9365      5   0.1116
171          6.8382      5   0.2330
338          11.5810     5   0.0410
660          42.2480     5   <0.0001

Parameters: Drift=0.15, Upper=2.0, Lower=-2.0

Sample size  Chi-Square  DF  Pr>ChiSq
64           25.2060     5   <0.0001
140          32.9380     5   <0.0001
278          93.2124     5   <0.0001
560          216.9342    5   <0.0001

Table 3: Goodness of Fit Test for FP2B Upper Subsurvivor Function
here. As the sample size increases, the p-values become small, giving evidence that the generated numbers do not come from the specified distribution. Even though the sample sizes are not equal in the two rows of the table,
there is a suggestion that as the distance between the barriers decreases, the
p-value tends to decrease as well.
We also tested the arrival times to the lower barrier against the lower
subsurvivor function. However for all choices of parameters shown here, the
number of paths which hit the lower barrier was very small. Although the
null hypothesis was not rejected, the test has low power with small sample
sizes. As the test is unreliable, the results are not shown.
Other parameter values were tested (not shown), namely (μ=0.38, u=2.59, ℓ=-2.34, σ²=1) and (μ=0.38, u=3.21, ℓ=-1.70, σ²=1), but in all cases the p-values were < 0.0001 unless the sample size was less than 20. Sample sizes
lower than 20 are suspect, as the power of the test to detect a discrepancy
from the nominal distribution is low.
5.5.1 Probability of Crossing the Upper Barrier
As a further test of the random number generator, we compared the observed
proportion of paths which breached the upper barrier before the lower one to
the theoretical probability from Equation 9. The observed proportion was
calculated as the number of paths which reached the upper barrier divided by
the total number of paths in the simulation. All of the observed proportions
in Table 4 are from simulations of 10,000 paths with σ² = 1. The column
labelled Difference is calculated by subtracting the expected value from the
observed value.
For a sample of n paths, we can easily calculate the variance of the observed proportion of paths that hit the upper barrier. In effect we have n binomial trials, with constant probability of success p = P(upper) as given in Equation 9. Then the observed number of paths that hit the upper barrier is a binomial random variable, X. The variance of the observed proportion X/n is then pq/n. The standard deviation of X/n is shown in column 5 of Table 4.

Upper=2.0, Lower=-2.0
Drift  Observed  Expected  Difference  Standard Deviation
0.1    0.6356    0.5987    0.0369      0.0049
0.2    0.7328    0.6899    0.0429      0.0046
0.3    0.8284    0.7685    0.0599      0.0042
0.4    0.8882    0.8320    0.0562      0.0037
0.5    0.9308    0.8808    0.0500      0.0032

Upper=3.0, Lower=-3.0
Drift  Observed  Expected  Difference  Standard Deviation
0.1    0.6711    0.6456    0.0255      0.0048
0.2    0.8074    0.7685    0.0389      0.0042
0.3    0.8941    0.8581    0.0360      0.0035
0.4    0.9429    0.9168    0.0261      0.0028
0.5    0.9752    0.9526    0.0226      0.0021

Upper=4.0, Lower=-4.0
Drift  Observed  Expected  Difference  Standard Deviation
0.1    0.7087    0.6899    0.0188      0.0046
0.2    0.8665    0.8320    0.0345      0.0037
0.3    0.9347    0.9168    0.0179      0.0028
0.4    0.9731    0.9608    0.0123      0.0019
0.5    0.9901    0.9820    0.0081      0.0013

Upper=5.0, Lower=-5.0
Drift  Observed  Expected  Difference  Standard Deviation
0.1    0.7289    0.7311    -0.0022     0.0044
0.2    0.9021    0.8808    0.0213      0.0032
0.3    0.9588    0.9525    0.0063      0.0021
0.4    0.9876    0.9820    0.0056      0.0013
0.5    0.9933    0.9967    -0.0034     0.0006

Table 4: Proportion of Upper Barrier Crossings
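The standard-deviation column is then a one-line computation. For example, the first entry of the Upper=5.0 block (p = 0.7311, n = 10,000 paths) gives 0.0044; the sketch below is our illustration in Python:

```python
import math

def proportion_sd(p, n):
    """Standard deviation of a binomial proportion X/n: sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1.0 - p) / n)

sd1 = proportion_sd(0.7311, 10000)   # Upper=5.0 block, drift 0.1
sd2 = proportion_sd(0.8808, 10000)   # Upper=5.0 block, drift 0.2
```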
Several interesting patterns can be discerned in Table 4. First of all,
the observed proportion of paths that hit the upper barrier is almost always
farther than two standard deviations away from the expected proportion.
Secondly the observed proportion is almost always strictly greater than the
expected proportion. Overall, too many paths end at the upper barrier.
These facts indicate a systematic lack of fit.
For fixed barriers, increasing drift does not seem to have any systematic
effect on the difference between observed and expected proportions. However, as the distance of the barriers from the origin increases the difference
between the observed and expected values decreases.
6 Conclusions
It is well known that for natural populations, as the sample size increases,
the p-value from a goodness of fit test tends to decrease. As many applied
statisticians will attest, a goodness of fit test is almost guaranteed to reject
the null hypothesis for a large enough sample size. This is because a theoretical distribution is only a mathematical model for a natural population,
and is never an exact description. However one would hope that the same
would not be true for pseudo-random numbers generated by computer. The
algorithms that we used to generate uniform and normal random variables
are well-used and exhaustively tested, and goodness of fit tests showed no
evidence that the generated numbers did not come from the nominal distributions, even for sample sizes as large as 10,000. For the IG and FP2B
distributions, however, there was strong evidence of lack of fit, even for quite
modest sample sizes.
The method was tested independently in R and the results were confirmed, indicating that the fault does not lie with our Fortran program. In
this report we have shown results for interpolated crossing times, but using
the last time before the crossing, or the first time after the crossing gave
essentially the same results. Thus it seems that the method of generating
IG and FP2B random variables as the approximate time that a cumulative
sum of normals crosses a barrier is unsatisfactory, at least for values of the
parameters and time intervals investigated here.
Figure 7: Wiener process approximated by a step function

Why does the method fail? As we noted in Table 4, for the FP2B distribution, more paths than expected end at the upper barrier, suggesting
that we are somehow missing breaches of the lower barrier. A possible explanation is that there is a high probability that, between two observation times, a path breaches the lower barrier and then immediately rises above
the barrier, as shown in Figure 7. Thus this breach of the lower barrier is unobserved. Later we observe the path crossing the upper barrier, and falsely
record the time of this event as the first crossing time. An obvious solution
is to decrease the time interval between steps so that the step function more
closely approximates a Wiener process.
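The proposed fix is straightforward to sketch: take steps of length dt with N(μ dt, σ² dt) increments instead of unit steps. The code below is our own illustration (not an experiment reported here); with dt = 0.01 the sample mean of the generated times should sit close to the theoretical IG mean ν = u/μ.

```python
import random

def refined_ig_variate(u, mu, sigma, dt, rng):
    """First passage to the barrier u of a step-function path observed every
    dt time units, with N(mu*dt, sigma^2*dt) increments, interpolated."""
    step_sd = sigma * dt ** 0.5
    s_prev, s, n = 0.0, 0.0, 0
    while s <= u:
        s_prev = s
        s += rng.gauss(mu * dt, step_sd)
        n += 1
    frac = (s - u) / (s - s_prev)     # portion of the last step past u
    return (n - frac) * dt

rng = random.Random(7)
times = [refined_ig_variate(5.0, 1.0, 1.0, 0.01, rng) for _ in range(1500)]
mean_t = sum(times) / len(times)      # theory: u/mu = 5
```

A small positive bias remains even at dt = 0.01, consistent with the report's diagnosis that sub-interval excursions over the barrier are still occasionally missed.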
A similar phenomenon could explain the lack of fit for the IG distribution. Figure 8 shows the theoretical density as a smooth curve, plotted over
a histogram of 10000 random numbers generated from the IG distribution
with µ = 0.15, u = 5 and σ = 1. Clearly the theoretical probability of hitting the barrier before time 15 is much higher than the observed proportion
of paths which do so. Conversely the theoretical probability of hitting the
barrier at a very late time is less than that observed. (The histogram has
a much longer tail than is shown in the figure; three out of 10,000 observations were greater than 500.) This seems to indicate that the first passage
time of some of the paths is being missed, and we instead are observing the
second (or later) hitting time. Recall that in all our simulations, the process
takes one step per unit time interval. Again it appears that shorter intervals
between steps should ameliorate the systematic lack of fit.
Figure 8: Histogram of generated IG random variates versus theoretical density; μ = 0.15, u = 5.0, σ = 1
There is some evidence that the fit improves as the distance from the
origin to the barrier increases. Of course, the process must take more steps
to reach a relatively distant barrier. Thus there is evidence that the method
improves as the average number of steps needed to reach the barrier increases. Decreasing the time interval between steps would also result in an
increased number of steps needed to reach the barrier. Thus we have further support for the notion that decreasing the times between steps would
improve the performance of this method of random number generation.
Faster and more reliable methods for generating random variables from
the IG distribution exist (see for instance [9]). While there are other methods
of generating realizations of a Wiener process (see for instance [10]), we know
of no other way to generate variates from the FP2B distribution, except as
the time that a simulated Wiener process breaches one of two barriers.
Further work will focus on testing different time intervals between steps
and developing rules of thumb for the time interval necessary to achieve
satisfactory fit with the nominal distributions. More importantly, better
and more efficient methods for generating random numbers from the FP2B
distribution are needed.
Acknowledgements
This project was funded by the University of Guelph and the Natural Sciences and Engineering Research Council. We would like to thank the Department of Mathematics and Statistics at the University of Guelph for
providing space, administrative and computer support, and for making opportunities like this available to students.
References
[1] Bickel, P.J. and Doksum, K.A. Mathematical Statistics. Holden-Day,
Oakland, California, 1977.
[2] Chhikara, R.S. and Folks, J.L. The Inverse Gaussian Distribution: Theory, Methodology, and Applications. Marcel Dekker, New York, 1989.
[3] Feller, W. An Introduction to Probability Theory and its Applications,
Volume 1. Wiley, New York. 1986.
[4] Hörmann, W. A Note on the Quality of Random Variates Generated
by the Ratio of Uniforms Method. ACM Transactions on Modeling and
Computer Simulation. Vol.4, No. 1, 1994. pp. 96-106.
[5] Horrocks, J. and Thompson, M.E. Modelling Event Times with Multiple Outcomes Using a Wiener Process with Drift. Lifetime Data Analysis (accepted).
[6] Kinderman, A.J. and Monahan, J.F. Computer Generation of Random
Variables Using the Ratio of Uniform Deviates. ACM Transactions on
Mathematical Software, Vol 3 No. 3 September 1977, pp. 257-260.
[7] Leva, J.L. A Fast Normal Random Number Generator. ACM Transactions on Mathematical Software, Vol. 18, No. 4, December 1992. pp.
449-453.
[8] Park, S.K. and Miller, K.W. Random Number Generators: Good Ones
are Hard to Find. Communications of the ACM. Vol. 31, No. 10, 1988.
pp. 1192-1201.
[9] Seshadri, V. The Inverse Gaussian Distribution. Oxford, Clarendon
Press, 1993.
[10] Taylor, H.M., Karlin, S., An Introduction to Stochastic Modeling, 3rd
Edition. California, Academic Press, 1998.
A  Parametrization of the IG Density

In this appendix, we show that if λ = (u − x0)²/σ² and ν = (u − x0)/µ, then

    f(t; u, µ, σ, x0) = [(u − x0)/√(2πσ²t³)] e^{−(u − x0 − µt)²/(2σ²t)}    (10)

is equivalent to

    f(t; ν, λ) = √(λ/(2πt³)) e^{−λ(t − ν)²/(2ν²t)}.    (11)

Start with (11) and let δ = 1/ν and ρ = 1/λ; then

    f(t; δ, ρ) = √(1/(2πρt³)) e^{−(1/ρ)(t − 1/δ)²/(2t/δ²)}    (12)

which simplifies to

    f(t; δ, ρ) = [1/√(2πρt³)] e^{−(1 − δt)²/(2ρt)}.    (13)

Then, substituting δ = µ/(u − x0) and ρ = σ²/(u − x0)² gives

    f(t; u, µ, σ, x0) = [1/√(2π(σ²/(u − x0)²)t³)] e^{−(1 − µt/(u − x0))²/(2σ²t/(u − x0)²)}    (14)

which simplifies to

    f(t; u, µ, σ, x0) = [(u − x0)/√(2πσ²t³)] e^{−(u − x0 − µt)²/(2σ²t)}    (15)

as required.
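The equivalence of the two parametrizations can also be checked numerically. The following Python sketch (ours; the parameter values are arbitrary) evaluates both densities on a grid of t values:

```python
import math

def ig_four_param(t, u, mu, sigma, x0):
    """IG density in the (u, mu, sigma, x0) parametrization of equation (10)."""
    a = u - x0
    return a / math.sqrt(2 * math.pi * sigma**2 * t**3) * \
        math.exp(-(a - mu * t)**2 / (2 * sigma**2 * t))

def ig_two_param(t, nu, lam):
    """IG density in the (nu, lambda) parametrization of equation (11)."""
    return math.sqrt(lam / (2 * math.pi * t**3)) * \
        math.exp(-lam * (t - nu)**2 / (2 * nu**2 * t))

# With lambda = (u - x0)^2 / sigma^2 and nu = (u - x0) / mu the two
# densities agree at every t.
u, mu, sigma, x0 = 5.0, 0.15, 1.0, 0.0
lam, nu = (u - x0)**2 / sigma**2, (u - x0) / mu
for t in (1.0, 10.0, 33.3, 100.0):
    assert abs(ig_four_param(t, u, mu, sigma, x0) - ig_two_param(t, nu, lam)) < 1e-12
```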
B  Fortran Code

The simulation code is reproduced here in its entirety for the double barrier model. The code for the single barrier model requires very little modification: only the lines that stop the run when the lower barrier is breached need to be removed.
PROGRAM MAIN
  IMPLICIT NONE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  ! This program simulates a Wiener process.
  ! Author: Maia Lesosky, 2003, University of Guelph
  !
  ! This program will generate maxN variates from the FP2B
  ! distribution.
  ! This program requires 1 parameter file to run, called para.txt.
  ! The parameter file must have the following format:
  !
  !   &nlist
  !   drift=0.15
  !   topbarr=5.0
  !   botbarr=-5.0
  !   step=5
  !   maxN=100
  !   /
  !
  ! where the parameters are defined as follows:
  !
  ! DRIFT   :: drift parameter for the Wiener process
  ! TOPBARR :: value of the top barrier u>0
  ! BOTBARR :: value of the bottom barrier l<0
  ! STEP    :: value of the variance of the Wiener process
  ! MAXN    :: total number of barrier crossings wanted
  !
  ! This program can write to any number of files. Currently it is set
  ! up to write to a single file called arr.txt that has the run number,
  ! the interpolated arrival time and an indicator variable which has
  ! the value 1 if the upper barrier was crossed and 0 if the lower
  ! barrier was crossed.
  !
  ! A list of all the variables and definitions follows:
  !
  ! unfm1,unfm2   :: the two uniform random numbers used to generate
  !                  the normal random number
  ! S()           :: array that holds the generated uniform random numbers
  ! nrml          :: unadjusted (standard normal) random normal variate
  ! nrml_adj      :: adjusted (N(drift,step)) random variate
  ! arr_tme       :: interpolated arrival time
  ! sum_nrml      :: running sum of adjusted normal variates
  ! p,t,a,b,r1,r2 :: parameters for the ratio of uniforms method
  ! i,j,k,l,m,n   :: counters
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  NAMELIST /nlist/ drift,topbarr,botbarr,step,maxN
  REAL :: drift,topbarr,botbarr,step
  REAL :: unfm1,unfm2
  REAL :: p = 0.449871, t = -0.386595, a = 0.19600, b = 0.25472, &
          r1 = 0.27597, r2 = 0.27846
  REAL*8 :: S(10000)
  REAL*8 :: nrml,nrml_adj,arr_tme,iseed,sum_nrml
  REAL*8 :: u,v,x,y,q,rm
  INTEGER :: maxI=1000,maxR=10000,maxS=10000,maxN
  INTEGER :: rindex=1,scount=1,flag=1,numN=0,numAt,numAb
  INTEGER :: i,j,k,l,m,n,numtop
  INTEGER :: Date_time(8)
  INTEGER :: dte1,dte2,dte3,tme1,tme2,tme3
  INTEGER*8 :: mm,aa,seed
  CHARACTER :: filename*15,ext*4,file5*15,file6*15
  CHARACTER :: file1*15,file2*15,file3*15,file4*15
  CHARACTER :: file7*15
  CHARACTER(len=12) :: real_clock(3)

  aa=16807
  mm=2147483647
  file1='rndunf'//'.txt'
  file2='rndnrl'//'.txt'
  file4='para2'//'.txt'
  file5='arr'//'.txt'
  file6='bot'//'.txt'
  file7='path'//'.txt'

  OPEN(unit=20,file=file4,status='old')
  READ(20,NML=nlist)
  CLOSE(20)
  step=sqrt(step)

  OPEN(unit=21,file=file5,status='old')
  WRITE(21,*)'upper=',topbarr,'lower=',botbarr
  WRITE(21,*)'drift=',drift,'std. dev.=',step
  numAt=0
  numAb=0

  DO i=1,maxN
     numN=0
     flag=1
     sum_nrml=0
     DO j=1,maxR
        S(j)=0
     END DO

     ! Next section produces an array of seeds used by the
     ! random number generator by making calls to the
     ! system time/date clock.
     !   tme1=hour, tme2=min, tme3=sec
     !   dte1=month, dte2=day, dte3=year
     CALL DATE_AND_TIME (real_clock(1),real_clock(2), &
          real_clock(3), date_time)
     CALL IDATE (dte1,dte2,dte3)
     tme1=date_time(5)
     tme2=date_time(6)
     tme3=date_time(7)
     scount = 1
     seed = tme1*tme2*tme3+dte1+dte2+dte3
     DO j=1, maxS
        seed = aa*seed
        seed = mod(seed,mm)
        rm = real(mm)
        S(j) = seed/rm
     END DO

     ! This section checks and produces random normals
     DO m=1,maxI
        ! obtain 2 random numbers to check
        unfm1=S(rindex)
        rindex=rindex+1
        unfm2=S(rindex)
        rindex=rindex+1
        IF(unfm1==0 .OR. unfm2==0 .OR. rindex>=maxS)THEN
           ! Reset the uniform number array.  The seed is composed of
           ! the hour multiplied by the minute and second, then added
           ! to the sum of the day, month and year.
           CALL DATE_AND_TIME (real_clock(1),real_clock(2), &
                real_clock(3), date_time)
           CALL IDATE (dte1,dte2,dte3)
           tme1=date_time(5)
           tme2=date_time(6)
           tme3=date_time(7)
           scount = 1
           seed = tme1*tme2*tme3+dte1+dte2+dte3
           DO j=1, maxS
              seed = aa*seed
              seed = mod(seed,mm)
              rm = real(mm)
              S(j) = seed/rm
           END DO
           rindex=1
           CYCLE
        END IF

        unfm2 = 1.7156 * (unfm2 - 0.5)
        ! Evaluate the quadratic form
        x = unfm1 - p
        y = ABS(unfm2) - t
        q = x**2 + y*(a*y - b*x)
        IF (q < r1) THEN
           ! Accept P if inside inner ellipse
           flag=1
        ELSE IF (q > r2) THEN
           ! Reject P if outside outer ellipse
           flag=0
        ELSE IF (unfm2**2 < -4.0*LOG(unfm1)*unfm1**2) THEN
           ! Accept P if inside the exact acceptance region
           flag=1
        ELSE
           ! Reject P if outside the acceptance region
           flag=0
        END IF

        IF(flag==0)THEN
           ! If the uniforms have been rejected return to the top
           ! of the loop and get two new random uniforms to try
           CYCLE
        ELSE
           ! the uniforms have been accepted
           numN=numN+1
           ! random normal with mean 0 and variance 1
           nrml = unfm2/unfm1
           ! adjust the N(0,1) variate to be N(drift,step)
           nrml_adj=step*nrml+drift
           ! sum of independent random normals
           sum_nrml=sum_nrml+nrml_adj
           ! this section stops the iterations if a barrier has been reached
           IF(sum_nrml>=topbarr.OR.sum_nrml<=botbarr)THEN
              numAt=numAt+1
              IF(sum_nrml>=topbarr)THEN
                 arr_tme=(topbarr-sum_nrml+nrml_adj*numN)/nrml_adj
                 WRITE(21,*)numAt,arr_tme,1
                 EXIT
              ELSE
                 arr_tme=(botbarr-sum_nrml+nrml_adj*numN)/nrml_adj
                 WRITE(21,*)numAt,arr_tme,0
                 EXIT
              END IF
           END IF
        END IF
     END DO
  END DO
  CLOSE(21)
END PROGRAM MAIN
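The normal variates in the listing come from the ratio-of-uniforms method with Leva's bounding curves [7] (the constants p, t, a, b, r1, r2 above). For readers who prefer to experiment outside Fortran, here is a Python sketch of the same acceptance test; the function name leva_normal is ours:

```python
import math
import random

# Bounding-curve constants from Leva (1992), as in the Fortran listing.
P, T, A, B = 0.449871, -0.386595, 0.19600, 0.25472
R1, R2 = 0.27597, 0.27846

def leva_normal(rng=random):
    """Standard normal variate via Leva's ratio-of-uniforms method."""
    while True:
        u = rng.random()
        if u == 0.0:
            continue                       # guard the logarithm below
        v = 1.7156 * (rng.random() - 0.5)
        x = u - P
        y = abs(v) - T
        q = x * x + y * (A * y - B * x)
        if q < R1:                         # inside inner curve: accept
            return v / u
        if q > R2:                         # outside outer curve: reject
            continue
        if v * v <= -4.0 * u * u * math.log(u):
            return v / u                   # exact acceptance test

random.seed(2)
sample = [leva_normal() for _ in range(20000)]
mean = sum(sample) / len(sample)
var = sum((s - mean) ** 2 for s in sample) / len(sample)
```

The sample mean and variance should be close to 0 and 1 respectively for any reasonably large sample.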
C  SAS Testing Code
The code reproduced here is a sample of the SAS code used to test the
goodness of fit of the generated random numbers.
1. Code for testing the uniform random numbers
data unfmbin;
set unfmtest;
if 0 le rnd le .2 then xd=1;
if .2 le rnd le .4 then xd=2;
if .4 le rnd le .6 then xd=3;
if .6 le rnd le .8 then xd=4;
if .8 le rnd le 1.0 then xd=5;
run;

proc freq data=unfmbin;
tables xd/chisq;
run;
By default, SAS assumes that the probability of falling in each bin is
a constant, p, and calculates the expected values as np where n is the
total sample size.
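As an illustration of the test SAS performs here, the following Python sketch (ours; the data are freshly simulated uniforms, not the report's) bins 10000 uniforms into five equal-width bins and computes the Pearson chi-square statistic against the constant expected counts np with p = 1/5:

```python
import random

# Bin 10000 simulated uniforms into five equal-width bins and compute
# Pearson's chi-square against expected counts n*p with p = 1/5,
# mirroring what PROC FREQ computes with the /chisq option.
random.seed(3)
n = 10000
counts = [0] * 5
for _ in range(n):
    counts[min(int(random.random() * 5), 4)] += 1

expected = n / 5
chi_sq = sum((obs - expected) ** 2 / expected for obs in counts)
# Under the null hypothesis of uniformity, chi_sq follows a chi-square
# distribution with 4 degrees of freedom (5% critical value 9.488).
```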
2. Code for testing Inverse Gaussian Distribution for parameters drift=0.15,
upper barrier=5.0.
data topbin;
set topbarr;
if 0 le rnd le 20 then xd=1;
if 20 le rnd le 30 then xd=2;
if 30 le rnd le 40 then xd=3;
if 40 le rnd le 50 then xd=4;
if 50 le rnd le 60 then xd=5;
if rnd ge 60 then xd=6;
run;
proc freq data=topbin;
tables xd/
testp=( 0.4924, 0.1569, 0.0973, 0.0644, 0.0447, 0.1444);
run;
Since the probability of falling in each bin is not constant, the user must supply a list of probabilities in the testp statement. These probabilities are calculated as follows. Suppose that X has the IG distribution, with cdf F(x). First we divide the positive real line into bins: (x0, x1), (x1, x2), . . . , (xn−1, xn), where x0 = 0 and xn = ∞. Let pi equal the probability of falling in the ith bin, in other words, pi = P(X ∈ (xi−1, xi)) for i = 1, 2, . . . , n. Then pi is found as F(xi) − F(xi−1) for i = 1, 2, . . . , n.
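These bin probabilities can be computed in closed form from the IG cdf. The Python sketch below (ours) uses the standard Chhikara-Folks expression for the cdf; taking ν = u/µ and λ = u²/σ² with drift 0.15, barrier u = 5.0 and σ = 1 (our reconstruction of the parameter values behind the testp list above), it reproduces those probabilities:

```python
import math

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ig_cdf(t, nu, lam):
    """Inverse Gaussian cdf, mean nu and shape lambda (Chhikara-Folks form)."""
    if t <= 0.0:
        return 0.0
    s = math.sqrt(lam / t)
    return norm_cdf(s * (t / nu - 1.0)) + \
        math.exp(2.0 * lam / nu) * norm_cdf(-s * (t / nu + 1.0))

# Cutpoints as in the SAS code above; nu = u/mu and lam = u^2/sigma^2.
nu, lam = 5.0 / 0.15, 25.0
cuts = [0.0, 20.0, 30.0, 40.0, 50.0, 60.0]
probs = [ig_cdf(b, nu, lam) - ig_cdf(a, nu, lam) for a, b in zip(cuts, cuts[1:])]
probs.append(1.0 - ig_cdf(cuts[-1], nu, lam))  # last bin runs to infinity
# probs closely reproduces the testp list (0.4924, 0.1569, ...)
```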
3. For the FP2B distribution, we tested the observed frequencies of observations that hit the upper barrier in various time intervals against the conditional upper subsurvivor function Fu given D = 1, where D is an indicator variable that takes the value 1 if the path crosses the upper barrier before the lower one. The required probabilities were calculated as follows. Suppose that X has the FP2B distribution, with upper subsurvivor function Fu(x). First we divide the positive real line into bins: (x0, x1), (x1, x2), . . . , (xn−1, xn), where x0 = 0 and xn = ∞. Let pi = P(X ∈ (xi−1, xi)|D = 1) for i = 1, 2, . . . , n. Then pi = (Fu(xi−1) − Fu(xi))/P(upper). Note that Fu(∞) = 0 and Fu(0) = P(upper), as given in equation 9.
Sample Calculation
Calculation of the probabilities associated with the upper barrier, for
the parameters µ = 0.15, u = 10.0, and l = 10.0, with bins from 0-10,
10-20, 20-30, 30-40, 40-50, 50-60, 60+.
(a) Calculate P(upper):

    P(upper) = (1 − e^{2µℓ})/(e^{−2µu} − e^{2µℓ}) = (1 − e^{2(0.15)(10.0)})/(e^{−2(0.15)(10.0)} − e^{2(0.15)(10.0)}) = 0.9526
(b) Using Equation 7 the following values are calculated: F(10)=0.9462, F(20)=0.8572, F(30)=0.7134, F(40)=0.5744, F(50)=0.4560, F(60)=0.3619.
(c) The differences di = Fu(xi−1) − Fu(xi) are calculated: d1 = P(upper)−F(10)=0.00636, d2 = F(10)−F(20)=0.0891, d3 = F(20)−F(30)=0.1437, d4 = F(30)−F(40)=0.1390, d5 = F(40)−F(50)=0.1184, d6 = F(50)−F(60)=0.0942, d7 = F(60)=0.3619.
(d) Finally pi = di/P(upper). The testp statement corresponding to this example is testp=(0.0067, 0.0935, 0.1509, 0.1459, 0.1243, 0.0988, 0.3798);
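The arithmetic in steps (a)-(d) can be verified directly. A short Python sketch (ours), using the F values quoted in step (b):

```python
import math

# Reproduce the sample calculation for the upper-barrier probabilities:
# mu = 0.15, u = 10.0, l = 10.0; the subsurvivor values F(10), ..., F(60)
# are taken from step (b), so only the arithmetic is checked here.
mu, u, l = 0.15, 10.0, 10.0
p_upper = (1 - math.exp(2 * mu * l)) / (math.exp(-2 * mu * u) - math.exp(2 * mu * l))

F = {10: 0.9462, 20: 0.8572, 30: 0.7134, 40: 0.5744, 50: 0.4560, 60: 0.3619}
d = [p_upper - F[10], F[10] - F[20], F[20] - F[30],
     F[30] - F[40], F[40] - F[50], F[50] - F[60], F[60]]
testp = [di / p_upper for di in d]
# p_upper is 0.9526 and testp matches the list in step (d) to rounding.
```

Note that the testp entries sum to 1 exactly, since the di telescope to P(upper).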