Florida State University Libraries
Electronic Theses, Treatises and Dissertations
The Graduate School
2004

Scrambled Quasirandom Sequences and Their Applications
Hongmei Chi

Follow this and additional works at the FSU Digital Library. For more information, please contact [email protected]

THE FLORIDA STATE UNIVERSITY
COLLEGE OF ARTS AND SCIENCES

SCRAMBLED QUASIRANDOM SEQUENCES AND THEIR APPLICATIONS

By
HONGMEI CHI

A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Degree Awarded: Summer Semester, 2004

The members of the Committee approve the dissertation of Hongmei Chi, defended on June 4, 2004.

Michael Mascagni, Professor Directing Dissertation
Sam Huckaba, Outside Committee Member
Mike Burmester, Committee Member
Robert van Engelen, Committee Member
Ashok Srinivasan, Committee Member

The Office of Graduate Studies has verified and approved the above named committee members.

To Changyun and Judy . . .

ACKNOWLEDGEMENTS

First and foremost, I would like to express my sincere gratitude to my major advisor, Dr. Michael Mascagni, for his valuable support, infinite patience, and research guidance throughout the course of my graduate study. I would also like to express my deep appreciation to the other committee members, Dr. Mike Burmester, Dr. Ashok Srinivasan, Dr. Robert van Engelen, and Dr. Sam Huckaba, for their valuable time, helpful discussions, and suggestions. Without their help, it would have been impossible for me to complete my dissertation. Special thanks are due to the past and current members of Dr. Mascagni's research group: Dr. Aneta Karaivanova, Dr. Nikolai Simonov, and Mr. Chuck Fleming. I have enjoyed working in CSIT, a very supportive environment. Special thanks to Ms. Mimi Burbanks for her LaTeX support during the writing of this dissertation.
I would also like to thank my beloved parents, who have encouraged me to make my dreams come true and have provided me with love and spiritual support. I am deeply grateful to my husband and daughter, who have given me unlimited support during my graduate study. Finally, many thanks are due to everyone who helped me during my time at Florida State University.

TABLE OF CONTENTS

List of Tables
List of Figures
Abstract
1. INTRODUCTION
   1.1 Randomized Quasi-Monte Carlo Methods
   1.2 Quasirandom Sequences
   1.3 The Koksma-Hlawka Inequality
   1.4 Scrambled Quasirandom Sequences
   1.5 Derandomization
   1.6 Applications
   1.7 Paper Organization
   1.8 Contributions
2. MEASURES OF IRREGULARITY
   2.1 Theoretical Bounds on Discrepancy
   2.2 Other Measures
      2.2.1 Orthogonal Projections
      2.2.2 Practical Integral Problems
   2.3 Conclusion
3. THE SCRAMBLED HALTON SEQUENCE
   3.1 The Halton Sequence
   3.2 Correlations
   3.3 Methods to Break Correlations
   3.4 A Scrambled Halton Sequence
   3.5 Implementation Issues
   3.6 Linear Scrambling
   3.7 Optimal Halton Sequences
   3.8 Conclusion
4. THE SCRAMBLED AND OPTIMAL FAURE SEQUENCE
   4.1 The Scrambled Faure Sequence
      4.1.1 Generalized Faure (GFaure) Sequences
      4.1.2 I-binomial Scrambling
   4.2 The Optimal Faure Sequence
   4.3 Geometric Asian Options
   4.4 Numerical Results
   4.5 Conclusion
5. THE SCRAMBLED SOBOĹ SEQUENCE
   5.1 The Soboĺ Sequence
      5.1.1 Initial Direction Numbers
   5.2 Scrambling Methods
   5.3 An Algorithm for Scrambling the Soboĺ Sequence
   5.4 Numerical Results
   5.5 Conclusion
6. RANDOMIZATION OF LATTICE POINTS
   6.1 Introduction
   6.2 The Methods of Good Lattice Points
   6.3 Criteria for Good Generating Vectors
   6.4 Randomization
   6.5 Infinite Lattice Sequences
   6.6 Conclusion
7. APPLICATIONS
   7.1 Automatic Error Estimates for QMC
   7.2 Parallel Quasirandom Sequences
      7.2.1 A Parallel and Distributed Library
      7.2.2 Testing Parallel Quasirandom Sequences
   7.3 Derandomization
   7.4 Conclusion
8. CONCLUSIONS AND FUTURE WORK
A. PARALLEL PSEUDORANDOM NUMBER GENERATORS
B. LCGS WITH SOPHIE-GERMAIN MODULI
C. LINEAR SCRAMBLING AND DERANDOMIZATION
D. ADDITIONAL NUMERICAL EXPERIMENTS
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

3.1 Optimal values of $W_p$ for the first 40 dimensions of the Halton sequence
3.2 Estimates of the integral $\int_0^1 \cdots \int_0^1 \prod_{i=1}^{s} |4x_i - 2|\,dx_1 \cdots dx_s = 1$ by using Halton sequences
4.1 Parameters used for numerical studies
4.2 Pricing geometric Asian options using parameters in Table 3.1
B.1 The Sophie-Germain (S-G) primes closest to but less than $2^q$
D.1 Estimates of $I_1(f)$ in (D.1) by using Halton sequences
D.2 Estimates of $I_2(f)$ in (D.2) with parameters $a_i = 0$ by using Halton sequences

LIST OF FIGURES

1.1 Left figure: 2000 pseudorandom numbers; right figure: 2000 Soboĺ (quasirandom) numbers.
2.1 $L_2$-discrepancy for the 8-dimensional Soboĺ sequence.
2.2 Left figure: 1024 points of the Halton sequence; right figure: 1024 points of the Faure sequence.
3.1 Poor 2-D projections studied in several papers: left top, from Braaten's paper [1]; right top, from Morokoff's paper [2]; left bottom, from Kocis's paper [3]; right bottom, a random-start sequence [4].
4.1 Left: the original Faure sequence; right: an optimal Faure sequence.
4.2 Left figure: geometric mean of 3 stock prices; right figure: geometric mean of 50 stock prices. The label "Faure" refers to the original Faure sequence [5], while "dFaure" refers to my optimal Faure sequence.
5.1 Left: 4096 points of the original Soboĺ sequence with initial direction numbers from Bratley and Fox's paper [6]; right: 4096 points of the scrambled version of the Soboĺ sequence.
5.2 Left: 4096 points of the original Soboĺ sequence with all initial direction numbers equal to one [7]; right: 4096 points of the scrambled version of the Soboĺ sequence.
5.3 Left figure: geometric mean of 10 stock prices; right figure: geometric mean of 30 stock prices. The label "Sobol" refers to the original Soboĺ sequence [6], while "DSobol" refers to my optimal Soboĺ sequence.
6.1 An example of a lattice point set.
D.1 Estimates of the integral $I_1(f)$ in (D.1) by using various Halton sequences.
D.2 Estimates of the integral $I_2(f)$ in (D.2) with parameters $a_i = 0$ by using various Halton sequences.
D.3 Estimates of the integral $I_2(f)$ in (D.2) with parameters $a_i = i$ by using various Halton sequences.
D.4 Estimates of the integral $I_2(f)$ in (D.2) with parameters $a_i = i^2$ by using various Halton sequences.
D.5 Estimates of the integral $I_1(f)$ in (D.1) by using various Faure sequences.
D.6 Estimates of the integral $I_2(f)$ in (D.2) with parameters $a_i = 0$ by using various Faure sequences.
D.7 Estimates of the integral $I_2(f)$ in (D.2) with parameters $a_i = i$ by using various Faure sequences.
D.8 Estimates of the integral $I_2(f)$ in (D.2) with parameters $a_i = i^2$ by using various Faure sequences.
D.9 Estimates of the integral $I_1(f)$ in (D.1) by using various Soboĺ sequences.
D.10 Estimates of the integral $I_2(f)$ in (D.2) with parameters $a_i = 0$ by using various Soboĺ sequences.
D.11 Estimates of the integral $I_2(f)$ in (D.2) with parameters $a_i = i$ by using various Soboĺ sequences.
D.12 Estimates of the integral $I_2(f)$ in (D.2) with parameters $a_i = i^2$ by using various Soboĺ sequences.

ABSTRACT

Quasi-Monte Carlo methods are a variant of ordinary Monte Carlo methods that employ highly uniform quasirandom numbers in place of Monte Carlo's pseudorandom numbers. Monte Carlo methods offer statistical error estimates; however, while quasi-Monte Carlo has a faster convergence rate than normal Monte Carlo, one cannot obtain error estimates from quasi-Monte Carlo sample values in any practical way. A recently proposed approach, called randomized quasi-Monte Carlo, combines the advantages of Monte Carlo and quasi-Monte Carlo methods. Randomness can be brought to bear on quasirandom sequences through scrambling and other related randomization techniques in randomized quasi-Monte Carlo methods, which provide an elegant approach to obtaining error estimates for quasi-Monte Carlo based on treating each scrambled sequence as a different and independent random sample. The core of randomized quasi-Monte Carlo is to find an effective and fast algorithm to scramble (randomize) quasirandom sequences. This dissertation surveys research on algorithms and implementations of scrambled quasirandom sequences and proposes some new algorithms to improve the quality of scrambled quasirandom sequences.

Besides obtaining error estimates for quasi-Monte Carlo, scrambling techniques provide a natural way to parallelize quasirandom sequences. This scheme is especially suitable for distributed or grid computing. By scrambling a quasirandom sequence we can produce a family of related quasirandom sequences. Finding one or a subset of optimal quasirandom sequences within this family is an interesting problem, as such optimal quasirandom sequences can be quite useful for quasi-Monte Carlo.
The process of finding such optimal quasirandom sequences is called the derandomization of a randomized (scrambled) family. We summarize aspects of this technique and propose some new algorithms for finding optimal sequences from the Halton, Faure and Soboĺ sequences. Finally, we explore applications of derandomization.

CHAPTER 1
INTRODUCTION

1.1 Randomized Quasi-Monte Carlo Methods

Monte Carlo (MC) methods are based on the simulation of stochastic processes whose expected values are equal to computationally interesting quantities. MC methods offer simplicity of construction, and are often designed to mirror some process whose behavior is only understood in a statistical sense. Moreover, there is a wide class of problems where MC methods are the only known computational method of solution. Despite the universality of MC methods, a serious drawback is their slow convergence, which is based on the $O(N^{-1/2})$ behavior of the size of statistical sampling errors. One generic approach to improving the convergence of MC methods has been the use of highly uniform random numbers in place of the usual pseudorandom numbers. While pseudorandom numbers are constructed to mimic the behavior of truly random numbers, these highly uniform numbers, called quasirandom numbers (or low-discrepancy sequences), are constructed to be as evenly distributed as is mathematically possible. The use of quasirandom numbers in MC leads to quasi-Monte Carlo (QMC) methods [8, 9].

Indeed, pseudorandom numbers are scrutinized via batteries of statistical tests that check for statistical independence in a variety of different ways. In addition, these tests check for uniformity of distribution, but not with excessively stringent requirements. Thus, one can think of computational random numbers as either those that possess considerable independence, pseudorandom numbers, or those that possess considerable uniformity, quasirandom numbers.
Quasirandom numbers are constructed to minimize a measure of their deviation from uniformity called discrepancy (the definition will be given in the next section). While quasirandom numbers do improve the convergence of applications like numerical integration, it is by no means trivial to enhance the convergence of all MC methods. In order to improve the situation for MC and especially QMC methods, the analysis and use of randomized quasirandom sequences have been undertaken. The core idea behind randomized QMC (RQMC) [10] is to apply an effective and fast randomization (scrambling) algorithm to existing quasirandom sequences. The purpose of scrambling in QMC is threefold. Primarily, it provides a practical method to obtain error estimates for QMC based on treating each scrambled sequence as a different and independent random sample from a family of randomly scrambled quasirandom numbers [11]. Thus, RQMC overcomes the main disadvantage of QMC while maintaining QMC's favorable convergence rate. Secondarily, scrambling gives us a simple and unified way to generate quasirandom numbers for parallel, distributed, and grid-based computational environments. Finally, RQMC provides many more choices of quality quasirandom sequences for QMC applications, and perhaps even optimal choices as a result of derandomization. Thus, a careful exploration of scrambling and derandomization methods, coupled with library-level implementations, will play a central role in the continued development and use of RQMC techniques. This dissertation explores these issues: it summarizes research on algorithms and implementations of scrambled quasirandom sequences and proposes new algorithms to improve the quality of scrambled quasirandom sequences.

1.2 Quasirandom Sequences

A quasirandom sequence, sometimes called a low-discrepancy sequence, is normally generated in the unit s-dimensional hypercube, $I^s = [0, 1)^s$, and attempts to fill the hypercube as uniformly as possible.
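The classical constructions reviewed in this section all rest on digital inversion: the base-b digits of the integer index are reflected about the radix point. A minimal sketch of this idea (illustrative code, not the dissertation's own implementation) for the van der Corput radical inverse and the Halton points built from it:

```python
# Sketch: the van der Corput radical inverse in base b, and the
# Halton sequence obtained by using a distinct prime base per
# dimension. The choice of the first three primes below is just
# an example.

def radical_inverse(n: int, base: int) -> float:
    """Reflect the base-b digits of n about the radix point:
    n = sum d_k b^k  ->  sum d_k b^(-k-1)."""
    inv, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        inv += digit / denom
    return inv

def halton(n: int, primes=(2, 3, 5)) -> tuple:
    """n-th point of the s-dimensional Halton sequence, s = len(primes)."""
    return tuple(radical_inverse(n, p) for p in primes)

# Base-2 van der Corput points: 1 -> 0.5, 2 -> 0.25, 3 -> 0.75, ...
```

Successive indices fill the unit interval in a maximally spread-out way, which is exactly the "as uniform as possible" behavior described above.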
The original construction of quasirandom sequences is related to the van der Corput sequence, a one-dimensional quasirandom sequence based on digital inversion. This digital inversion method is a central idea behind the construction of many current quasirandom sequences in arbitrary bases and dimensions. Following that, Halton [12] generalized the van der Corput sequence to s dimensions, and Soboĺ [13, 14] introduced the sequences that now bear his name. Faure [15] proposed a significant generalization of these methods, yielding the sequences that bear his name. Later, Niederreiter [16] generalized the existing constructions of the Soboĺ and Faure sequences to arbitrary bases. These are now called Niederreiter sequences. Tezuka [17] further generalized Niederreiter sequences by using the polynomial arithmetic analogue of Halton sequences.

Figure 1.1. Left figure: 2000 pseudorandom numbers (mlfg); right figure: 2000 Soboĺ (quasirandom) numbers.

From Fig. 1.1, we can see that pseudorandom numbers tend to cluster while quasirandom numbers are uniformly distributed. Before giving the definition of a low-discrepancy sequence, we must define a common measure of uniformity, called discrepancy. Discrepancy is a measure of the lack of uniformity or equidistribution of points placed in a set, usually in the unit hypercube, $[0, 1)^s$. The most widely studied discrepancy measures are based on the $L_p$ norms ($p = 2$ or $p = \infty$). With $p = 2$ this discrepancy is called the $L_2$-discrepancy. When $p = \infty$, that discrepancy is called the star-discrepancy, $D_N^*$, and its definition is the following [2]:

Definition 1. For any sequence $\{x_n\} \subset [0, 1)^s$ with $N$ elements, write $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(s)})$ and $J(\nu) = [0, \nu_1) \times [0, \nu_2) \times \cdots \times [0, \nu_s)$, where $0 \le \nu_j \le 1$. Then the star-discrepancy of this sequence, $D_N^*$, is given by
$$D_N^* = \sup_{0 \le \nu_j < 1} \left| \frac{1}{N} \#\{x_i \in J(\nu)\} - \prod_{j=1}^{s} \nu_j \right|. \qquad (1.1)$$

While seemingly complicated, for a one-dimensional point set the star-discrepancy is the Kolmogorov-Smirnov statistic based on the uniform distribution. The construction of quasirandom sequences is based on minimizing their discrepancy. Quasirandom sequences aim to have the fraction of their points within any subset $J(\nu) = [0, \nu_1) \times \cdots \times [0, \nu_s)$ as close as possible to the subset's volume. Based on the star-discrepancy, the definition of a low-discrepancy sequence in $[0, 1)^s$ is expressed as:

Definition 2. For any $N > 1$, let $\{x_i\}_{1 \le i \le N}$ denote the first $N$ points of the sequence $\{x_i\}$. If we have
$$D_N^* \le c(s) \frac{(\log N)^s}{N}, \qquad (1.2)$$
where the constant $c(s)$ depends only on the dimension, $s$, then the sequence $\{x_i\}$ is called a low-discrepancy sequence.

1.3 The Koksma-Hlawka Inequality

The discrepancy of a quasirandom sequence enters into QMC via the famous Koksma-Hlawka inequality. Assume that an integrand, $f$, is defined over the s-dimensional unit cube, $[0, 1)^s$, and that $I(f)$ is defined as:
$$I(f) = \int_{I^s} f(x)\,dx = \int_0^1 \cdots \int_0^1 f(x^{(1)}, \ldots, x^{(s)})\,dx^{(1)} \cdots dx^{(s)}. \qquad (1.3)$$
Then the s-dimensional integral, $I(f)$, in Equation (1.3) may be approximated by $Q_N(f)$ [8]:
$$Q_N(f) = \sum_{i=1}^{N} \omega_i f(x_i), \qquad (1.4)$$
where $x_i \in [0, 1)^s$ and the $\omega_i$ are weights. If $\{x_1, \ldots, x_N\}$ is chosen randomly and $\omega_i = \frac{1}{N}$, then $Q_N(f)$ becomes the standard Monte Carlo integral estimate, whose statistical error can be estimated using the Central Limit Theorem. If $\{x_1, \ldots, x_N\}$ is a set of quasirandom numbers, then $Q_N(f)$ is a quasi-Monte Carlo estimate. In fact, the Koksma-Hlawka inequality is essentially the only theoretical tool for estimating the accuracy of such a QMC estimate, and was motivated by numerical integration.
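As noted above, in one dimension the star-discrepancy coincides with the Kolmogorov-Smirnov statistic against the uniform distribution, and it admits the standard closed form over the sorted points, $D_N^* = \max_i \max\{i/N - x_{(i)},\ x_{(i)} - (i-1)/N\}$. A small sketch assuming that well-known formula (this is illustrative code, not from the dissertation):

```python
# Sketch: exact star-discrepancy of a finite 1-D point set in [0,1),
# via D*_N = max_i max( i/N - x_(i), x_(i) - (i-1)/N ) over the
# sorted points -- the Kolmogorov-Smirnov statistic against the
# uniform distribution.

def star_discrepancy_1d(points):
    xs = sorted(points)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

# Centered equispaced points (2k-1)/(2N) attain the minimum 1/(2N):
best = [(2 * k - 1) / 16 for k in range(1, 9)]   # N = 8
assert abs(star_discrepancy_1d(best) - 1 / 16) < 1e-12
```

In higher dimensions no comparably cheap exact formula exists, which is one reason the star-discrepancy is mainly a theoretical tool.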
If a function, $f$, on $[0, 1)^s$ is of bounded variation, then we have the Koksma-Hlawka inequality:

Theorem 1. For any sequence $\{x_n\}_{1 \le n \le N}$ and any function $f$ whose variation in the sense of Hardy-Krause, $V(f)$, is bounded, we have
$$\left| \frac{1}{N} \sum_{i=1}^{N} f(x_i) - I(f) \right| \le V(f)\, D_N^*, \qquad (1.5)$$
where $D_N^*$ is the star-discrepancy of the point set $\{x_1, \ldots, x_N\}$.

While this is a very fundamental result in QMC, Caflisch [18] gives a surprisingly simple and elegant proof of this inequality that elucidates how basic it really is in QMC. The following proof is based on Caflisch's paper. Define the variation of $f$, a function of a single variable, as
$$V(f) = \int_0^1 \left| \frac{df}{dx} \right| dx.$$
In $s$ dimensions, the variation in the sense of Hardy-Krause is defined as
$$V(f) = \int_{[0,1)^s} \left| \frac{\partial^s f}{\partial x^{(1)} \cdots \partial x^{(s)}} \right| dx^{(1)} \cdots dx^{(s)} + \sum_{i=1}^{s} V(f_1^{(i)}),$$
where $f_1^{(i)}$ is the restriction of $f$ to the boundary $x_i = 1$. Following Caflisch [18], for a sequence of $N$ points $\{x_n\}$ in the unit cube $I^s = [0, 1)^s$ we introduce the notation
$$R_N(J(\nu)) = \prod_{i=1}^{s} \nu_i - \frac{1}{N} \#\{x_n \in J(\nu)\}, \qquad (1.6)$$
and
$$D_N^* = \sup_{\nu \in I^s} |R_N(J(\nu))|.$$
Note that in the unit cube, $I^s = [0, 1)^s$,
$$dR(x) = \left( 1 - \frac{1}{N} \sum_{i=1}^{N} \delta(x - x_i) \right) dx,$$
where $R(x) = R_N(J(x))$ as defined in equation (1.6). Now consider a function, $f$, that vanishes on the boundary of the unit cube, $I^s$. Then
$$I(f) - \frac{1}{N} \sum_{i=1}^{N} f(x_i) = \int_{I^s} f(x)\,dx - \frac{1}{N} \sum_{i=1}^{N} f(x_i) = \int_{I^s} \left[ 1 - \frac{1}{N} \sum_{i=1}^{N} \delta(x - x_i) \right] f(x)\,dx = \int_{I^s} R(x)\,df(x) \le \left( \sup_x |R(x)| \right) \int_{I^s} |df(x)| = D_N^*\, V(f). \qquad (1.7)$$
For $f$ that is nonzero on the boundary of the unit cube, the terms from this integration by parts are bounded by the boundary terms in $V(f)$.

We should note that in the Koksma-Hlawka inequality both $V(f)$ and the star-discrepancy, $D_N^*$, are difficult to calculate in practice, and so the Koksma-Hlawka inequality is actually not very useful in practical QMC error estimation [9]. Thus we have to find a practical method to estimate the QMC integration error.
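The bound (1.5) can be checked numerically in one dimension. The sketch below takes $f(x) = x^2$, so $I(f) = 1/3$ and $V(f) = \int_0^1 |2x|\,dx = 1$, and uses base-2 van der Corput points; the two helper routines are assumed, illustrative implementations, re-stated here so the check is self-contained:

```python
# Sanity check of the Koksma-Hlawka bound |Q_N(f) - I(f)| <= V(f) D*_N
# in one dimension for f(x) = x^2, I(f) = 1/3, V(f) = 1.

def radical_inverse(n, base=2):
    inv, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        inv += digit / denom
    return inv

def star_discrepancy_1d(points):
    xs, m = sorted(points), len(points)
    return max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(xs))

N = 128
pts = [radical_inverse(n) for n in range(1, N + 1)]
lhs = abs(sum(x * x for x in pts) / N - 1.0 / 3.0)   # |Q_N(f) - I(f)|
rhs = 1.0 * star_discrepancy_1d(pts)                 # V(f) * D*_N
assert lhs <= rhs
```

The inequality holds for every point set, but the gap between the two sides illustrates why the bound, while sharp in the worst case, is pessimistic for any particular integrand.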
Fortunately, scrambled quasirandom sequences provide a very good procedure for estimating QMC integration error, as we shall see.

1.4 Scrambled Quasirandom Sequences

Randomness can be brought to bear on quasirandom sequences through various scrambling techniques. By using random numbers to scramble the order of the quasirandom numbers or their digits, one randomizes quasirandom sequences. Thus, by the term "scrambling" we are referring more generally to the randomization of quasirandom numbers. By scrambling a quasirandom sequence, one can produce a family of related quasirandom sequences. This family can be used to generate parallel quasirandom sequences. Also, a group of optimal quasirandom sequences within this family can be quite useful for enhancing the performance of ordinary QMC. Finding fast and effective scrambling algorithms is the main task of this dissertation.

After studying various scrambling methods, I found that these fall into two basic categories. One is based on randomized shifting [19], which has the form
$$z_n = x_n + r \pmod 1, \qquad (1.8)$$
where $x_n$ is a quasirandom number in $[0, 1)^s$, and $r$ is a single s-dimensional pseudorandom number. With different pseudorandom numbers, $r$, we obtain different scrambled versions $\{z_n\}$ of the original quasirandom numbers $\{x_n\}$.

The other method [11] is based on digit permutations. Let $x_n = (x_n^{(1)}, x_n^{(2)}, \ldots, x_n^{(s)})$ be a quasirandom number in $[0, 1)^s$, and let $z_n = (z_n^{(1)}, z_n^{(2)}, \ldots, z_n^{(s)})$ be the scrambled version of the point $x_n$. Suppose each $x_n^{(j)}$ can be represented in base $b$ as $x_n^{(j)} = 0.x_{n1}^{(j)} x_{n2}^{(j)} \ldots x_{nK}^{(j)} \ldots$, with $K$ being the number of digits to be scrambled. Then we define
$$z_n^{(j)} = \sigma(x_n^{(j)}), \quad j = 1, 2, \ldots, s, \qquad (1.9)$$
where $z_{ni}^{(j)} = \pi_i(x_{ni}^{(j)})$ for $i = 1, 2, \ldots, K$, $\sigma = \{\pi_1, \pi_2, \ldots, \pi_K\}$, and each $\pi_i$ is a permutation of the digits $\{0, \ldots, b-1\}$.
There are various versions of scrambling methods based on digit permutation, and the differences among those methods lie in the definitions of the $\pi_i$'s. These include Owen's nested scrambling [11, 20], Tezuka's GFaure [7], and Matousek's linear scrambling [21]. Whenever scrambling methods are applied, pseudorandom numbers are the "scramblers". Therefore, it is important to find a good pseudorandom number generator (PRNG) to act as a scrambler so that we can obtain well-scrambled quasirandom sequences. Also, since one major application of scrambled quasirandom sequences is parallelization, we will also need a good parallel PRNG. Please see Appendices A and B for more details.

1.5 Derandomization

Scrambled sequences have good performance in practice [22]. Since scrambling produces a stochastic family of quasirandom sequences, searching for and specifying optimal quasirandom sequences that achieve theoretically and empirically optimal results is an important problem for QMC. The process of finding such optimal quasirandom sequences is called derandomization.

It is not a new idea to find an optimal quasirandom sequence. After poor two-dimensional projections were first found in the Halton sequence, Braaten and Weller [1] searched for an optimal Halton sequence by seeking an optimal permutation from among all possible permutations. A similar example appears in searching for a good set of lattice points. Any set of good lattice points is completely determined by its generating vector $(g_1, \ldots, g_s)$. Korobov [23] suggested considering the particular form $(g_1, \ldots, g_s) = (1, a, a^2, \ldots, a^{s-1})$ instead of all possible vectors, which turns out to be more efficient. Recently, FINDER [24, 25], a software package for computing financial derivatives by using quasirandom numbers, successfully used derandomization to find very good instances of the Soboĺ sequence and the generalized Faure sequence, GFaure.
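Korobov's one-parameter family can be sketched as follows (N and a below are arbitrary illustrative values, not parameters recommended in this dissertation):

```python
# Sketch: a rank-1 lattice point set with a Korobov-form generating
# vector (1, a, a^2, ..., a^(s-1)) mod N. The Korobov form reduces
# the search for a good generating vector to a search over the
# single integer a.

def korobov_lattice(N: int, a: int, s: int):
    g = [pow(a, j, N) for j in range(s)]          # (1, a, a^2, ...) mod N
    return [tuple(n * gj % N / N for gj in g) for n in range(N)]

points = korobov_lattice(N=89, a=23, s=3)
assert points[0] == (0.0, 0.0, 0.0)
assert len(points) == 89
```

A derandomization-style search in this setting simply scores each candidate a (for example by a discrepancy-like criterion) and keeps the best, which is feasible precisely because the candidate space is one-dimensional.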
The derandomization of quasirandom numbers in FINDER was obtained empirically, not theoretically. Tezuka's i-binomial scrambling [26] is a special case of GFaure, which gives us a specific search criterion and a smaller space in which to search for an optimal GFaure sequence. In fact, there are very few theoretical or practical results for derandomizing quasirandom sequences. Thus, an important open question is how to provide a theoretical basis for derandomization. In any case, there has been some theoretical progress. Faure [27] proved that it is possible to obtain a better star-discrepancy for the Halton sequence by using good permutations in one dimension. Also, Atanassov [28] proved that there exist digit permutations of scrambled Halton sequences which give better convergence rates, in terms of discrepancy bounds, than the original Halton sequences. These theoretical and practical results give us hope that the derandomization of quasirandom sequences will be generally possible, and will give us a way to improve the quality of quasirandom sequences used in QMC. An interesting point is that derandomization is not only a way to obtain better quasirandom sequences from a family, but also a way to increase the convergence rate of QMC [29].

Before derandomization, one has to choose a scrambling space. Theoretically, Owen's nested scrambling [11] is powerful, but because of the excessive bookkeeping in the various implementations of Owen's original scrambling, modified and simplified versions of Owen's scrambling were explored [?, 30], and linear scrambling is considered a suitable choice. From the implementation point of view, linear scrambling is the simplest and most effective scrambling method for improving the quality of quasirandom sequences. Therefore, I will focus on the linear scrambling space to search for optimal quasirandom sequences.
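A generic illustration of linear scrambling of one coordinate's digit vector, $z = Lx + e \pmod b$ with $L$ a random nonsingular lower-triangular digit matrix and $e$ a random digit shift; this is an assumed, Matousek-style form for illustration, not the specific search space studied later in the dissertation:

```python
# Sketch of linear scrambling of base-b digits: z = L x + e (mod b).
# A nonzero diagonal of the lower-triangular L keeps the map a
# bijection on digit vectors, so uniformity properties are preserved.
import random

def make_linear_scrambler(K, base, rng):
    L = [[rng.randrange(1, base) if j == i else
          (rng.randrange(base) if j < i else 0) for j in range(K)]
         for i in range(K)]
    e = [rng.randrange(base) for _ in range(K)]
    return L, e

def linear_scramble(digits, L, e, base):
    K = len(digits)
    return [(sum(L[i][j] * digits[j] for j in range(K)) + e[i]) % base
            for i in range(K)]

rng = random.Random(1)
L, e = make_linear_scrambler(K=8, base=2, rng=rng)
z = linear_scramble([1, 0, 1, 1, 0, 0, 1, 0], L, e, base=2)
assert len(z) == 8 and all(d in (0, 1) for d in z)
```

Because the family is parameterized by the finitely many matrices $L$ and shifts $e$, a derandomized search can enumerate or sample this space directly, which is what makes linear scrambling attractive as a search space.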
In this dissertation, I will illustrate the reasons for choosing linear scrambling and propose some new methods for derandomizing scrambled quasirandom sequences in the linear scrambling space. More details about derandomization and linear scrambling can be found in Appendix C.

1.6 Applications

QMC has been successfully applied to computer graphics [31], computational physics [32], computational finance [25], linear algebra [33], and Bayesian networks [34]. Although QMC is more accurate than MC, a disadvantage of QMC is that it is hard to use the Koksma-Hlawka inequality as a practical tool for computing error bounds. In fact, the common practice in MC of using a predetermined error tolerance as a termination criterion is almost impossible to realize in QMC without extra technology. Scrambled quasirandom sequences are such a technology, as they allow us to obtain error estimates for QMC in a practical way. This is because one may think of each uniquely scrambled version of a given quasirandom sequence of prescribed quality as coming from a statistical distribution. One may then average estimates taken from these different scrambled sequences and combine the results statistically. In this way, one may recover confidence interval bounds that are common in MC.

Besides obtaining error estimates, one of the important applications of scrambled quasirandom sequences is to produce parallel quasirandom numbers. Scrambled quasirandom sequences provide us with a family of quasirandom sequences of similar quality, which in turn gives us a natural way to implement parallel quasirandom numbers. This is because one may assign a different scrambled sequence to each process requiring quasirandom numbers.

1.7 Paper Organization

The remainder of this dissertation is organized as follows. In Chapter 2, I present a review of theoretical and practical measures for quasirandom sequences. These measures are widely used to judge the quality of a quasirandom sequence in practice.
In Chapter 3, scrambled and optimally scrambled Halton sequences are presented. In this chapter, to study the phenomenon of poor two-dimensional projections, I propose a particular scrambling method as a solution. The correlation coefficients between the different dimensions of the Halton sequence are derived in general, and I show that the standard Halton sequence has poor two-dimensional projections. Scrambled and optimally scrambled Faure sequences are discussed in Chapter 4. A new algorithm for finding one or a set of optimal Faure sequences is proposed, and an approach to optimizing error estimation by using the optimal Faure sequences is studied. In Chapter 5, I analyze how the choice of initial direction numbers affects the quality of the Soboĺ sequence. Based on this analysis, a new algorithm for scrambling the Soboĺ sequence is studied. Implementation issues are addressed, and finally an algorithm for finding an optimal Soboĺ sequence is proposed. Chapter 6 reviews lattice point methods and their randomization. The advantages of lattice rules lie in their simplicity and their power as integration nodes for a wide class of periodic integrands. The scrambling methods [30, 19] used for lattice points can also be applied to the Halton, Faure and Soboĺ sequences. In Chapter 7, I consider applications of scrambled quasirandom sequences. I describe the layout of a parallel and distributed library, which includes automatic error estimation, parallel quasirandom sequences generated by the algorithms I propose in this dissertation, and practical high-dimensional integral problems for use as testing benchmarks. Conclusions and future work are given in Chapter 8, the final chapter of the dissertation.

1.8 Contributions

The contributions of this dissertation are listed below:

1. I give a detailed analysis of the correlations among the dimensions of the standard Halton sequence, and compute the correlation coefficients in Section 3.2.
Based on this analysis, I propose a new and simpler modified scrambling algorithm for the Halton sequence in Section 3.3. A new algorithm for searching for this optimal Halton sequence is also proposed. This optimal Halton sequence is then numerically tested and shown empirically to be far superior to the original sequence in Section 3.7. Two publications [35, 36] resulted from this work; one [36] is published and the other [35] is submitted.

2. I propose a new algorithm for obtaining optimal Faure sequences based on i-binomial scrambling in Section 4.2. This algorithm is a natural extension of the algorithm above for finding optimal Halton sequences. I apply this optimal Faure sequence to evaluate a high-dimensional integral from computational finance in Section 4.3. Two papers [37, 38] based on this work were published.

3. I develop a new scrambling algorithm for the Soboĺ sequence, called the multi-digit scrambling algorithm, in Section 5.3. Most proposed scrambling methods randomize a single digit at a time. In contrast, my scheme randomizes many digits of a single point at a time, and is very efficient when standard pseudorandom number generators are used as scramblers. One paper [39] has been published.

4. Appendix B is part of our paper [40], to appear in Parallel Computing.

5. A review paper based on the previous version of this dissertation (my prospectus) has been submitted to SIAM Review.

CHAPTER 2
MEASURES OF IRREGULARITY

This chapter presents a review of the most relevant literature related to measuring the uniformity of a quasirandom sequence. The uniformity of a sequence is theoretically measured via discrepancy. In addition, other measures, such as two-dimensional projections and high-dimensional integral problems, are often used to evaluate a quasirandom sequence in practice. The Halton, Faure and Soboĺ sequences are studied in this dissertation; therefore, a review of these measures for these sequences is presented in this chapter.
Bounds on the discrepancy of these sequences, as well as other analytical properties, have been presented in Niederreiter’s book [8]. Morokoff and Caflisch [2] have an excellent survey on quasirandom sequences and their discrepancy.

2.1 Theoretical Bounds on Discrepancy

The Halton, Faure and Soboĺ sequences all have star discrepancy bounds of the same form,

$$D_N^* \le C_s \frac{(\log N)^s}{N} + O\left(\frac{(\log N)^{s-1}}{N}\right),$$

where $N$ is the number of points and $s$ is the number of dimensions. Let $C_s^H$, $C_s^F$, $C_s^S$ denote the coefficient $C_s$ for the Halton, Faure and Soboĺ sequences respectively. Then

$$C_s^H = \prod_{j=1}^{s} \frac{p_j - 1}{2 \log p_j},$$

where the dimensional bases $p_j$ are pairwise coprime; in practice, I always use the first $s$ primes as the bases. For the Faure sequence, the coefficient can be written as

$$C_s^F = \frac{1}{s!} \left(\frac{p_s - 1}{2 \log p_s}\right)^{s},$$

where $p_s$ is the base for the Faure sequence. It is known that the coefficient $C_s^F$ has the desired property that $\lim_{s \to \infty} C_s^F = 0$. For the Soboĺ sequence,

$$C_s^S = \frac{2^{t_s}}{s! \, (\log 2)^s}.$$

The bound for $t_s$ is given in Soboĺ [13] as

$$\frac{K s \log s}{\log \log s} \le t_s \le \frac{s \log s}{\log 2} + O(s \log \log s),$$

which shows that $C_s^S$ grows superexponentially with $s$, like the corresponding coefficient for the Halton sequence. In addition, Faure’s paper [15] shows that $C_s^F$ is smaller than both $C_s^S$ and $C_s^H$. So the Faure sequence is best in terms of the asymptotic discrepancy bounds of these sequences. However, this fact does not mean that the Faure sequence is superior to the Halton and Soboĺ sequences: in practice, an asymptotic discrepancy bound is not that useful for comparing these sequences. First, the star-discrepancy is hard to calculate; there seem to be no effective and fast algorithms for computing it [21, 2]. The $L_2$-discrepancy, $T_N$, is computationally more tractable than the star-discrepancy. In practice, the $L_2$-discrepancy is frequently used in place of $D_N^*$, since $T_N \le D_N^*$. Two algorithms [41, 42] are available for computing the $L_2$-discrepancy.
The fastest current algorithms for computing discrepancies are those for computing $T_N$. Warnock [41] proposed an algorithm for $T_N$ with complexity $O(sN^2)$. Heinrich [42] improved on this with a new algorithm that computes the $T_N$-discrepancy with complexity $O(B_s N (\log_2 N)^s)$, where $B_s$ is a constant that grows with $s$; thus, Heinrich’s algorithm is efficient for small numbers of dimensions. The formula used to compute $T_N$ is as follows:

$$(T_N)^2 = \frac{1}{3^s} - \frac{2}{N \, 2^s} \sum_{i=1}^{N} \prod_{k=1}^{s} \left(1 - (x_i^{(k)})^2\right) + \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \prod_{k=1}^{s} \left(1 - \max(x_i^{(k)}, x_j^{(k)})\right), \quad (2.5)$$

where $x_i^{(k)}$ denotes the $k$th coordinate of $x_i$, namely $x_i = (x_i^{(1)}, \ldots, x_i^{(s)})$. However, the calculated value of $T_N$ for quasirandom sequences may be misleading if the number of points, $N$, is relatively small. For small $N$, nearly the best possible $T_N$ value is obtained by a set all of whose points are clustered near the corner $(0, \ldots, 0)$.

Figure 2.1. $L_2$-discrepancy for the 8-dimensional Soboĺ sequence.

In addition, the constant $C_s^F$ is part of an asymptotic discrepancy bound. In practice, the number of points used, $N$, might not be large enough to reach the asymptotic region of this bound. Also, other measures of the quality of these sequences should be considered, such as two-dimensional projections. Certain scrambling techniques do not affect the asymptotic discrepancy of these sequences [11]. Although scrambled quasirandom sequences improve the quality of quasirandom sequences, that improvement cannot be seen directly in the calculation of the $L_2$-discrepancy. Fig. 2.1 compares $T_N$ between the unscrambled Soboĺ sequence and the mean of 10 scrambled Soboĺ sequences; the figure does not reveal any advantage of scrambled quasirandom sequences with respect to discrepancy. There has been some theoretical progress [28, 27] on optimal quasirandom sequences, which improves the theoretical bounds on discrepancy for one-dimensional Halton sequences.
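Equation (2.5) translates directly into Warnock's $O(sN^2)$ procedure. The following Python sketch is my illustration, not code from the dissertation's library; it implements the formula term by term. As a sanity check, a single point at 0.5 in one dimension gives $T_1 = \sqrt{1/12}$, which matches the analytic $L_2$ star discrepancy of that point set.

```python
from math import prod, sqrt

def l2_star_discrepancy(points):
    """T_N via Warnock's formula, Eq. (2.5): O(s*N^2) operations.

    T_N^2 = 3^(-s)
          - 2/(N*2^s) * sum_i prod_k (1 - x_i^(k)^2)
          + 1/N^2     * sum_i sum_j prod_k (1 - max(x_i^(k), x_j^(k)))
    """
    n, s = len(points), len(points[0])
    term1 = 3.0 ** (-s)
    term2 = 2.0 / (n * 2 ** s) * sum(prod(1.0 - x * x for x in p) for p in points)
    term3 = sum(prod(1.0 - max(a, b) for a, b in zip(pi, pj))
                for pi in points for pj in points) / n ** 2
    return sqrt(term1 - term2 + term3)
```

The double sum in the third term is what Heinrich's algorithm accelerates; for the moderate $N$ and $s$ used in this chapter, the quadratic version above is adequate.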
2.2 Other Measures

2.2.1 Orthogonal Projections

Besides using $T_N$ to measure quasirandom sequences, orthogonal projections are another approach to checking the uniformity of a high-dimensional quasirandom sequence. In other words, the $L_2$-discrepancy can be regarded as a measure over all dimensions of an $s$-dimensional quasirandom sequence, while orthogonal projections may be regarded as checking the quality of any $\mu$-dimensional projection of an $s$-dimensional quasirandom sequence, where $\mu < s$. However, it is hard to analyze more than three-dimensional projections, and two-dimensional projections are commonly used in practice. The idea behind this approach is simple: if a sequence is uniformly distributed in $[0, 1)^s$, then any two-dimensional projection should also be uniformly distributed. However, even if a sequence has poor two-dimensional projections, it may still be fairly uniform in $[0, 1)^s$; poor projections are not a sufficient condition for a sequence to fail to be uniformly distributed in $[0, 1)^s$. It is important to understand this potential problem when using a sequence with poor two-dimensional projections. Poor two-dimensional projections can be seen for any quasirandom sequence. As Fig. 2.2 shows, the Faure sequence has poor two-dimensional projections even in its first and second dimensions. The Halton and Soboĺ sequences perform better in two-dimensional projections over the first five dimensions. The quality of the Soboĺ sequence depends heavily on the choice of its initial direction numbers. So it is important to be aware of the potential problems these quasirandom sequences may have. I note that scrambling quasirandom sequences definitely improves this behavior with respect to two-dimensional projections.

2.2.2 Practical Integral Problems

High-dimensional integral problems are always a good way to test the quality of quasirandom sequences.
A published set of test integrands [43], computational finance problems [44], approximate inference problems in Bayesian networks [45], Bayesian statistics problems [46], and finding weak repetitive patterns in bioinformatics [47] can be chosen to test the effectiveness of quasirandom sequences.

Figure 2.2. Left figure: 1024 points of the Halton sequence; right figure: 1024 points of the Faure sequence.

Numerical methods are used for a variety of purposes in modern finance [44, 25]. These include risk analysis, the valuation of securities, and the stress testing of portfolios. The Monte Carlo approach has proved to be a valuable computational tool in modern finance. However, for many applications in computational finance, the use of quasirandom sequences seems to provide a faster rate of convergence than random sequences. Thus, the generation of appropriate high-quality quasirandom sequences is important to the quasi-Monte Carlo approach to many problems in computational finance [48]. Bayesian networks are gaining popularity as a modelling tool for complex problems involving reasoning under uncertainty. However, approximate inference to any desired precision has been shown to be an NP-hard problem [49]. Besides stochastic sampling methods [50], some deterministic algorithms have been proposed, such as system sampling [45, 51] and Latin hypercube sampling. Cheng and Druzdzel [34] investigated applications of quasi-Monte Carlo methods to Bayesian networks, and pointed out that approximate inference in Bayesian networks is an excellent test-bed for studying the properties of quasirandom sequences.
Cheng and Druzdzel’s experimental results show that quasirandom sequences significantly improve the performance of simulation algorithms in Bayesian networks compared to ordinary Monte Carlo methods. In this dissertation, I use the test integrals discussed in [43, 5] and an Asian option from computational finance to test my scrambled and optimal quasirandom sequences.

2.3 Conclusion

The major goal of this chapter is to give a comprehensive review of the theoretical and practical measures used to judge the quality of quasirandom sequences. In summary, it is hard to show that any one quasirandom sequence is superior to the others. Therefore, the Halton, Faure and Soboĺ sequences are all used in practice for QMC applications, and I have to consider all of them in this dissertation. The first use of RQMC is in error estimation; thus, for this use, one needs a fast scrambling algorithm for all the sequences obtained [11, 48]. Although scrambling [29] does not change the theoretical bounds on the discrepancy of these sequences, scrambling methods do improve the measures of two-dimensional projections and the evaluation of high-dimensional integrals. In addition, no theoretical proof exists so far that one scrambled quasirandom sequence performs better than the others. Therefore, in the following three chapters, new algorithms for scrambling and for finding optimal sequences are proposed. Measures of two-dimensional projections and practical integral problems are used to assess the scrambled sequences. In this dissertation, I focus on proposing effective and fast scrambling algorithms for the widely used sequences: the Halton, Faure, and Soboĺ sequences.

CHAPTER 3
THE SCRAMBLED HALTON SEQUENCE

The Halton sequence is one of the standard low-discrepancy sequences (along with (t, s)-sequences and lattice points), and thus is widely used in QMC applications.
However, the original Halton sequence suffers from correlations between the radical inverse functions with different bases used for different dimensions. These correlations result in poorly distributed two-dimensional projections. A standard solution to this is to use a randomized (scrambled) version of the Halton sequence. Here, I analyze the correlations in the standard Halton sequence, and based on this analysis I propose a new and simpler modified scrambling algorithm in this chapter.

3.1 The Halton Sequence

A classical family of low-discrepancy sequences is the Halton sequence [12]. One of its important advantages is that the Halton sequence is easy to implement due to its definition via the radical inverse function

$$\phi_p(n) \equiv \frac{b_0}{p} + \frac{b_1}{p^2} + \ldots + \frac{b_m}{p^{m+1}}, \quad (3.1)$$

where $p$ is a prime number, and the $p$-ary expansion of $n$ is given as $n = b_0 + b_1 p + \ldots + b_m p^m$, with integers $0 \le b_j < p$. The Halton sequence, $X_n$, in $s$ dimensions is then defined as

$$X_n = (\phi_{p_1}(n), \phi_{p_2}(n), \ldots, \phi_{p_s}(n)), \quad (3.2)$$

where the dimensional bases $p_1, p_2, \ldots, p_s$ are pairwise coprime. In practice, I always use the first $s$ primes as the bases. In comparison to other low-discrepancy sequences, the Halton sequence is much easier to implement due to the ease of implementation of the radical inverse function. The radical inverse function simply reverses the digit expansion of $n$, and places it to the right of the “decimal” point. Moreover, moving from $\phi_p(n)$ to $\phi_p(n+1)$ can be implemented with rightward-carry addition of $1/p$, and thus is very efficient. However, a problem with the Halton sequence arises from correlations between the radical inverse functions for different dimensions. These correlations cause the Halton sequence to have poor two-dimensional projections for some pairs of dimensions. For example, the two-dimensional projections of the 7th and 8th [1], 28th and 29th [2], and 39th and 40th [3] dimensions are very poor. Fig.
3.1 illustrates the poor projections in these cases; Section 3.3 gives a more detailed analysis of these correlations. The poor two-dimensional projections are caused by the fact that the difference between the two prime bases corresponding to the different dimensions is very small relative to the base size. Fox [5] coded the first 40 dimensions of the Halton sequence, using the first 40 primes as the bases. Among these 40 primes, there are eight pairs of twin primes greater than 10: (11,13), (17,19), (41,43), (59,61), (71,73), (101,103), (107,109), (149,151). All of them have poor two-dimensional projections. The following analysis is based on my published paper [36]. To study this phenomenon, consider one base, $p$, and another base, $p + \alpha$, where the difference, $\alpha$, can be thought of as being relatively small. Let $n$ be a positive integer, with $n = a_0 + a_1 (p+\alpha) + \ldots + a_m (p+\alpha)^m$. The formula for $\phi_p(n)$ is given in equation (3.1), and the formula for $\phi_{p+\alpha}(n)$ is:

$$\phi_{p+\alpha}(n) = \frac{a_0}{p+\alpha} + \frac{a_1}{(p+\alpha)^2} + \ldots + \frac{a_m}{(p+\alpha)^{m+1}} = \frac{a_0}{p(1+\alpha/p)} + \frac{a_1}{p^2(1+\alpha/p)^2} + \ldots + \frac{a_m}{p^{m+1}(1+\alpha/p)^{m+1}}. \quad (3.3)$$

From Equations (3.1) and (3.3), I can see that the correlation between $\phi_p(n)$ and $\phi_{p+\alpha}(n)$ is due to the fact that when $\alpha$ is small compared to $p$, $(1 + \frac{\alpha}{p})$ is close to 1. Thus one would expect the worst problems when $\alpha$ is as small as possible, say $\alpha = 2$, the case of twin primes $p$ and $p + \alpha$. However, good two-dimensional projections for the Halton sequence may be obtained if the number of points is equal to the product of the bases. This is due to the fact that the least significant digit in the $p$-ary expansion of $n$ is $b_0$, and so it repeats every $p$ points; similarly, $a_0$ repeats every $p + \alpha$ points. Since the uniformity is dictated mostly by this digit, I should get a uniform two-dimensional projection by using $p(p + \alpha)$ points.
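This digit argument is easy to check numerically. The sketch below is an illustration of mine, not the dissertation's generator: it computes the radical inverse of Eq. (3.1) and confirms that the most significant digit of $\phi_p(n)$ is $b_0 = n \bmod p$, so it cycles with period $p$.

```python
from math import floor

def radical_inverse(n, p):
    """phi_p(n) of Eq. (3.1): reverse the base-p digits of n about the radix point."""
    x, denom = 0.0, p
    while n > 0:
        n, digit = divmod(n, p)
        x += digit / denom
        denom *= p
    return x

# The leading digit of phi_p(n) is b0 = n mod p; for bases p and p + alpha the
# pair of leading digits (b0, a0) repeats only every p*(p + alpha) points.
p = 17
leading = [floor(radical_inverse(n, p) * p + 1e-9) for n in range(1, 3 * p)]
assert leading == [n % p for n in range(1, 3 * p)]
```

The small `1e-9` guard protects the digit extraction against floating-point rounding; an integer-only implementation would not need it.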
According to this reasoning, the Halton sequence should have good two-dimensional projections if I choose the number of points to be the product of the bases. For example, if I plan to use the 39th and 40th dimensions of the Halton sequence, the bases are 167 and 173 respectively, and 167 × 173 = 28891 points should be well distributed in these two dimensions.

Figure 3.1. Poor 2-D projections were studied in several papers. Left top: 100 points, dimension 7 (p = 17) vs. dimension 8 (p = 19), included in Braaten’s paper [1]; right top: 2000 points, dimension 27 (p = 107) vs. dimension 28 (p = 109), in Morokoff’s paper [2]; left bottom: 4096 points, dimension 39 (p = 167) vs. dimension 40 (p = 173), in Kocis’s paper [3]; right bottom: 512 points of the random-start Halton sequence [4], bases p = 41 and p = 43.

3.2 Correlations

The following analysis is based on my submitted paper [35]. The original Halton sequence suffers from correlations between the radical inverse functions with different bases used for different dimensions. These correlations result in poorly distributed two-dimensional projections, among other things. In this section, I will calculate the correlation coefficient between two radical inverse functions, $\phi_p(n)$ and $\phi_{p+\alpha}(n)$. The calculations will provide some insight into the correlations between dimensions in the Halton sequence and show that the original Halton sequence is weak. Based on this analysis, an effective scrambling algorithm will be proposed in the next section. In Fig. 3.1, I can see similarities among the poor two-dimensional projections; for example, there are two clusters of lines parallel to the line y = x.
A more careful analysis reveals that the number of parallel lines in each cluster is almost equal to the ceiling of the number of points divided by the prime base. At the end of this section, I will give explanations for these observations. The main point of this section is to compute the correlation coefficient between $\phi_p(n)$ and $\phi_{p+\alpha}(n)$. I have the $p$-ary and the $(p+\alpha)$-ary expansions of $n$ given as

$$n \equiv b_0 + b_1 p + \ldots + b_m p^m = a_0 + a_1 (p+\alpha) + \ldots + a_m (p+\alpha)^m. \quad (3.4)$$

Let us consider only the two most significant digits, i.e., $m = 1$. Then, after truncating at $m = 1$, I obtain the following relations from equation (3.4):

$$b_0 = a_0 + \alpha a_1 \pmod p, \qquad b_1 = a_1 + \lfloor a_0/p \rfloor. \quad (3.5)$$

The joint period of $a_0$ and $b_0$ is $p(p+\alpha)$, so I only consider the range of $n$ between 1 and $p(p+\alpha)$. Therefore, $\phi_{p+\alpha}(n)$ can be expressed as follows:

$$\phi_{p+\alpha}(n) = \frac{a_0}{p+\alpha} + \frac{a_1}{(p+\alpha)^2} + O(p^{-2}). \quad (3.6)$$

By combining equations (3.1) and (3.5), $\phi_p(n)$ can thus be expressed in terms of the $(p+\alpha)$-ary expansion of $n$ as:

$$\phi_p(n) = \frac{a_0 + \alpha a_1 \ (\mathrm{mod}\ p)}{p} + \frac{a_1 + \lfloor a_0/p \rfloor}{p^2} + O(p^{-2}). \quad (3.7)$$

For $1 \le n \le p(p+\alpha)$, I partition this interval into $p+\alpha$ parts, namely $kp+1 \le n \le (k+1)p-1$ for $k = 0, 1, 2, \ldots, p+\alpha-1$. Then I calculate $R_k$, the correlation coefficient between $\phi_p(n)$ and $\phi_{p+\alpha}(n)$ with $kp \le n < (k+1)p$. However, $R_k$ can be obtained from $R_0$, the correlation coefficient between $\phi_p(n)$ and $\phi_{p+\alpha}(n)$ with $1 \le n \le p-1$. This is due to the fact that the second most significant digits, $a_1$ and $b_1$, will not change until the most significant digits have cycled. Thus, the correlation between $\phi_p(n)$ and $\phi_{p+\alpha}(n)$ is primarily determined by the correlation of their most significant digits, $b_0$ and $a_0$. Therefore, computing the correlation coefficient between $\phi_p(n)$ and $\phi_{p+\alpha}(n)$ with $kp \le n < (k+1)p$ reduces to computing the correlation coefficient between $\frac{b_0}{p+\alpha}$ and $\frac{b_0 + \alpha b_1}{p}$ with $1 \le n < p$. I now define the formula for the correlation coefficient, $R$, between any two sequences $\{x_i\}_{1 \le i \le N}$ and $\{y_i\}_{1 \le i \le N}$.
Let $\bar{x}$ and $\bar{y}$ denote the averages of the two sequences respectively. Then the correlation coefficient is defined as

$$R = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \quad (3.8)$$

where $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum_i (x_i - \bar{x})^2$, and $S_{yy} = \sum_i (y_i - \bar{y})^2$. In my case $x_i = \frac{i}{p+\alpha}$ and $y_i = \frac{i}{p}$, and I take those $i$'s such that $1 \le i \le p-1$. Thus, as $\alpha$ is small, $\bar{x} = \frac{1}{2} + O(\frac{1}{p+\alpha})$ and $\bar{y} = \frac{1}{2}$. The pieces of the correlation coefficient between $\phi_p(n)$ and $\phi_{p+\alpha}(n)$, for $n = 1, \ldots, p-1$, can then be calculated by the following formula:

$$S_{xy} \approx \sum_{i=1}^{p-1} \left(\frac{i}{p+\alpha} - \frac{1}{2}\right)\left(\frac{i}{p} - \frac{1}{2}\right) = \frac{p^2 + (4-3\alpha)p}{12(p+\alpha)} = \frac{p}{12} + O\left(\frac{\alpha}{p}\right). \quad (3.9)$$

Then $S_{xx}$ and $S_{yy}$ can be calculated as follows:

$$S_{xx} S_{yy} = \sum_{i=0}^{p-1} \left(\frac{i}{p+\alpha} - \frac{1}{2}\right)^2 \sum_{i=0}^{p-1} \left(\frac{i}{p} - \frac{1}{2}\right)^2 = \left(\frac{p(p^2 + 3p + 2 - \alpha(1-\alpha))}{12(p+\alpha)^2}\right)\left(\frac{p^2 + 2}{12p}\right) = \left(\frac{p}{12}\right)^2 + O\left(\frac{\alpha^2}{(p+\alpha)^2}\right). \quad (3.10)$$

Using the same assumption as above, I can approximate Equation (3.10) by $(\frac{p}{12})^2$. Let $R_0$ denote the correlation coefficient between $\phi_p(n)$ and $\phi_{p+\alpha}(n)$ for $n = 1, 2, \ldots, p-1$; then

$$R_0 \approx \frac{p}{12} \Big/ \sqrt{\left(\frac{p}{12}\right)^2} = 1. \quad (3.11)$$

One can likewise calculate that $R_k \approx 1$ for $kp+1 \le n \le (k+1)p-1$ with $b_1 = k$. This explains why the poor two-dimensional projections of the Halton sequence in Fig. 3.1 look like lines parallel to the line $y = x$, since all pairs of points approximately fall on the lines $y = R_k x + c$ with $c = b_1$ or $c = a_1$. This is based on a common interpretation of the correlation coefficient [52]. Now, let us compute the number of parallel lines seen in these poor projections of the Halton sequence. Every time $b_1$ or $a_1$ changes, the line $y = R_k x + c$ wraps. For $n$ points, $b_1$ will change $\lceil \frac{n}{p} \rceil$ times and $a_1$ will change $\lceil \frac{n}{p+\alpha} \rceil$ times. Thus, the total number of lines for any $n$ points may be computed as $\lceil \frac{n}{p} \rceil + \lceil \frac{n}{p+\alpha} \rceil$.

3.3 Methods to Break Correlations

There are at least two possible ways to break the correlations I have seen in the Halton sequence. One is to increase the difference between the bases for any pair of dimensions; the other is to scramble the Halton sequence.
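Before turning to these remedies, the prediction $R_0 \approx 1$ from Section 3.2 is simple to confirm empirically. The sketch below is my illustration, not the dissertation's code: it evaluates the correlation coefficient of Eq. (3.8) for the twin-prime bases p = 17 and p + α = 19 over the first block 1 ≤ n ≤ p − 1, where both coordinates reduce to n/p and n/(p+α).

```python
from math import sqrt

def radical_inverse(n, p):
    """phi_p(n) of Eq. (3.1)."""
    x, denom = 0.0, p
    while n > 0:
        n, digit = divmod(n, p)
        x += digit / denom
        denom *= p
    return x

def correlation(xs, ys):
    """R = S_xy / sqrt(S_xx * S_yy) as in Eq. (3.8)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

p, alpha = 17, 2                  # the twin-prime bases of dimensions 7 and 8
ns = range(1, p)                  # the block k = 0
r0 = correlation([radical_inverse(n, p + alpha) for n in ns],
                 [radical_inverse(n, p) for n in ns])
```

Within this block `r0` equals 1 up to rounding, matching Eq. (3.11); it is exactly this near-perfect linear dependence that the scrambling methods of the following sections are designed to destroy.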
The first method is only useful when the number of dimensions is small. When $p$ is large, $p + \alpha$ has to be much larger if I want to break the correlations. Let $\alpha = ep$ with $e > 0$; then from equations (3.9) and (3.10), I see that the correlation coefficient will be approximately $\frac{1}{e+1}$. Thus I can increase $e$ until the correlation coefficient is sufficiently small; $e = p$ is normally considered sufficient to ensure small correlation. This implies prime pairs of the form $p$ and $p + p^2$, but increasing $\alpha$ also raises a problem with the upper bound on the star-discrepancy of the Halton sequence. For $N$ points of the $s$-dimensional Halton sequence, the upper bound on the star-discrepancy satisfies the inequality

$$D_N^* \le C(p_1, \ldots, p_s) \frac{(\log N)^s}{N} + O\left(\frac{(\log N)^{s-1}}{N}\right), \quad \text{where} \quad C(p_1, \ldots, p_s) \approx \prod_{j=1}^{s} \frac{p_j - 1}{2 \log p_j}.$$

As the $p_j$ increase, this constant in the upper bound will increase, since $\frac{p_j - 1}{2 \log p_j}$ is an increasing function of $p_j$. The other method to break the correlations is to scramble the Halton sequence. The first four dimensions of the Halton sequence give us a hint for obtaining better-quality high-dimensional sequences. If one can reorder or shuffle the digits in each point of the Halton sequence for the different dimensions, the correlations between different dimensions can be made very small. This is due to the fact that there are gaps between the most significant digits of $\phi_2(n)$, $\phi_3(n)$, $\phi_5(n)$, and $\phi_7(n)$, which have good two-dimensional projections with $p < 10$. However, when $p > 10$, there are no gaps in the most significant base-10 digits of $\phi_p(n)$ and $\phi_{p+\alpha}(n)$. From Fig. 3.2, it is easy to see that the most significant digits of $\phi_{17}(n)$ and $\phi_{19}(n)$ go from 1 to 9 without jumps, while the most significant digits of $\phi_5(n)$ and $\phi_7(n)$ jump.
  n   φ17(n)     φ19(n)     φ5(n)      φ7(n)
  2   0.117647   0.105263   0.400000   0.285714
  3   0.176471   0.157895   0.600000   0.428571
  4   0.235294   0.210526   0.800000   0.571429
  5   0.294118   0.263158   0.040000   0.714286
  6   0.352941   0.315789   0.240000   0.857143
  7   0.411765   0.368421   0.440000   0.020408
  8   0.470588   0.421053   0.640000   0.163265
  9   0.529412   0.473684   0.840000   0.306122
 10   0.588235   0.526316   0.080000   0.448980
 11   0.647059   0.578947   0.280000   0.591837
 12   0.705882   0.631579
 13   0.764706   0.684211
 14   0.823529   0.736842
 15   0.882353   0.789474
 16   0.941176   0.842105

Fig. 3.2: A list of φ17(n) and φ19(n) with 2 ≤ n ≤ 16, and φ5(n) and φ7(n) with 2 ≤ n ≤ 11.

Scrambling the Halton sequence can break these cycles and the correlations created by the radical inverse function. It is clear that the correlations seen are due to the shadowing of the most significant digits of the Halton sequence.

3.4 A Scrambled Halton Sequence

The Halton sequence uses different primes as bases for the radical inverse functions in different dimensions, and suffers from the correlation between the radical inverse functions. In order to improve this situation, many scrambling procedures have been proposed. A less elaborate, but easier to implement, scrambling technique was proposed by Morokoff and Caflisch [2]: after obtaining N elements of the s-dimensional Halton sequence, permute this block. This procedure maintains low discrepancy and gives good two-dimensional projections. Morokoff and Caflisch scramble the Halton sequence independently in each dimension: if N points in $[0, 1)^s$ are required, then s sequences of N random numbers are generated and sorted from smallest to largest. The mapping from original position in the sequence to final position is then used to permute the Halton sequence. This method is called dimensional permutation. The other proposed scrambling methods are based on digital permutations. A digital permutation of the Halton sequence is defined as

$$\phi_p(n; \pi) \equiv \frac{\pi_0(a_0)}{p} + \frac{\pi_1(a_1)}{p^2} + \ldots + \frac{\pi_m(a_m)}{p^{m+1}}, \quad (3.12)$$

where π(·)
is a permutation of the integers 1, 2, 3, ..., p − 1. Several digital permutation methods have been proposed:

• Braaten and Weller [1] improved the Halton sequence by picking the permutation $\pi_p(\cdot)$ that minimizes the one-dimensional discrepancy of the set $\{\frac{\pi_p(1)}{p}, \ldots, \frac{\pi_p(j)}{p}, \frac{\pi_p(j+1)}{p}\}$. This procedure does not specify a unique permutation, so a permutation table up to dimension 16 is given.

• Tuffin [19] extended Braaten and Weller’s work and created permutations for high dimensions.

• Kocis and Whiten [3] proposed two methods for improving the Halton sequence: an algorithm modifying Braaten and Weller’s permutation function, and leaped Halton sequences.

  – Kocis and Whiten developed modified permutations $\pi(a_i(j, n))$ of the Halton sequence by reverse permuting the $a_i(j, n)$ in base two, and removing any values that are too large. In this way, the cycles (correlations) of the Halton sequence are broken.

  – Leaped Halton sequences use only every Lth Halton number, subject to the condition that L is a prime different from all the bases.

Compared to digital permutations, dimensional permutations of the Halton sequence are much less computationally costly, since each point in the s-dimensional version of Morokoff’s scrambling method needs only s permutations, while digital permutations of the Halton sequence need at least sm different digital permutations, where m is the number of digits scrambled. Yet another scrambling approach is that of Wang and Hickernell [4], called the random-start Halton sequence, which is the original Halton sequence started at a different random integer $n_i$ in each dimension:

$$X_n = (\phi_{p_1}(n + n_1), \phi_{p_2}(n + n_2), \ldots, \phi_{p_s}(n + n_s)). \quad (3.13)$$

This method does not break the two-dimensional correlations: the correlation coefficient is not changed by random starting, as the period of the radical inverse function is unchanged. In my method, I will modify Morokoff’s procedure by permuting every other dimension.
Considering any pair of dimensions, as long as one of them breaks the cycles, the correlation between the two dimensions will not be significant.

3.5 Implementation Issues

Based on the analysis of the correlation coefficient between any pair of dimensions, the method of dimensional permutation is preferable because it is simpler and faster while maintaining its effectiveness. One implementation of the Halton sequence [53, 5] uses the formula $x_{n+1} = x_n \oplus_p \frac{1}{p}$, where $\oplus_p$ is defined as rightward-carry addition (normal addition is leftward-carry). This algorithm provides fast generation of the Halton sequence, as rightward-carry addition steps from $n$ to $n+1$ in the radical inverse function. Digital permutation, however, cannot use this algorithm for generating a scrambled Halton sequence. I thus seek a method of scrambling Halton sequences without sacrificing the speed of generating them. One drawback of dimensional permutation is that I have to know the total number of points before I can scramble the sequence. In order to take advantage of Fox and Halton’s generation algorithm [53, 5] and overcome this disadvantage of dimensional permutation, I can permute the sequence in each dimension over a certain period ($p$ or $p^2$). The period of the most significant digit of each point in each dimension is its base $p$, so permuting the most significant digit of each point is the same as permuting each $p$-long block $\{\phi_p(1), \phi_p(2), \ldots, \phi_p(p-1)\}$, $\{\phi_p(p), \phi_p(p+1), \ldots, \phi_p(2p-1)\}, \ldots$. The advantage of my procedure is that I do not need to know the total number of Halton points in advance. In addition, omitting the first few Halton points in practice leads to a smaller discrepancy.

3.6 Linear Scrambling

Linear scrambling is the simplest and most effective scrambling method to break this correlation. This is the reason why we focus on linear scrambling and try to look for the “best” sequence in the linear space in the next section.
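The rightward-carry update $x_{n+1} = x_n \oplus_p \frac{1}{p}$ from Section 3.5 can be sketched as follows. This is my illustration, not Fox's or Halton's actual code: it keeps the digits of $x_n$ in an array, so one step is a single carry propagation rather than a full radical inverse computation.

```python
class HaltonStream:
    """Incremental 1-D Halton generator: each next() applies x <- x (+)_p 1/p,
    i.e. increments the digit multiplying 1/p and carries rightward."""

    def __init__(self, p):
        self.p = p
        self.digits = []              # digits[i] multiplies 1/p^(i+1)

    def next(self):
        i = 0
        while True:                   # add 1 to digits[0], propagate carries right
            if i == len(self.digits):
                self.digits.append(0)
            self.digits[i] += 1
            if self.digits[i] < self.p:
                break
            self.digits[i] = 0
            i += 1
        x, denom = 0.0, self.p        # reassemble x from its digits
        for d in self.digits:
            x += d / denom
            denom *= self.p
        return x

stream = HaltonStream(3)
points = [stream.next() for _ in range(4)]   # phi_3(1..4) = 1/3, 2/3, 1/9, 4/9
```

A production version would keep $x$ itself as a floating-point value and update only the changed digits; the digit array keeps the sketch simple and exact.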
Many scrambling methods [1, 21, 4, 41] have been proposed for the Halton sequence to break such correlations between dimensions. Most of them are based on digital permutation, defined as follows:

$$\phi_{p_i}^{\pi}(n) \equiv \frac{\pi_{p_i}(b_0)}{p_i} + \frac{\pi_{p_i}(b_1)}{p_i^2} + \ldots + \frac{\pi_{p_i}(b_m)}{p_i^{m+1}}, \quad (3.14)$$

where $\pi_{p_i}$ is a permutation of the set $\{0, 1, 2, 3, \ldots, p_i - 1\}$. Before we start to search for the optimal Halton sequence, we must decide which permutation functions can be chosen for $\pi_{p_i}$. In other words, we are trying to find a function $f(x)$ of a given form, with $x \in \{0, 1, 2, 3, \ldots, p_i - 1\}$, such that $f(x)$ is a permutation of the set $\{0, 1, 2, 3, \ldots, p_i - 1\}$. There are two simple functions which conveniently define a subset of the $p_i!$ permutations [54]: one is $f(x) = wx + c \pmod{p_i}$, which is the “best” in some sense among a subset of the $p_i!$ possible permutations; the other is $f(x) = x^k \pmod{p_i}$ with $\gcd(k, p_i - 1) = 1$. From the implementational point of view, the linear scrambling, $f(x) = wx + c \pmod{p_i}$, is quite effective in comparison to other scrambling methods. We choose the linear function $f(x) = wx + c \pmod{p_i}$ with $c = 0$ as our $\pi_{p_i}$ to scramble the Halton sequence:

$$\pi_{p_i}(b_j) = w_i b_j \pmod{p_i}, \quad (3.15)$$

where $1 \le w_i \le p_i - 1$ and $0 \le j \le m$. The reason for taking $c = 0$ is that we do not want to permute zero: keeping zero fixed keeps the sequence unbiased, since permuting zero (assuming an infinite string of trailing zeros) leads to a biased sequence. This linear scrambling gives us a stochastic family of scrambled Halton sequences, which for the $s$-dimensional Halton sequence includes $(p_1 - 1)(p_2 - 1) \cdots (p_{s-1} - 1)(p_s - 1)$ sequences. The main goal of this chapter is to find an optimal sequence from this scrambled family. The algorithm for finding the optimal sequence is described in the next section.
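The linear scrambling of Eq. (3.15) drops into the radical inverse with a one-line change. A sketch of mine, for illustration rather than the library implementation:

```python
def scrambled_radical_inverse(n, p, w):
    """phi_p(n) with each digit b replaced by pi(b) = w*b mod p (Eq. 3.15).
    Zero is automatically left fixed, since w*0 mod p = 0, so the infinite
    string of trailing zero digits stays zero and the sequence is unbiased."""
    x, denom = 0.0, p
    while n > 0:
        n, b = divmod(n, p)
        x += (w * b % p) / denom
        denom *= p
    return x

# w = 1 recovers the unscrambled phi_p(n); any 1 <= w <= p - 1 permutes the
# digit set because p is prime.
assert sorted(3 * b % 5 for b in range(5)) == [0, 1, 2, 3, 4]
```

Choosing one multiplier $w_i$ per dimension selects one member of the stochastic family described above; the search for good multipliers is the subject of the next section.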
3.7 Optimal Halton Sequences

In this section, I search for the optimal Halton sequence in the linear scrambling space with c_j = 0. My goal therefore focuses on searching for the best w_i for the linear congruential generator π_{p_i}(b_j) = w_i b_j (mod p_i), i.e., the best permutation of the set b_j ∈ {0, 1, 2, ..., p_i − 1}. There are several theoretical procedures to make this assessment: the spectral test and discrepancy are commonly used criteria. Since the modulus is small in my case, the spectral test [55] is not suitable. Instead I consider using the L2-discrepancy, D_N^(2). For a prime modulus, p, and a primitive root, W, modulo p as multiplier, the discrepancy, D_{p−1}^(2), satisfies [56]

    (p − 1) D_{p−1}^(2) ≤ 2 + Σ_{i=1}^{q} a_i,    (3.16)

where a_i is the ith digit in the continued fraction expansion of W/p with a_q = 1. My job is now reduced to finding a primitive root W_p modulo p such that W_p has the smallest sum of continued fraction expansion digits, with W_p/p = [a_1, a_2, ..., a_q] and a_q = 1. In Table 3.1, I list the results of my search for the best primitive root modulo p for the first 40 dimensions of the Halton sequence. W_p is the best primitive root modulo p based on this criterion, and p is the prime base at dimension s.

Table 3.1. Optimal values of W_p for the first 40 dimensions of the Halton sequence

     s    p  W_p  |   s    p  W_p  |   s     p  W_p  |   s     p  W_p
     1    2    1  |  11   31   12  |  21    73   40  |  31   127   56
     2    3    2  |  12   37   13  |  22    79   30  |  32   131   50
     3    5    3  |  13   41   17  |  23    83   47  |  33   137   52
     4    7    3  |  14   43   18  |  24    89   65  |  34   139   61
     5   11    8  |  15   47   29  |  25    97   71  |  35   149  108
     6   13   11  |  16   53   14  |  26   101   28  |  36   151   56
     7   17   12  |  17   59   18  |  27   103   40  |  37   157   66
     8   19   14  |  18   61   43  |  28   107   60  |  38   163   63
     9   23    7  |  19   67   41  |  29   109   79  |  39   167   60
    10   29   18  |  20   71   44  |  30   113   89  |  40   173   66
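The search behind Table 3.1 can be sketched as follows: for each candidate multiplier, compute the partial quotients of W/p via the Euclidean algorithm and keep the primitive root with the smallest digit sum. Note that several multipliers may tie on this sum; the sketch below breaks ties by taking the smallest such W, which may differ from the tie-breaking used for the table (function names are illustrative):

```python
def cf_quotients(num, den):
    """Partial quotients of num/den from the Euclidean algorithm,
    e.g. 8/11 = [0; 1, 2, 1, 2]."""
    qs = []
    while den:
        q, r = divmod(num, den)
        qs.append(q)
        num, den = den, r
    return qs

def is_primitive_root(w, p):
    """True if w generates the multiplicative group modulo the prime p."""
    n, factors, d = p - 1, set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return all(pow(w, (p - 1) // f, p) != 1 for f in factors)

def best_multiplier(p):
    """A primitive root W mod p minimizing the digit sum of W/p's
    continued fraction (smallest W among ties)."""
    return min((w for w in range(1, p) if is_primitive_root(w, p)),
               key=lambda w: sum(cf_quotients(w, p)))
```

Forcing the convention a_q = 1 does not change the digit sum, since [a_1, ..., a_q] and [a_1, ..., a_q − 1, 1] have equal sums, so the plain Euclidean expansion suffices for the minimization.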
Table 3.2. Estimates of the integral ∫_0^1 ··· ∫_0^1 Π_{i=1}^{s} |4x_i − 2| dx_1···dx_s = 1 using Halton generators

       N   Sequence   s = 13   s = 20   s = 25       s = 40
    1000   Halton      1.171    2.324   34.513   681382.379
    1000   DHalton     0.875    0.601    0.612        0.311
    2000   Halton      1.091    1.444   17.450   340691.207
    2000   DHalton     0.922    0.952    0.846        0.255
    3000   Halton      1.091    1.362   12.178   227127.541
    3000   DHalton     0.908    0.869    0.769        0.515
    5000   Halton      0.978    1.140    7.811   136276.627
    5000   DHalton     0.942    0.985    1.979        0.419
    7000   Halton      0.922    0.998    5.782    97340.706
    7000   DHalton     0.942    1.216    1.742        0.489
   30000   Halton      0.979    0.888    2.130    22713.137
   30000   DHalton     0.988    1.097    1.171        1.276
   40000   Halton      0.974    0.889    1.796    17035.076
   40000   DHalton     1.014    1.118    1.381        1.118
   50000   Halton      0.984    0.903    1.568    13628.735
   50000   DHalton     1.006    1.116    1.289        1.034

To empirically verify optimality, I evaluate the test integral discussed in [43, 5], namely,

    ∫_0^1 ··· ∫_0^1 Π_{i=1}^{s} (|4x_i − 2| + a_i)/(1 + a_i) dx_1 ··· dx_s = 1.    (3.17)

The accuracy of quasi-Monte Carlo integration depends not simply on the dimension of the integrands, but on their effective dimension. The test function in Equation (3.17) is among the most difficult cases for high-dimensional numerical integration. I have estimated the values of these test integrals in dimensions 13 ≤ s ≤ 40, with a_1 = a_2 = ··· = a_s = 0. In this integral, all the variables are equally important, and Wang and Fang [57] calculated that the effective dimension is approximately the same as the real dimension of the integrand. Thus, I may expect improvements from quasi-Monte Carlo by using the derandomized sequences. The results for a_1 = a_2 = ··· = a_s = 0 are listed in Table 3.2. As shown in [5], the errors of the numerical results in over 20 dimensions become quite large. However, after derandomization, I found that the integral is reasonably well approximated in dimensions over 20. In Table 3.2, the label Halton refers to the original Halton sequence provided by Fox [5], while DHalton refers to my derandomized Halton sequence.
3.8 Conclusion

A major problem with the Halton sequence comes from the correlations between the radical inverse functions for different dimensions. The significance of these correlations becomes apparent in medium and large dimensions. Scrambling can improve the performance of the Halton sequence. I provide a number-theoretic criterion to choose the optimal scrambling from among a large family of random scramblings. Based on this criterion, I have found the optimal scrambling for up to 60 dimensions for the Halton sequence. This derandomized Halton sequence is then numerically tested and shown empirically to be far superior to the original sequence.

In this chapter, I gave a review of the Halton sequence and the scrambled Halton sequence, and explored the reasons for the poor two-dimensional projections of the Halton sequence. Various scrambling methods were studied and compared based on my quantitative analysis, and effective practical scrambling methods for the Halton sequence were presented. I presented a new algorithm for searching for an optimal Halton sequence. This was shown to be very important for practical quasi-Monte Carlo applications through the example of a difficult high-dimensional integral. Even though it is well known that the distribution of the Halton sequence in high dimensions is not good, scrambling, or optimally scrambling, the Halton sequence can often improve its quality. Thus the scrambled and optimal Halton sequences can be widely applied in quasi-Monte Carlo applications.

CHAPTER 4

THE SCRAMBLED AND OPTIMAL FAURE SEQUENCE

The Faure sequence is one of the most widely used quasirandom sequences in QMC. I summarize aspects of scrambling techniques for the Faure sequence and present a modified scrambling algorithm. In addition, I propose a new efficient algorithm for finding optimal Faure sequences, and use the optimal Faure sequence to evaluate a particular derivative security from computational finance.
Numerical results show that this optimal sequence gives promising results even in high dimensions.

4.1 The Scrambled Faure Sequence

The original construction of quasirandom sequences was related to the van der Corput sequence, which is a one-dimensional quasirandom sequence based on digital inversion. This digital inversion method is a central idea behind the construction of many current quasirandom sequences in arbitrary bases and dimensions. Following the construction of the van der Corput sequence, a significant generalization of this method was proposed by Faure [15] for the sequences that now bear his name. Later, Tezuka [7] proposed the generalized Faure sequence, GFaure, which forms a family of randomized Faure sequences.

The Faure sequence is based on the radical inverse function, φ_b(n), and a generator matrix, C. Let b ≥ 2 be prime, and let n = (n_0, n_1, ..., n_{m−1})^T be an integer vector whose elements are the b-adic expansion of the integer n. Then the radical inverse function, φ_b(n), is defined as

    φ_b(n) = n_0/b + n_1/b^2 + ... + n_{m−1}/b^m.

The Faure sequence defines a different generator matrix for each dimension. The generator matrix of the jth dimension of an s-dimensional Faure sequence is denoted C^(j) = P^{j−1} (1 ≤ j ≤ s), where P, the Pascal matrix, is defined by

    (P^{j−1})_{k,r} = C(r−1, k−1) (j − 1)^{r−k} (mod b),  k ≥ 1, r ≥ 1,    (4.1)

where k is the row index, r is the column index, and C(r−1, k−1) is the binomial coefficient. Thus let x_n = (x_n^(1), x_n^(2), ..., x_n^(s)) be the nth Faure point; then x_n^(j) can be represented as

    x_n^(j) = φ_b(C^(j) n),    (4.2)

and so (φ_b(P^0 n), φ_b(P^1 n), ..., φ_b(P^{s−1} n)) gives the s-dimensional Faure sequence.

Figure 4.1. 4096 points in dimensions 99 and 100. Left: the original Faure sequence; right: an optimal Faure sequence.
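Equations (4.1)-(4.2) can be evaluated directly, if slowly, by applying the Pascal matrix power to the digit vector of n. A minimal sketch (the first dimension, j = 1, has C = I and reduces to the van der Corput sequence in base b; function names are illustrative):

```python
from math import comb

def faure_point(n, b, s):
    """The n-th point of the s-dimensional Faure sequence in prime base b,
    computed directly from x_n^(j) = phi_b(P^(j-1) n)."""
    digits = []                      # base-b digits of n, least significant first
    while n:
        n, d = divmod(n, b)
        digits.append(d)
    m = len(digits)
    point = []
    for j in range(1, s + 1):
        # (P^(j-1))_{k,r} = C(r-1, k-1) (j-1)^(r-k) mod b, 0-indexed here
        y = [sum(comb(r, k) * pow(j - 1, r - k, b) * digits[r]
                 for r in range(k, m)) % b
             for k in range(m)]
        point.append(sum(d / b ** (k + 1) for k, d in enumerate(y)))
    return point
```

For example, the first nonzero point has identical coordinates in every dimension (n = 1 has a single digit, which every C^(j) maps to itself), one visible source of the poor projections that scrambling addresses.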
4.1.1 Generalized Faure (GFaure) Sequences

Tezuka's GFaure takes as the jth-dimension generator matrix C^(j) = A^(j) P^{j−1}, where A^(j) is a random nonsingular lower triangular matrix:

    A^(j) = [ h_11   0     0    ...  0
              g_21   h_22  0    ...  0
              g_31   g_32  h_33 ...  0
               ...    ...   ...  ... ... ]_{m×m},

where h_ii is uniformly distributed on the set {1, 2, ..., b − 1}, g_ij is uniform on the set {0, 1, 2, ..., b − 1}, and m is the number of digits to be scrambled. Thus GFaure is a stochastic family of Faure sequences, and this family has as many as b^{m^2/2} different sequences. An interesting problem is finding one, or a subset, of optimal Faure sequences within such a large family.

4.1.2 I-binomial Scrambling

Tezuka [58] proposed an algorithm to reduce the number of sequences in the GFaure family while maintaining the original quality of the Faure sequence. A subset of GFaure is called "GFaure with the i-binomial property" [58], with A^(j) defined to be Toeplitz:

    A^(j) = [ h_1   0    0    0   ...  0
              g_2   h_1  0    0   ...  0
              g_3   g_2  h_1  0   ...  0
              g_4   g_3  g_2  h_1 ...  0
               ...   ...  ...  ... ... ... ]_{m×m},    (4.3)

where h_1 is uniformly distributed on the set {1, 2, ..., b − 1}, and g_i, 2 ≤ i ≤ m, is uniform on the set {0, 1, 2, ..., b − 1}. For each A^(j), there will be a different random matrix of the above form.

I-binomial scrambling reduces the scrambling space from O(b^{m^2/2}) to O(b^m). This reduction makes searching for the optimal Faure sequence computationally tractable. Based on i-binomial scrambling, Owen [20] proposed another form for A^(j):

    A^(j) = [ h_1   0    0   ...  0
              h_1   h_2  0   ...  0
              h_1   h_2  h_3 ...  0
               ...   ...  ... ... ... ]_{m×m}.    (4.4)

The matrix in (4.4) for A^(j) makes the same reduction from GFaure as the matrix in (4.3). However, this form is not suitable for my algorithm for finding an optimal sequence within the GFaure family.
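A Toeplitz matrix with the i-binomial property (4.3) is determined by just m values — one diagonal element h_1 and the subdiagonal constants g_2, ..., g_m — which is exactly where the O(b^m) reduction comes from. A sketch of drawing one such matrix (names are illustrative):

```python
import random

def ibinomial_matrix(m, b, rng=random):
    """A random m-by-m lower-triangular Toeplitz matrix with the i-binomial
    property (4.3): diagonal h1 from {1,...,b-1}, constants g2,...,gm
    from {0,...,b-1} down the subdiagonals."""
    h1 = rng.randrange(1, b)
    g = [h1] + [rng.randrange(0, b) for _ in range(m - 1)]
    # entry (k, r), 0-indexed, equals g[k - r] for k >= r, and 0 above the diagonal
    return [[g[k - r] if k >= r else 0 for r in range(m)] for k in range(m)]
```

One such matrix would be drawn independently for each dimension j to produce a member of the i-binomial family.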
Following the lead of the i-binomial scrambling proposed by Tezuka [58], I try to find an optimal Faure sequence in a relatively smaller space, rather than in the whole of GFaure.

4.2 The Optimal Faure Sequence

In this section, I provide a number-theoretic criterion to choose an optimal scrambling from among a large family of possible (random) scramblings of the Faure sequence. Based on this criterion, I have found the optimal scramblings for any dimension. This derandomized Faure sequence is then numerically tested and shown empirically to be far superior to the original unscrambled sequence.

Various scrambling methods have been proposed for the Faure sequence to obtain better uniformity for quasirandom sequences in high dimensions. Among these scrambling algorithms, the simplest and most effective is linear matrix scrambling [20]. GFaure and i-binomial scrambling are good examples of linear matrix scrambling. In the rest of this section, my algorithm for searching for an optimal Faure sequence within GFaure with the i-binomial property is described.

The diagonal element, h_1, of A^(j) in (4.3) scrambles all digits of each original Faure point. The element g_2 scrambles all but the first digit of that Faure point. Most importantly, the two most significant digits of the Faure point are scrambled only by h_1 and g_2. Thus the choice of these two elements is crucial for producing optimally scrambled Faure sequences. I focus on finding the best and simplest values for h_1 and g_2 so that an optimal Faure sequence can be obtained. This reduces the search to a space of size b(b − 1). In my example, I consider a simple form for A^(j) within i-binomial scrambling:

    A^(j) = [ h_1^{j−1}   0          0         ...  0
              g_2         h_1^{j−1}  0         ...  0
              0           g_2        h_1^{j−1} ...  0
               ...         ...        ...       ... ... ]_{m×m}.    (4.5)

The idea behind this reduction of the matrix is that the resulting error is at most 1/b^2. In other words, the maximum difference between a sequence scrambled using the matrix A^(j) in (4.5) and one completely scrambled with the matrix A^(j) in (4.3) is 1/b^2. For example, b = 53 for a 50-dimensional Faure sequence with two digits scrambled gives an error of at most 1/53^2 ≈ 0.00035. Hence, the reduction of the matrix A^(j) becomes much less of a concern for high-dimensional Faure sequences.

Thus, I search for the optimal Faure sequence in this reduced i-binomial scrambling space. My goal focuses on searching for the best h_1 as the multiplier in a linear congruential generator, π_p(n_j) = h_1 n_j (mod b), so that I can find the best permutation of the set n_j ∈ {1, 2, ..., b − 1}. There are several theoretical procedures to make this assessment; the spectral test and discrepancy are commonly used criteria. Since b is small, the spectral test [55] is not suitable. Instead, I consider using the L2-discrepancy, D_N^(2). For a prime modulus b, and a primitive root h_1 modulo b as multiplier, the discrepancy, D_{b−1}^(2), of the associated linear congruential generator satisfies [56]

    (b − 1) D_{b−1}^(2) ≤ 2 + Σ_{i=1}^{q} a_i,    (4.6)

where a_i is the ith digit in the continued fraction expansion of h_1/b with a_q = 1. My job is reduced to finding a primitive root h_1 modulo b such that h_1 has the smallest sum of continued fraction expansion digits, with h_1/b = [a_1, a_2, ..., a_q] and a_q = 1. A table of the best primitive roots modulo b based on this criterion is listed in [36]. Then g_2 is chosen to be a primitive root modulo b such that g_2 has the second-smallest sum of continued fraction expansion digits, with g_2/b = [a_1, a_2, ..., a_q] and a_q = 1. In Figure 4.1, the right plot shows an optimal Faure sequence with h_1 = 28 and g_2 = 83, and the left plot shows the original Faure sequence.
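Under my reading of the reduced matrix (4.5) — diagonal h_1^{j−1}, first subdiagonal g_2, zeros elsewhere — scrambling one coordinate of a Faure point reduces to a two-term digit recurrence. A sketch (function name and digit convention are illustrative):

```python
def scramble_digits(digits, j, b, h1, g2):
    """Apply the reduced scrambling matrix (diagonal h1^(j-1), first
    subdiagonal g2, zeros elsewhere) to the base-b digit vector of one
    coordinate of a Faure point, most significant digit first."""
    hj = pow(h1, j - 1, b)
    return [(hj * d + (g2 * digits[k - 1] if k > 0 else 0)) % b
            for k, d in enumerate(digits)]
```

Each scrambled digit depends only on the current digit and its predecessor, so the two most significant digits are governed entirely by h_1 and g_2, as the argument above requires.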
In addition, error estimates can be obtained by using several scrambled optimal Faure sequences; each scrambled optimal Faure sequence can be produced by assigning g_i, for 3 ≤ i ≤ m, in the matrix A^(j) in (4.5) to numbers randomly chosen from the set {0, 1, 2, ..., b − 1}.

4.3 Geometric Asian Options

In this section, I examine the valuation of a complex option for which there is a simple analytical solution. The popular example of such a problem is a European call option on the geometric mean of several assets, sometimes called a geometric Asian option. Let K be the strike price at the maturity date, T. Then the geometric mean of N assets is defined as

    G = (Π_{i=1}^{N} S_i)^{1/N},

where S_i is the ith asset price. Thus the payoff of this call option at maturity can be expressed as max(0, G − K).

Boyle [59] proposed an analytical solution for the price of a geometric Asian option. The basic idea is that the product of lognormally distributed variables is also lognormally distributed. This follows from the fact that the behavior of an asset price, S_i, follows geometric Brownian motion [60]. The Black-Scholes formula [61, 60] for a European call option can be represented as

    C_T = S · Norm(d_1) − K · e^{−r(T−t)} · Norm(d_2),
    with d_1 = [ln(S/K) + (r + σ^2/2)(T − t)] / (σ √(T − t)),    (4.7)
    d_2 = d_1 − σ √(T − t),

where t is the current time, r is the risk-free rate of interest (constant in the Black-Scholes world), and Norm(x) is the cumulative normal distribution. Since there exists an analytical solution for a geometric Asian option, this offers us a benchmark against which to compare my simulation results. The parameters used for my numerical studies are listed in Table 4.1.

Table 4.1. Parameters used for numerical studies

    Number of assets               N
    Initial asset prices, S_i(0)   100, for i = 1, 2, ..., N
    Volatilities, σ_i              0.3
    Correlations, ρ_ij             0.5, for i < j
    Strike price, K                100
    Risk-free rate, r              10%
    Time to maturity, T            1 year
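Boyle's closed form can be evaluated directly from the parameters of Table 4.1: since the geometric mean of lognormal assets is itself lognormal, Black-Scholes applies with a modified spot and volatility. A self-contained sketch (function names are illustrative; the standard-library normal CDF stands in for a hand-coded one):

```python
from math import exp, log, sqrt
from statistics import NormalDist

def geometric_asian_call(G, K, r, T, sigmas, rho):
    """Boyle's closed-form price of a European call on the geometric mean
    of N lognormal assets, via Black-Scholes with modified parameters."""
    N = len(sigmas)
    A = sum(v * v for v in sigmas) / N
    var = sum(rho[i][j] * sigmas[i] * sigmas[j]
              for i in range(N) for j in range(N)) / N ** 2
    S = G * exp((-A / 2 + var / 2) * T)        # modified spot
    sig = sqrt(var)                            # modified volatility
    d1 = (log(S / K) + (r + var / 2) * T) / (sig * sqrt(T))
    d2 = d1 - sig * sqrt(T)
    Phi = NormalDist().cdf
    return S * Phi(d1) - K * exp(-r * T) * Phi(d2)

# Table 4.1 parameters: S_i(0) = 100 (so G = 100), sigma_i = 0.3,
# rho_ij = 0.5 for i != j, K = 100, r = 10%, T = 1 year
N = 3
rho = [[1.0 if i == j else 0.5 for j in range(N)] for i in range(N)]
print(round(geometric_asian_call(100.0, 100.0, 0.10, 1.0, [0.3] * N, rho), 3))
```

With N = 3 this reproduces the analytic value of Table 4.2 (≈ 13.77), and with N = 50 the value ≈ 12.22.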
Table 4.2. Pricing geometric Asian options using the parameters in Table 4.1

     N    K    Analytic solution
     3   100   13.771
    50   100   12.223

The analytic solution for a geometric Asian option is computed by a modified Black-Scholes formula: the call price is given by equation (4.7) with the modified parameters, S and σ^2, as follows:

    S = G e^{(−A/2 + σ^2/2) T},
    A = (1/N) Σ_{i=1}^{N} σ_i^2,    (4.8)
    σ^2 = (1/N^2) Σ_{i=1}^{N} Σ_{j=1}^{N} ρ_ij σ_i σ_j.

I follow the above formulas and compute the prices for different values of N; the results are listed in Table 4.2.

4.4 Numerical Results

For each simulation, I have an analytical solution, so I compute the relative error between it and my simulated solution with the formula

    |p_qmc − p| / p,

where p is the analytical solution from Table 4.2 and p_qmc is the price obtained by simulation. For each N, p_qmc is obtained by simulating the asset price fluctuations using geometric Brownian motion. The results are shown in Figure 4.2.

Figure 4.2. Left: geometric mean of 3 stock prices; right: geometric mean of 50 stock prices. The label "Faure" refers to the original Faure sequence [5], while "dFaure" refers to my optimal Faure sequence.

From equation (4.8), I can see that I have to use random variables sampled from a normal distribution: each Faure point must be transformed into a normal variable. The favored transformation method for quasirandom numbers is the inverse of the cumulative normal distribution function; the inverse normal function provided by Moro [62] is used in my numerical studies. From Figure 4.2, it is easily seen that the optimal Faure sequence and the original Faure sequence have the same performance when the number of dimensions is as low as 3.
However, when the number of dimensions increases to 50, the optimal Faure sequence performs better than the original Faure sequence.

4.5 Conclusion

For many applications in computational finance, the use of quasirandom sequences seems to provide a faster rate of convergence than pseudorandom sequences. Unfortunately, at present only a few types of quasirandom sequences are widely available. By scrambling a quasirandom sequence I can produce a family of related sequences, and derandomization provides more choices by which to find suitable quasirandom sequences. In this chapter, I focused on finding the optimal Faure sequence within GFaure. Based on Tezuka's i-binomial scrambling, I proposed an algorithm and found an optimal Faure sequence within this family. I applied this sequence to evaluate a complex security and found promising results even in high dimensions.

CHAPTER 5

THE SCRAMBLED SOBOĹ SEQUENCE

The Soboĺ sequence is the most popular quasirandom sequence because of its simplicity and efficiency in implementation [13, 14]. I summarize aspects of the scrambling techniques applied to Soboĺ sequences and propose a new, simpler scrambling algorithm, called the multi-digit scrambling scheme. Most proposed scrambling methods randomize a single digit at each iteration. In contrast, my multi-digit scrambling scheme randomizes one point at each iteration, and is therefore more efficient. After the scrambled Soboĺ sequence is produced, I use it to evaluate a particular derivative security; when this sequence is numerically tested, it is shown empirically to be far superior to the original unscrambled sequence.

5.1 The Soboĺ Sequence

The construction of the Soboĺ sequence uses linear recurrence relations over the finite field F_2, where F_2 = {0, 1}. Let the binary expansion of the nonnegative integer n be given by n = n_1 2^0 + n_2 2^1 + ... + n_w 2^{w−1}.
Then the nth element of the jth dimension of the Soboĺ sequence, x_n^(j), can be generated by

    x_n^(j) = n_1 ν_1^(j) ⊕ n_2 ν_2^(j) ⊕ ... ⊕ n_w ν_w^(j),    (5.1)

where ν_i^(j) is a binary fraction called the ith direction number in the jth dimension. These direction numbers are generated by the following q-term recurrence relation:

    ν_i^(j) = a_1 ν_{i−1}^(j) ⊕ a_2 ν_{i−2}^(j) ⊕ ... ⊕ a_{q−1} ν_{i−q+1}^(j) ⊕ ν_{i−q}^(j) ⊕ (ν_{i−q}^(j)/2^q),    (5.2)

for i > q, where the bits a_i come from the coefficients of a degree-q primitive polynomial over F_2. Note that one should use a different primitive polynomial to generate the direction numbers in each different dimension. Another representation of ν_i^(j) uses the integer m_i^(j) = ν_i^(j) · 2^i. Thus, the choice of the q initial direction numbers ν_i^(j) becomes the problem of choosing q odd integers m_i^(j) < 2^i: the initial direction numbers, ν_i^(j) = m_i^(j)/2^i, in the recurrence, where i ≤ q, can be decided by the m_i^(j)'s, which can be arbitrary odd integers less than 2^i.

The Gray code is widely used in implementations [63, 6, 64] of the Soboĺ sequence. A Gray code is a permutation of the integers, viewed as a function of the integer n; let G(n) be the nth Gray code. The binary representations of G(n) and G(n + 1) differ in exactly one bit. The algorithm for generating the Gray code is simply the bitwise exclusive-or (⊕) of n and the integer part of n/2: G(n) = n ⊕ ⌊n/2⌋. The advantage of this implementation is that the Soboĺ sequence can be generated recursively. Instead of using n_i, the binary expansion of n in Equation (5.1), Antonov and Saleev [63] first used the expansion of the Gray code G(n) in place of n. Then Equation (5.1) can be replaced by the following recursive equation:

    x_{n+1}^(j) = x_n^(j) ⊕ ν_c^(j),    (5.3)

where c is determined by the rightmost zero bit in the binary representation of n. Of course, the order of the Soboĺ points is different when the Gray code is used in place of n.
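The recurrence (5.2) and the Gray-code update (5.3) can be sketched compactly in one dimension. The defaults below use a small, illustrative choice — the degree-2 primitive polynomial x^2 + x + 1 with initial integers m_1 = 1, m_2 = 3 — whereas production implementations such as Bratley and Fox's tabulate a polynomial and initial m_i per dimension:

```python
def sobol_1d(npts, q=2, a=(1,), m=(1, 3), bits=30):
    """One dimension of the Sobol sequence via the Gray-code recursion
    x_{n+1} = x_n XOR V_c, with c the rightmost zero bit of n.
    q: degree of the primitive polynomial, a: middle coefficients a_1..a_{q-1},
    m: initial odd integers m_i < 2^i."""
    mm = list(m)
    for i in range(q, npts.bit_length()):
        new = mm[i - q] ^ (mm[i - q] << q)     # m_{i-q} XOR 2^q m_{i-q}
        for k in range(1, q):                  # XOR 2^k a_k m_{i-k}
            if a[k - 1]:
                new ^= mm[i - k] << k
        mm.append(new)
    V = [mi << (bits - i - 1) for i, mi in enumerate(mm)]  # V_i = m_i 2^(bits-i)
    x, out = 0, []
    for n in range(npts):
        out.append(x / 2 ** bits)
        c = (~n & (n + 1)).bit_length() - 1    # rightmost zero bit of n
        x ^= V[c]
    return out
```

The integer recurrence on the m_i is the direction-number recurrence (5.2) rewritten for m_i^(j) = ν_i^(j)·2^i, which is how implementations avoid fractions entirely.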
However, the Gray code merely permutes the order of the integers from 0 to 2^n − 1; this reordering of the Soboĺ sequence does not affect its discrepancy if I use all the points of such a power-of-two block.

5.1.1 Initial Direction Numbers

The direction numbers in Soboĺ sequences come recursively from a degree-q primitive polynomial; however, the first q direction numbers can be arbitrarily assigned for the above recursion (Equation (5.2)). Selecting them is crucial for obtaining high-quality Soboĺ sequences. The left pictures in Figures 5.1 and 5.2 show that different choices of initial direction numbers can make the Soboĺ sequence quite different. The initial direction numbers for the left picture in Figure 5.1 are from Bratley and Fox's paper [6], while the left picture in Figure 5.2 results when the initial direction numbers are all ones.

Soboĺ [14] realized the importance of initial direction numbers, and published an additional property (called Property A) for direction numbers to produce more uniform Soboĺ sequences; but implementations [64] of Soboĺ sequences showed that Property A is not really that useful in practice. Cheng and Druzdzel [34] developed an empirical method to search for initial direction numbers, m_i^(j), in a restricted space. Their search space was limited because they had to know the total number of quasirandom numbers, N, in advance to use their method. Jackel [65] used a random sampling method to choose the initial m_i^(j) with a uniform random number u_ij, so that m_i^(j) = ⌊u_ij × 2^{i−1}⌋ for 0 < i < q, with the condition that m_i^(j) be odd.

Owing to the arbitrary nature of the initial direction numbers of the sequence, poor two-dimensional projections frequently appear in the Soboĺ sequence. Morokoff and Caflisch [2] noted that poor two-dimensional projections for the Soboĺ sequence can occur at any time because of improper choices of initial direction numbers.
The bad news is that I do not know in advance which initial direction numbers cause poor two-dimensional projections. In other words, poor two-dimensional projections are difficult to prevent by trying to choose initial direction numbers effectively. Fortunately, scrambling Soboĺ sequences [66, 11] can help us improve the quality of the Soboĺ sequence without having to pay attention to the proper choice of the initial direction numbers.

5.2 Scrambling Methods

Many methods [1, 21, 4, 41] have been proposed for scrambling quasirandom sequences. Some scrambling methods [66, 22, 46] were designed specifically for the Soboĺ sequence. Recall that this sequence is defined over the finite field [54] F_2. Digit permutation is commonly thought effective over the finite field F_p. When digit permutation is used to scramble a quasirandom point over F_p, the zero is commonly left alone. One reason is that zero is never the most significant bit, and when zero is included in the permutation, bias is introduced. The other is that permuting zero (assuming an infinite string of trailing zeros) leads to a biased sequence, in the sense that zero can be appended to the end of any expansion while no other digit can. So this strategy of pure digit permutation, where zero is not changed, is not suitable for the Soboĺ sequence, because the Soboĺ sequence is over F_2.

Linear permutation [66] is also not a proper method for scrambling the Soboĺ sequence. Let x_n = (x_n^(1), x_n^(2), ..., x_n^(s)) be any quasirandom point in [0, 1)^s, and let z_n = (z_n^(1), z_n^(2), ..., z_n^(s)) be the scrambled version of the point x_n. Suppose that each x_n^(j) has a b-ary representation x_n^(j) = 0.x_{n1}^(j) x_{n2}^(j) ... x_{nK}^(j) ..., where K defines the number of digits to be scrambled in each point. Then I define

    z_n^(j) = c_1 x_n^(j) + c_2,  for j = 1, 2, ..., s,    (5.4)

where c_1 ∈ {1, 2, ..., b − 1} and c_2 ∈ {0, 1, 2, ..., b − 1}. Since the Soboĺ sequence is built over F_2, one must assign 1 to c_1 and 0 or 1 to c_2. Since the choice of c_1 is crucial to the quality of the scrambled sequence, this linear scrambling method is not suitable for the Soboĺ sequence, or for any sequence over F_2.

Figure 5.1. Left: 4096 points of the original Soboĺ sequence (dimension 27 vs. dimension 28), with the initial direction numbers from Bratley and Fox's paper [6]; right: 4096 points of the scrambled version of the Soboĺ sequence.

As stated previously, the quality of the Soboĺ sequence depends heavily on the choices of initial direction numbers. The correlations between different dimensions are due to improper choices of initial direction numbers [34]. Many methods [34, 65] to improve the Soboĺ sequence focus on placing more uniformity into the initial direction numbers, but this approach is difficult to judge by any measure. I concentrate on improving the Soboĺ sequence independently of the initial direction numbers. This motivates us to find another approach to obtaining high-quality Soboĺ sequences: scrambling each point.

5.3 An Algorithm for Scrambling the Soboĺ Sequence

I provide a new approach for scrambling the Soboĺ sequence, and measure the effectiveness of this approach with the number-theoretic criterion used in Chapter 3. Using this new approach, I can scramble the Soboĺ sequence in any number of dimensions. The idea of my algorithm is to scramble the first k bits of each Soboĺ point instead of scrambling one digit at a time. Assume x_n is the nth Soboĺ point, and I want to scramble the first k bits of x_n. Let z_n be the scrambled version of x_n. My procedure is as follows:

1. y_n = ⌊x_n · 2^k⌋ is the integer formed by the k most significant bits of x_n, which are to be scrambled.

2.
y_n* = a y_n (mod m), with m ≥ 2^k − 1, is the linear scrambling applied to this integer.

3. z_n = y_n*/2^k + (x_n − y_n/2^k) is the reinsertion of these scrambled bits into the Soboĺ point.

The key step of this approach is the use of linear congruential generators (LCGs) as scramblers. LCGs with both power-of-two and prime moduli are common pseudorandom number generators. When the modulus of an LCG is a power of two, the implementation is cheap and fast, because modular addition and multiplication are just ordinary computer arithmetic when the modulus corresponds to a computer word size. The disadvantage, in terms of quality, is that it is hard to obtain the desired quality of pseudorandom numbers when using a power-of-two modulus; more details are given in [67]. So LCGs with prime moduli are chosen in this chapter. The rest of my job is to search for a suitable and reliable LCG as my scrambler.

When the modulus of an LCG is prime, implementation is more expensive. A special form of prime, such as a Mersenne prime(1) or a Sophie-Germain prime(2), can be chosen so that the costliest part of the generation, the modular multiplication, can be minimized [67]. To simplify the scrambling process, I look to LCGs for guidance. Consider the following LCG:

    y_n* = a y_n (mod m),    (5.5)

where m is chosen to be a Mersenne prime, 2^k − 1, or a Sophie-Germain prime of the form 2^{k+1} − k_0, k is the number of bits to be scrambled, and a is a primitive root modulo m [55, 68]. I choose the modulus to be a Mersenne or Sophie-Germain prime [67, 40] because of the existence of fast modular multiplication algorithms for these primes; for more details please refer to Appendix B. The optimal a should generate the optimal Soboĺ sequence, and the optimal a's for modulus 2^31 − 1 are tabulated in [68].

(1) If 2^q − 1 and q are primes, then 2^q − 1 is a Mersenne prime.
(2) If 2q + 1 and q are primes, then 2q + 1 is a Sophie-Germain prime.
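The three-step procedure above fits in a few lines. A sketch with a small illustrative choice of parameters — k = 5, the Mersenne modulus m = 31, and multiplier a = 3, which is a primitive root mod 31 (the function name and these particular values are mine, not the thesis's tabulated optima):

```python
def multidigit_scramble(x, a, k, m):
    """Scramble the k most significant bits of x in [0,1) in one shot:
    extract them as an integer, apply the LCG y* = a*y (mod m), reinsert."""
    y = int(x * 2 ** k)                        # step 1: leading k bits as an integer
    ystar = (a * y) % m                        # step 2: linear scrambling
    return ystar / 2 ** k + (x - y / 2 ** k)   # step 3: reinsertion

z = multidigit_scramble(0.625, 3, 5, 31)
```

Because the trailing bits x_n − y_n/2^k are carried over unchanged, a whole point is randomized in one multiplication, which is the efficiency claim of the multi-digit scheme.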
A proposed algorithm for finding such an optimal primitive root modulo a prime m is described in Chapter 3. The purpose of my algorithm is twofold. Primarily, it provides a practical method to obtain a family of scrambled Soboĺ sequences. Secondarily, it gives us a simple and unified way to generate an optimal Soboĺ sequence from this family.

FINDER [24, 25], a commercial software system which uses quasirandom sequences to solve problems in finance, is an example of the successful use of derandomization. A modified Soboĺ sequence is included in FINDER. Although the creators of FINDER pointed out that the major improvements in their modified Soboĺ sequence were achieved via optimized initial direction numbers for dimensions up to 360, the method they used for this improvement was not revealed, and FINDER was patented. However, using my algorithm, I can begin with the worst choice of initial direction numbers for the Soboĺ sequence: all initial direction numbers equal to one. The results are shown in Figure 5.2. The only unscrambled portion is a straight line in both pictures; the reason is that the new scrambling algorithm cannot change a point whose elements are all the same into a point with different elements.

5.4 Numerical Results

Here, I present the valuation of a complex option which has a simple analytical solution. The popular example of such a problem is a European call option on the geometric mean of several assets, sometimes called a geometric Asian option. I followed the formulas of Section 4.3 of Chapter 4 and computed the prices for N = 10 and N = 30 with K = 100, obtaining p = 12.631 and p = 12.292, respectively. From Figure 5.3, it is easily seen that the optimal Soboĺ sequence performs much better than the original Soboĺ sequence, both when the number of dimensions is as low as 10 and when the number of dimensions increases to 30.
Figure 5.2. Left: 4096 points of the original Soboĺ sequence (dimension 27 vs. dimension 28) with all initial direction numbers equal to one [7]; right: 4096 points of the scrambled version of the Soboĺ sequence.

Figure 5.3. Left: geometric mean of 10 stock prices; right: geometric mean of 30 stock prices. The label "Sobol" refers to the original Soboĺ sequence [6], while "DSobol" refers to my optimal Soboĺ sequence.

5.5 Conclusion

A new algorithm for scrambling the Soboĺ sequence has been proposed. This approach avoids the consequences of the improper choices of initial direction numbers that negatively impact the quality of this sequence. Therefore, my approach can enhance the quality of the Soboĺ sequence without worrying about the choices of initial direction numbers. In addition, I proposed an algorithm and found an optimal Soboĺ sequence within the scrambled family. I applied this sequence to evaluate a complex security and found promising results even in high dimensions. I have shown the performance of the Soboĺ sequence generated by my new algorithm to be empirically far superior to that of the original sequence. These promising results prompt us to test the sequences on more applications, and to seek more general scrambling techniques for the Soboĺ sequence.

CHAPTER 6

RANDOMIZATION OF LATTICE POINTS

6.1 Introduction

There are three important families of quasirandom sequences: Halton sequences, lattice points, and (t, s)-sequences. The scrambled versions of Halton sequences and (t, s)-sequences have been discussed in previous chapters. In this chapter, I will focus on lattice points, sometimes called number-theoretic methods, and their randomization. This chapter is included for completeness but contains no new results of ours.

Let {x_1, x_2, ..., x_N} denote a quasirandom sequence.
There are several methods for constructing the point set, which either depend on N or do not. If each point, x_n, in the set {x_1, x_2, . . . , x_N} is constructed independently of N, this is called the open quasi-Monte Carlo method. Halton sequences and (t, s)-nets are examples of the open quasi-Monte Carlo method. The other construction is when the point, x_n, depends on N: lattice points are an example of this so-called closed quasi-Monte Carlo method. When a sequence depends on N, the disadvantage is that it is impossible to obtain useful error estimates for closed quasi-Monte Carlo by repeating the calculation with an increasing number of points. However, randomization gives us an approach for obtaining error estimates in this case. Another approach is that of Hickernell [69], which is based on constructing infinite lattice sequences, so that N is not needed in advance. The advantages of lattice rules lie in their simplicity and their power as integration nodes for a wide class of periodic integrands, especially those whose multiple Fourier expansions have coefficients tending rapidly to zero. Meanwhile, the motivation for randomizing lattice rules is to find a practical error estimate when lattice rules are used for numerical integration.

Figure 6.1. An example of a lattice point set.

The main scrambling technique for lattice points is shifting, which is defined as x_n + u (mod 1), where x_n is a lattice point and u is a random vector uniformly chosen in [0, 1)^s.

6.2 The Methods of Good Lattice Points

Lattice rules can be viewed as multidimensional analogues of the one-dimensional trapezoidal rule for periodic integrands.

Definition 1 Let (N; g_1, g_2, ..., g_s) be a vector of integers satisfying 1 ≤ g_j ≤ N, and g_k ≠ g_j for k ≠ j and 1 ≤ k, j ≤ s. Then

x_{nj} = (n g_j)/N (mod 1), 1 ≤ j ≤ s, n = 1, 2, . . . , N, (6.1)

where the point x_n = (x_{n1}, . . . , x_{ns}).
Then the set {x_n, 1 ≤ n ≤ N} is called the set of lattice points, and (N; g_1, g_2, ..., g_s) is called the generating vector.

The simplest and most effective form for lattice rules is the method of good lattice points. Similar to the definitions of the Halton sequence and (t, s)-sequence, an s-dimensional good lattice point rule can be expressed as [8, 70]:

Definition 2 If the lattice point set {x_n, 1 ≤ n ≤ N} has star-discrepancy D_N^* = Θ((log N)^s / N), then the set {x_n, 1 ≤ n ≤ N} is called a set of good lattice points.

Any set of good lattice points is completely determined by its generating vector (N; g_1, g_2, ..., g_s). In practice, N is given, and I have to find a suitable (g_1, g_2, ..., g_s). It is normally a very computationally costly task to find the best generating vector, i.e., a vector which has the smallest discrepancy among all possible sets of lattice points. Therefore Korobov [23] suggests considering {g_1, ..., g_s} to be of the form

g = {g_1, ..., g_s} = {1, a, a^2, ..., a^{s-1}} (mod N), (6.2)

with 0 < a < N an integer. For given values of N and s, a computer search can be implemented to find the vector, g, which minimizes an appropriate figure of merit. A table of generating vectors (s < 19) can be found in [70, 46]. The Korobov form for good lattice points is a good choice for theoretical as well as practical reasons.

6.3 Criteria for Good Generating Vectors

Generating vectors for lattices are usually obtained by minimizing some measure of the discrepancy of the lattice. Classically, two criteria are widely used to measure lattices: one is denoted P_α(g, N) [71], the other R(g, N) [56]. Let E_α(c) be the class of functions, f, whose Fourier coefficients satisfy

|f̂(z)| ≤ c / (z̄_1 z̄_2 ... z̄_s)^α, (6.3)

where c > 0, z̄ = max(1, |z|), and α > 1 is fixed. P_α is defined as

P_α(g, N) = Σ_{g·z ≡ 0 (mod N)} (z̄_1 z̄_2 ... z̄_s)^{-α} − 1, (6.4)

with the α coming from E_α(c), and the sum taken over all z ∈ Z^s with g·z ≡ 0 (mod N). The quantity P_α measures the quality of the lattice point set.
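The Korobov construction in equations (6.1) and (6.2) is simple to code. A minimal sketch follows; the parameter values in the example line are illustrative, not taken from the cited tables:

```python
def korobov_vector(a, s, N):
    """Korobov generating vector g = (1, a, a^2, ..., a^(s-1)) mod N  (eq. 6.2)."""
    g, val = [], 1
    for _ in range(s):
        g.append(val)
        val = (val * a) % N
    return g

def lattice_points(g, N):
    """Rank-1 lattice point set x_n = {n g / N}, n = 1, ..., N  (eq. 6.1)."""
    return [tuple((n * gj / N) % 1.0 for gj in g) for n in range(1, N + 1)]

# e.g. a small 3-dimensional rule with N = 101 and a (hypothetical) a = 40:
pts = lattice_points(korobov_vector(40, 3, 101), 101)
```

In a real search, one would loop a over 1, ..., N − 1 and keep the vector minimizing a figure of merit such as P_α or R.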
Those readers familiar with the spectral test should recognize a similarity here to that well-known figure of merit from random number testing [55]. Sloan and Wozniakowski [72] modify P_α with weights γ as follows:

P_{α,γ}(g, N) = −1 + Σ_{g·z ≡ 0 (mod N)} (β̄_1(z_1) ... β̄_s(z_s))^{-α}. (6.5)

Here β_i(z_i) = z_i/γ_i, and the nonincreasing weights γ_i > 0, for 1 ≤ i ≤ s, are associated with the successive coordinate directions.

Let {x_1, x_2, . . . , x_N} denote a good lattice point set, and let I(f) = ∫_{[0,1]^s} f(x) dx denote the true integral value of f, f ∈ E_α(c). Let (1/N) Σ_{i=1}^{N} f(x_i) be an approximation to the integral of the function, f. Then the error between this and I(f) can be expressed as follows:

Theorem 1 For any real number α > 1 and c > 0, any g ∈ Z^s, and any integer N ≥ 1, I have

|(1/N) Σ_{i=1}^{N} f(x_i) − I(f)| ≤ c P_α(g, N), (6.6)

where the function f ∈ E_α(c), and {x_1, . . . , x_N} is any good lattice point set.

Another criterion that can be used to assess lattice points is the quantity R(g, N) [56], defined by

R(g, N) = Σ_{g·z ≡ 0 (mod N), z ∈ W, z ≠ 0} 1/(z̄_1 z̄_2 ... z̄_s), (6.7)

where W = {z ∈ Z^s : −N/2 ≤ z_i ≤ N/2, 1 ≤ i ≤ s}.

The relations among P_α(g, N), R(g, N), and D_N^* are

D_N^* ≤ s/N + (1/2) R(g, N), (6.8)

and

P_α(g, N) ≤ R(g, N)^α + O(N^{-α}). (6.9)

Since R(g, N) is related to the star-discrepancy, D_N^*, the Koksma-Hlawka inequality (equation 1.3) can be expressed in terms of R(g, N).

6.4 Randomization

In general, a lattice point set depends on N, and increasing N cannot help us obtain error estimates, since the above quality measures tend to fluctuate erratically. I note that the reason for this is that point sets for different values of N, in general, have nothing in common. However, randomization of lattice rules allows the calculation of practical error estimates. The Cranley-Patterson [73] randomization can be expressed as

x_{ij} = (i g_j)/N + ∆_j (mod 1), (6.10)

for j = 1, 2, ..., s, i = 1, 2, ..., N, and where g = (g_1, . . .
, g_s) is the generating vector of a good lattice point set, and ∆ = (∆_1, ..., ∆_s) is a random vector from [0, 1)^s. Two possible choices for ∆ are considered. The first is where ∆ is chosen from a multivariate uniform distribution, and the second where it is chosen by systematic sampling. In this dissertation, I only consider the first choice. The points

x_{ij}^k = (i g_j)/N + ∆_{kj} (mod 1), for k = 1, 2, . . . , r,

form a stochastic family. Taking r replicates of these N points, confidence intervals for the error can be obtained. Let I(f) be defined as above. Then I define Q_f(g, N, ∆_k) as

Q_f(g, N, ∆_k) = (1/N) Σ_{i=0}^{N−1} f(x_i^k), (6.11)

where x_i^k = ((i g_1)/N + ∆_{k1}, (i g_2)/N + ∆_{k2}, . . . , (i g_s)/N + ∆_{ks}) (mod 1). Suppose ∆_k has a multivariate uniform distribution in [0, 1)^s; then I have

E(Q_f(g, N, ∆_k)) = I(f), (6.12)

where E(·) is expectation with respect to ∆_k. For any integer, r, let ∆_1, . . . , ∆_r be independent random vectors [71] chosen from a multivariate uniform distribution in [0, 1)^s; then the estimate

Q̂_f(g, N) = (1/r) Σ_{k=1}^{r} Q_f(g, N, ∆_k) (6.13)

is an unbiased estimate of I(f).

6.5 Infinite Lattice Sequences

Hickernell [69] extended the idea of thinking of lattices via their underlying (t, m, s)-nets, and in this way obtained infinite lattice sequences. The basic idea is to replace N, the number of lattice points, with b^m. In addition, Joe [74] gave an explicit expression for a mean L_2-discrepancy for shifted infinite lattice point sets. Hickernell's approach goes as follows. The ith term of a good lattice rule is (i/N) g. Let N = b^m; then i/N = i/b^m, i = 0, 1, ..., N − 1, may be replaced by the digital inverse function φ_b(i) = i_1 b^{-1} + i_2 b^{-2} + · · · = (0.i_1 i_2 ...)_b from the van der Corput sequence. Therefore, a shifted infinite good lattice sequence in base b with generating vector g and shift ∆ can be defined as

P = {φ_b(i) g + ∆ (mod 1) | i = 0, 1, 2, ...}. (6.14)

Similar to the definition for (t, s)-sequences, every run has b^m points.
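Equation (6.14) is straightforward to implement once the digital inverse φ_b is available. A minimal sketch (the generating vector and shift in any usage are up to the caller; nothing here is tied to a specific published rule):

```python
def radical_inverse(i, b):
    """Digital inverse phi_b(i) = (0.i1 i2 ...)_b of the van der Corput sequence."""
    f, x = 1.0 / b, 0.0
    while i > 0:
        x += (i % b) * f
        i //= b
        f /= b
    return x

def infinite_lattice_point(i, g, delta, b=2):
    """i-th term of the shifted infinite lattice sequence (eq. 6.14):
    x_i = phi_b(i) * g + delta (mod 1), componentwise."""
    phi = radical_inverse(i, b)
    return tuple((phi * gj + dj) % 1.0 for gj, dj in zip(g, delta))
```

Because φ_b enumerates i/b^m in van der Corput order, the first b^m terms reproduce a shifted rank-1 lattice with N = b^m, and N never has to be fixed in advance.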
For example, the lth run is the set {x_n | l b^m ≤ n < (l + 1) b^m}. Suppose that Q is the set consisting of the (l + 1)st run of b^m terms of the infinite lattice sequence; then Q is defined as

Q = {φ_b(l b^m + i) g + ∆ (mod 1)}, (6.15)

for i = 0, 1, 2, ..., b^m − 1 and l = 0, 1, 2, .... Q can also be expressed as

Q = {φ_b(i) g + φ_b(l) b^{-m-1} g + ∆ (mod 1)}, (6.16)

for i = 0, 1, 2, ..., b^m − 1.

6.6 Conclusion

Lattice point sets are simple to code, yet their difficulty lies in finding a good generating vector, g, given N. When lattice point sets are used to approximate an integral in s dimensions, the integrand, f, must be smooth, periodic, and in E_α(c). It is computationally very demanding to find the best generating vector {g_1, g_2, ..., g_s} ∈ Z^s. In practice, Korobov's form [70] is considered, and tables of good values for N and g can be found in [70] for 1 < s < 19. However, for dimension s > 19, one must still search for one's own optimal g and N. After g and N are chosen, the implementation of lattice rules and their randomizations is straightforward.

CHAPTER 7
APPLICATIONS

Three aspects of applications for scrambled quasirandom sequences are addressed here: (1) obtaining automatic statistical error estimates for quasi-Monte Carlo; (2) generating parallel quasirandom sequences that are especially good for distributed or grid computing; and (3) providing more nearly optimal quasirandom sequences for quasi-Monte Carlo. This chapter explores all three aspects in detail.

7.1 Automatic Error Estimates for QMC

The original motivation [73, 1, 11] for scrambled quasirandom sequences was to obtain automatic statistical error estimates for QMC and to improve the quality of quasirandom sequences. Methods for obtaining unbiased error estimates for QMC will be presented in this section. The convergence rate for Monte Carlo methods is asymptotically O(N^{-1/2}), yet quasi-Monte Carlo methods can have an error bound which behaves as well as O((log N)^s N^{-1}).
The Koksma-Hlawka inequality (1.5) gives us the deterministic error bound for QMC. However, it is very difficult to compute the total variation of the integrand, V(f), and the star-discrepancy, D_N^*, in practice. The Koksma-Hlawka inequality is therefore actually not very useful for practical QMC error estimation. In [75], the authors pointed out that “Quasi-Monte Carlo methods will come into their own only when improved error estimates are available.” Therefore, it is important to find practical ways to obtain direct, a posteriori error estimates in QMC. Scrambled quasirandom sequences play a central role in providing a statistical method for error estimation in QMC. The problem I consider here is estimating an integral over [0, 1)^s:

I(f) = ∫_{[0,1)^s} f(x) dx. (7.1)

QMC computes an approximate value of equation (7.1) by

Î(f) = (1/N) Σ_{i=1}^{N} f(x_i). (7.2)

There are several proposed methods to estimate the accuracy of equation (7.2).

• Replication method [76]: By using randomized QMC, I can take r independent replicates of a scrambled net. The corresponding estimates Î_1, ..., Î_r are unbiased estimates of I. I then calculate the overall estimate of I by

Ī(f) = (1/r) Σ_{k=1}^{r} Î_k. (7.3)

The error of numerical integration is estimated by using the variance of Ī(f):

σ̂² = (1/(r(r−1))) Σ_{k=1}^{r} (Î_k − Ī(f))². (7.4)

• Partition method: The partition method [76] is similar to the replication method except that it uses r partitions of a single net, each of size n, instead of r scrambled nets.

• Multipartition method: Snyder [77] proposed a modified partition method reducing the bias which may be introduced by the partition method in some cases. If a single net of size n is partitioned into b sets of size n/b, the sample variance for sets of size n/b is estimated using a statistic evaluated from these partitions.

Randomized QMC not only provides us with unbiased estimators of the error, but also allows us to utilize certain variance reduction techniques [78].
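As a concrete sketch of the replication method, equations (7.3) and (7.4) can be combined with Cranley-Patterson shifts of a rank-1 lattice as the randomization; the generating vector and the test integrand in any particular call are hypothetical choices, not the dissertation's:

```python
def shifted_lattice_estimates(f, g, N, r, rng):
    """Replication method: r random shifts of one rank-1 lattice give
    unbiased estimates I_1, ..., I_r; returns their mean (eq. 7.3) and
    the variance estimate (1/(r(r-1))) sum_k (I_k - mean)^2 (eq. 7.4)."""
    s = len(g)
    estimates = []
    for _ in range(r):
        delta = [rng.random() for _ in range(s)]          # Cranley-Patterson shift
        total = 0.0
        for i in range(N):
            x = tuple((i * gj / N + dj) % 1.0 for gj, dj in zip(g, delta))
            total += f(x)
        estimates.append(total / N)
    mean = sum(estimates) / r
    var = sum((e - mean) ** 2 for e in estimates) / (r * (r - 1))
    return mean, var
```

The square root of the returned variance gives the usual standard-error-based confidence interval for the QMC estimate.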
7.2 Parallel Quasirandom Sequences

One advantage of QMC is that it is easy to parallelize applications, and so producing high-quality parallel quasirandom sequences is important. Scrambling provides a natural way to parallelize quasirandom sequences, because scrambled quasirandom sequences form a stochastic family which can be assigned to different processes in a parallel computation. This scheme is different from other proposed schemes, such as leap-frog [79] and blocking [80], which split up a single quasirandom sequence. MC applications are often readily parallelized, and one would expect the same for QMC applications. Parallel computations using QMC require a source of quasirandom sequences, which are distributed among the individual processing units. In contrast to the study of parallel pseudorandom numbers, there are very few papers on using quasirandom sequences for parallel computing. Schmid [80, 81] pointed out that “Only a little amount of work has been done using (t, s)-sequences for parallel numerical integration.” Bromley [79] describes a leap-frog parallelization technique to break up Soboĺ sequences into interleaved subsets. Schmid extends and generalizes Bromley's work through the use of blocking and leap-frogging for all types of binary digital (t, s)-sequences, and finds that blocking is more robust than leap-frogging. Li and Mullen [82] proposed a parallel algorithm for (t, m, s)-nets for use in financial problems, and then used (t, m, s)-nets in valuing derivatives and other securities. Similar to the methods for parallelizing pseudorandom number sequences, there are three basic ways to parallelize quasirandom number sequences:

• Leap-frog: The sequence is partitioned in turn among the processors like a deck of cards dealt to card players.

• Sequence splitting or blocking: The sequence is partitioned by splitting it into non-overlapping contiguous subsections.

• Independent sequences: Each processor has its own independent sequence.
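The first two splitting schemes can be sketched in a few lines; this is a toy illustration on an arbitrary finite stream, not tied to any particular sequence:

```python
def leapfrog(stream, num_procs, proc_id):
    """Leap-frog: processor p takes elements p, p + P, p + 2P, ...
    of the parent stream, like cards dealt around a table."""
    return stream[proc_id::num_procs]

def blocking(stream, block_len, proc_id):
    """Blocking: processor p takes the contiguous block
    [p * L, (p + 1) * L) of the parent stream."""
    return stream[proc_id * block_len:(proc_id + 1) * block_len]
```

Both functions merely reindex a single parent sequence, which is exactly why, as discussed below, the uniformity of each sub-stream is not guaranteed.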
The first and second schemes produce numbers from a single quasirandom sequence, while the third scheme needs a family of quasirandom sequences. Scrambling techniques can generate such a stochastic family of quasirandom sequences from one original quasirandom sequence. Thus each scrambled sequence in the family is independent and can be assigned to its own processor. Ökten and Srinivasan [83] first used scrambled Halton sequences as a method for parallelizing quasirandom sequences. Scrambling methods provide a natural way to parallelize quasirandom sequences, and scrambling itself depends on permutations or pseudorandom numbers. Hence, different permutations will lead to different quasirandom sequences (or sets). Each scrambled variant of a parent stream can be considered as another parallel stream of quasirandom numbers. Blocking and leap-frog use a single quasirandom sequence and assign subsequences of this quasirandom sequence to different processes. The idea behind blocking and leap-frog is to assume that any subsequence of a quasirandom sequence has the same uniformity as the parent quasirandom sequence; this assumption is often false. In comparison to blocking and leap-frog, each scrambled sequence can be thought of as an independent sequence and assigned to a processor, and under certain circumstances it can be proven that the scrambled sequences are as uniform as the parent. Since the quality (small discrepancy) of quasirandom sequences is a collective property of the entire sequence, forming new sequences from parts is potentially troublesome. Therefore, scrambled quasirandom sequences provide a very appealing alternative for parallel quasirandom sequences, especially where a single quasirandom sequence is scrambled to provide all the parallel streams. Such a scheme would also be very useful for providing QMC support to the computational grid [84].
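As a sketch of the independent-sequences idea, one simple digit-permutation scrambling of the Halton sequence can serve as a per-processor stream generator. This is a simplified, Braaten-Weller-style scheme of my own for illustration, with a single random permutation per base; the permutation fixes 0 so that the implicit trailing zero digits stay zero:

```python
def scrambled_radical_inverse(i, b, perm):
    """Radical inverse in base b with each digit passed through perm,
    a permutation of {0, ..., b-1} with perm[0] == 0."""
    f, x = 1.0 / b, 0.0
    while i > 0:
        x += perm[i % b] * f
        i //= b
        f /= b
    return x

def scrambled_halton_stream(n, bases, rng):
    """One parallel stream: draw a random digit permutation per base,
    then emit the first n scrambled Halton points."""
    perms = []
    for b in bases:
        p = list(range(1, b))
        rng.shuffle(p)
        perms.append([0] + p)          # keep perm[0] == 0
    return [tuple(scrambled_radical_inverse(i, b, p)
                  for b, p in zip(bases, perms)) for i in range(1, n + 1)]
```

Giving each processor a differently seeded rng yields a different scrambled variant of the same parent Halton sequence.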
7.2.1 A Parallel and Distributed Library

QMC applications have high degrees of parallelism, can tolerate large latencies, and usually require considerable computational effort, making them extremely well suited to parallel, distributed, and even Grid-based computational environments. In these environments, a large QMC problem is broken up into many small subproblems. These subproblems are then scheduled on the parallel, distributed, or Grid-based environment. In a more traditional instantiation, these environments are usually a workstation cluster connected by a local-area network over which the computational workload is cleverly distributed. Recently, peer-to-peer or Grid computing, the cooperative use of geographically distributed resources unified to act as a single powerful computer, has been investigated as an appropriate computational environment for MC applications [84]. There, the computational infrastructure developed was based on the existence of a high-quality tool for parallel pseudorandom number generation, the Scalable Parallel Random Number Generators (SPRNG) library [85]. The extension of this technology to quasirandom numbers would be very useful.

7.2.2 Testing Parallel Quasirandom Sequences

As mentioned above, discrepancy is the standard measure of uniformity for quasirandom sequences. In addition, the Kolmogorov-Smirnov (KS) and Cramer-von Mises (CM) statistics are used for testing multivariate distributions. Let F denote the theoretical cumulative distribution function of the distribution being tested, which must be a continuous distribution, and let F_n denote the empirical distribution function. These well-known statistics can be defined as

KS = sup_{x ∈ [0,1)^s} |F_n(x) − F(x)|, (7.5)

and

CM = n ∫_{[0,1)^s} |F_n(x) − F(x)|² dx. (7.6)

A connection between the discrepancy and the KS and CM statistics is the following: let P denote the set of all sampling points.
When the points in P are random samples, and F(x) is the uniform distribution on [0, 1)^s, then the star-discrepancy reduces to a KS-type statistic, and the L_2-discrepancy reduces to a CM-type statistic [86]. Testing uniformity in (0, 1]^s is reviewed in [87]. Liang et al. [88] obtained new statistics to test multivariate uniformity. These statistics come from the generalized discrepancy [89] and are easy to calculate in high dimensions.

The best way of testing a parallel library is to solve practical problems. High-dimensional integral problems from computational finance, Bayesian networks, and a published set of test integrands can be chosen to test the effectiveness of scrambled quasirandom sequences. However, for parallel computing, I have to consider interactions between different streams, which can be thought of as a kind of correlation. Thus I must resort to empirical testing in an attempt to discover such properties. Both statistical tests and high-dimensional integral problems play a critical part in empirical testing.

7.3 Derandomization

I have proposed optimal algorithms for the Halton, Faure and Soboĺ sequences in previous chapters. Here, I give a summary of derandomization. As the utility of QMC has developed, many researchers [73, 1] have not been satisfied with the quality of existing quasirandom sequences. As such, many methods have been used to attempt to improve the quality of quasirandom sequences. Scrambled sequences have good performance in practice, and the practice is now to incorporate such randomized sequences routinely in applications. A scrambling algorithm with random scramblings produces a stochastic family of quasirandom sequences. The process of searching for and specifying optimal quasirandom sequences that achieve theoretically and empirically optimal results is an important problem in QMC.
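One computationally tractable criterion for such a search is the L_2 star discrepancy, which can be computed exactly from the points via Warnock's formula (Warnock [41] is discussed in Appendix C); using it to rank candidate scrambled sets is my illustration here, not a prescription from the text:

```python
import math

def l2_star_discrepancy_sq(points):
    """Warnock's formula for the squared L2 star discrepancy:
    D^2 = 3^(-s) - (2/N) sum_n prod_j (1 - x_nj^2)/2
          + (1/N^2) sum_{m,n} prod_j (1 - max(x_mj, x_nj))."""
    N, s = len(points), len(points[0])
    t1 = 3.0 ** (-s)
    t2 = (2.0 / N) * sum(math.prod((1 - c * c) / 2 for c in x) for x in points)
    t3 = sum(math.prod(1 - max(a, b) for a, b in zip(x, y))
             for x in points for y in points) / (N * N)
    return t1 - t2 + t3

def best_of(candidate_sets):
    """Derandomization by search: keep the candidate scrambled point set
    with the smallest criterion value."""
    return min(candidate_sets, key=l2_star_discrepancy_sq)
```

The O(N²s) pairwise sum makes this practical only for moderate N, which is one reason restricted search spaces (such as the linear scramblings of Appendix C) matter.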
The process of finding such optimal quasirandom sequences is commonly called “derandomization.” For example, GFaure [7] is a family of scrambled Faure sequences that has been successfully used in computational finance [25]. Tezuka's i-binomial scrambling [26] is a special case of GFaure; it reduces the scrambling space from O(K²) to O(K), and gives one a specific search criterion and a smaller space in which to find optimal GFaure sequences. In fact, there are very few theoretical or practical results for derandomizing quasirandom sequences. One important and open question is how to provide a theoretical basis for derandomization and also to provide practical derandomizations [29, 20]. In addition, derandomizing quasirandom sequences aims not only at finding an optimal sequence within a scrambled family, but also at finding a set of optimal sequences. Thus, error estimates can be obtained by using several optimal scrambled sequences; in such cases, the error estimate is expected to be both stable and small.

7.4 Conclusion

In this chapter, applications of scrambled quasirandom sequences were introduced. Scrambled quasirandom sequences contribute to QMC by enabling automatic error estimates and parallel quasirandom sequences. Possible methods for achieving automatic error estimates were presented, as were approaches for parallelizing quasirandom sequences via a family of scrambled quasirandom sequences. Derandomization provides us not only with optimal quasirandom sequences for use in QMC, but also with the possibility of reducing the error in conjunction with automatic error estimation.

CHAPTER 8
CONCLUSIONS AND FUTURE WORK

Quasirandom sequences are important in practice because QMC methods offer the hope of fast convergence for problems where MC is important. However, one has only a few choices of quasirandom sequences: the Halton sequence, (t, s)-sequences, and lattice points.
Fortunately, scrambled quasirandom sequences provide us with more choices, and the use of derandomization can potentially lead to finding optimal quasirandom sequences for use in applications. In this dissertation, I detailed various scrambling methods for quasirandom sequences and proposed some new scrambling algorithms for the Halton, Faure and Soboĺ sequences. I showed that my scrambled quasirandom sequences do improve the quality of the unscrambled sequences, through measures of two-dimensional projections and through high-dimensional integration problems. I also presented applications for scrambled sequences that included automatic error estimation and the generation of high-quality parallel quasirandom numbers. I further introduced the notion of derandomizing a family of scrambled quasirandom sequences to find optimal sequences. Schemes for derandomizing quasirandom sequences were explored in this dissertation; I limited my search space to linear scramblings when searching for optimal quasirandom numbers. I feel that as randomization technology in quasirandom number generation becomes more widespread, the usefulness of QMC will grow, and more people will be tempted to see if their MC application can be accelerated with quasirandom numbers. Based on my research into effective scrambling methods and the outline in Chapter 7, I could integrate a highly effective library for parallel and distributed quasirandom number generation. Such a library would be modelled after the SPRNG [90] library, and would also provide tools for automatic error estimation in QMC, and in particular offer error estimation for automatic integration [43]. Due to the relative simplicity of the Halton sequence and lattice points, their randomization is fast and relatively simple to implement. However, constructing (t, s)-sequences is complicated, and so their randomization is more complicated.
Also, a straightforward implementation of nested scrambling [11, 76, 22] (Owen scrambling) is too complicated in practice, and Tezuka scrambling [91, 26] was only designed for (0, s)-sequences. Finding an effective scrambling method suitable for general (t, s)-sequences that keeps the generality of Owen's nested scrambling is still an ongoing effort. Producing a quasirandom sequence should not occupy too much CPU time or computer memory in practice. In other words, it should be efficient and fast to produce quasirandom numbers in any application of QMC. The idea behind Tezuka scrambling is to scramble the generator matrices, which are common to the construction of every (t, s)-sequence. Following this lead, there may exist a way to scramble (t, s)-sequences with t > 0. Derandomization plays as important a role as scrambling in QMC. Derandomization has been successfully used in practice in several cases. For a given application, existing quasirandom sequences may not be satisfactory, and derandomization can provide us with important alternatives. Since derandomization is still a relatively new technique, there are many open questions that need to be answered; I list just a few of them below.

• Finding an effective and easily implemented general method to derandomize a scrambled family of quasirandom sequences.

• Searching for a family of scrambled quasirandom sequences among various scrambled families; in other words, finding a derandomization method that works independently of the scrambling method.

• Finding theoretical and empirical criteria for measuring the quality of derandomization that are computationally tractable.

APPENDIX A
PARALLEL PSEUDORANDOM NUMBER GENERATORS

Pseudorandom number generators can be used as scramblers for quasirandom numbers.
The main algorithms used for sequential pseudorandom number generators are the following:

• Linear congruential generators (LCGs),
• Lagged-Fibonacci generators (LFGs),
• Shift-register generators (SRGs),
• Inversive congruential generators (ICGs),
• Combinations of the above generators.

Among these generators, LCGs are the best known and have the best developed theory. Therefore, most current implementations of parallel random number generators are based on LCGs. For an overview, please see [92]. In general, there are three techniques to parallelize random numbers:

• Leap-frog [92]: The sequence is partitioned in turn among the processors as cards are dealt around a card table. If each processor leap-frogs by L in the sequence of random numbers, {X_n}, then processor P_i will generate the random sequence X_i, X_{i+L}, X_{i+2L}, . . .

• Blocking [92]: The sequence is partitioned by splitting it into non-overlapping contiguous sections. If the length of each section is L in the random number sequence, {X_n}, then processor P_i will be assigned the random sequence X_{iL}, X_{iL+1}, X_{iL+2}, . . .

• Parameterization [85]: The initial seed or other parameters in a generator are carefully chosen in such a way as to produce long-period independent sequences for each processor. For example, an LCG can be expressed as

X_{n+1} = a X_n + b (mod p). (A.1)

There are two ways to parameterize the LCG in Eq. (A.1). One is the judicious choice of different additive constants, b, to form different sequences with power-of-two moduli, and the other is to choose different multipliers, a, with prime moduli [67].

However, both the leap-frog and blocking parallelization methods may suffer from long-range correlations [93]. One must be aware of these correlations before one chooses the size of the blocks or of the leap-frog jump. In addition, the period of the parent generator has to be long enough so that one has enough random numbers to assign to each processor.
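The parameterization scheme for Eq. (A.1) can be sketched as follows. The multiplier 1664525 is a well-known 32-bit LCG constant used here purely for illustration; giving each processor a distinct odd additive constant b follows the power-of-two-modulus approach described above:

```python
def lcg_stream(a, b, p, seed, n):
    """n successive values of X_{k+1} = a * X_k + b (mod p)  (Eq. A.1)."""
    xs, x = [], seed
    for _ in range(n):
        x = (a * x + b) % p
        xs.append(x)
    return xs

# one stream per processor, parameterized by a distinct odd constant b:
streams = {pid: lcg_stream(1664525, 2 * pid + 1, 2 ** 32, 12345, 5)
           for pid in range(4)}
```

All processors run the identical recurrence; only the additive constant differs, which is what makes the scheme attractive for SPMD-style parallel codes.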
In comparison to blocking and leap-frog, each parameterized sequence can be thought of as an independent sequence and assigned to a processor. Good random number generators are hard to find, and high-quality, efficient algorithms for pseudorandom number generation on parallel computers are even more difficult to find. There are a few available software packages for random number generation, such as the SPRNG library [85], the Mersenne twister [94] and the nonlinear inversive congruential generator from pLab [95]. By using such packages, we can choose widely tested and relatively reliable PRNGs as our scrambler.

APPENDIX B
LCGS WITH SOPHIE-GERMAIN MODULI

Linear congruential generators (LCGs) with both power-of-two and prime moduli have been used in implementations of scrambling the Soboĺ sequence. Here we give a brief introduction to LCGs with Sophie-Germain moduli. If the modulus of the LCG is a properly chosen Sophie-Germain prime, the LCG can achieve roughly the same generation speed as LCGs with Mersenne prime moduli. Modular multiplication within the generator PMLCG (Prime Modulus LCG) in the SPRNG (Scalable Parallel Random Number Generators) library [90] is replaced by integer addition; we use the same trick for LCGs with Sophie-Germain prime moduli. After a brief search, we have found suitable Sophie-Germain primes to use as the moduli for our implementations. There seem always to be many Sophie-Germain primes close to 2^q. In Table B.1, we list the Sophie-Germain primes that are just below the powers-of-two for exponents 15 through 64. There may be “closer” Sophie-Germain primes slightly larger than these powers-of-two, but they would require one more bit in their binary representation, and we do not think that using such primes would be beneficial in practice.
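A functional (though unoptimized) sketch of a multiplicative LCG with the 64-bit Sophie-Germain modulus quoted later in this appendix, using the stated primitive root 13 as a sample multiplier; a real implementation would replace the generic `%` with the fast reduction exploited for moduli of the form 2^q − k:

```python
M = 2 ** 64 - 21017   # 64-bit Sophie-Germain prime used in this appendix
A = 13                # a primitive root modulo M, per the text

def sgmlcg(seed, n, a=A, m=M):
    """n values of the multiplicative LCG x_k = a * x_{k-1} (mod m)."""
    xs, x = [], seed % m
    for _ in range(n):
        x = (a * x) % m
        xs.append(x)
    return xs
```

Because m is prime and a is a primitive root, a nonzero seed never maps to zero and the stream has period m − 1.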
B.1 Parallelization

Since we wish to utilize the same algorithm on every processor, and since we have worked hard to optimize modular reduction based on the modulus, we cannot choose to parameterize the modulus. In addition, if we considered modular parameterization, the periods of the sequences would differ, and the theoretical measure of interprocessor correlation via exponential sums would be analytically intractable. To understand the computational efficiency of the choice of Sophie-Germain primes in parameterized LCGs, we implemented an SGMLCG (Linear Congruential Generator with Sophie-Germain Modulus) using the same structure as the PMLCG library in SPRNG [85].

Table B.1. The Sophie-Germain (S-G) primes closest to but less than 2^q.

  q   S-G prime (2^q − k)   |   q   S-G prime (2^q − k)
  64  2^64 − 1469           |   63  2^63 − 4569
  62  2^62 − 10565          |   61  2^61 − 2373
  60  2^60 − 3677           |   59  2^59 − 18009
  58  2^58 − 137            |   57  2^57 − 3993
  56  2^56 − 2249           |   55  2^55 − 789
  54  2^54 − 4805           |   53  2^53 − 1269
  52  2^52 − 473            |   51  2^51 − 465
  50  2^50 − 161            |   49  2^49 − 2709
  48  2^48 − 5297           |   47  2^47 − 1485
  46  2^46 − 857            |   45  2^45 − 573
  44  2^44 − 1493           |   43  2^43 − 741
  42  2^42 − 2201           |   41  2^41 − 1965
  40  2^40 − 437            |   39  2^39 − 381
  38  2^38 − 401            |   37  2^37 − 45
  36  2^36 − 137            |   35  2^35 − 849
  34  2^34 − 641            |   33  2^33 − 9
  32  2^32 − 209            |   31  2^31 − 69
  30  2^30 − 1385           |   29  2^29 − 189
  28  2^28 − 437            |   27  2^27 − 405
  26  2^26 − 677            |   25  2^25 − 633
  24  2^24 − 317            |   23  2^23 − 321
  22  2^22 − 17             |   21  2^21 − 9
  20  2^20 − 233            |   19  2^19 − 45
  18  2^18 − 17             |   17  2^17 − 285
  16  2^16 − 269            |   15  2^15 − 165

The performance comparisons are presented in Table B.2, where we tabulate the initialization and generation times for the PMLCG and SGMLCG generators. The two generators we used were specifically the following:

• PMLCG: The generator is defined by the relation

x_n = a x_{n−1} (mod 2^61 − 1), (B.1)

where the multiplier, a, differs for each process. The multiplier is chosen to be certain powers of 37, a primitive root modulo 2^61 − 1, that give maximal period cycles of acceptable quality.
The period of this generator is 2^61 − 2, and the number of distinct streams available is roughly 2^58.

• SGMLCG: The generator is defined by the relation

x_n = a x_{n−1} (mod 2^64 − 21017), (B.2)

where the modulus, m = 2^64 − 21017, is a 64-bit Sophie-Germain prime, and the multiplier, a, differs for each stream. The multiplier is chosen to be certain powers of 13, a primitive root modulo 2^64 − 21017, that give maximal period cycles of acceptable quality. The period of this generator is (2^64 − 21017) − 1, and the number of distinct streams available is 2^63 − 10509. This generator passed both the DIEHARD [96] statistical tests of randomness and the extensive randomness test suite in SPRNG [85].

If we carefully choose a Sophie-Germain prime as the modulus, m, in the form 2^q − k, we can implement an LCG with both fast modular multiplication and fast calculation of the kth integer relatively prime to m − 1. Besides fast modular multiplication and fast calculation of the kth number relatively prime to m − 1, we have many more choices among Sophie-Germain primes (Table B.1). There are only four Mersenne primes in the interval (2^15, 2^64): 2^17 − 1, 2^19 − 1, 2^31 − 1, and 2^61 − 1, and the next Mersenne prime is 2^89 − 1. By contrast, we expect there to be approximately 1.23 × 10^16 Sophie-Germain primes in this same interval. Since the SGMLCG gives us both a smaller initialization time and a competitive generation speed, LCGs with Sophie-Germain prime moduli are the better choice. The SGMLCG is currently being incorporated into SPRNG.

APPENDIX C
LINEAR SCRAMBLING AND DERANDOMIZATION

Randomizing quasirandom sequences produces a stochastic family of quasirandom sequences, which can be used in a number of settings. It is natural to ask, therefore, how to choose an optimal quasirandom sequence from this family. Derandomization techniques provide us with a way to find an optimal sequence from such a family of quasirandom sequences.
The process of finding such optimal quasirandom sequences is called the derandomization of a randomized (scrambled) family. Before derandomization, one has to choose a scrambling space. In this appendix, I give an overview of scrambling methods and a motivation for the derandomization of the Halton, Faure and Soboĺ sequences in linear scrambling spaces. In addition, I describe the derandomization process in detail.

C.1 Scrambling Methods

The purpose of scrambling is two-fold. The original motivation for scrambling [1, 2, 46] was to obtain more uniformity for quasirandom sequences in high dimensions, which can be checked via two-dimensional projections. Secondly, Owen scrambling [11, 46], called nested scrambling, was developed to provide a practical error estimate for QMC. Below, I outline the background of various scrambling methods for (t, s)-sequences.

The first scrambling technique was proposed by Warnock [41] in order to calculate the L2 discrepancy of Halton sequences. In that paper, Warnock tried to minimize the discrepancy of the Halton sequence by considering scrambled versions. Later, Braaten and Weller [1] used permutations to further minimize the discrepancy and improve the behavior of any pair of coordinates of the Halton sequence. In 1988, Shaw [46] proposed a scrambling technique that combines shifting and permutations, and applied these scrambled quasirandom sequences to compute posterior distributions in Bayesian statistics. Shaw first applied his scrambling to Soboĺ sequences, and pointed out that scrambling can be used to assess the overall accuracy of integration. After Niederreiter sequences [16] were proposed, Owen [11] and Tezuka [7] independently developed two powerful scrambling methods for (t, s)-sequences in 1994. Owen also explicitly pointed out that scrambling can be used to provide error estimates for QMC.
Although many other methods for scrambling (t, s)-sequences have been proposed [21, 48, 97, 98], most of them are modified or simplified versions of the Owen and Tezuka schemes. Owen's scheme is theoretically powerful for (t, s)-sequences, while Tezuka's algorithm was proved to be efficient for (0, s)-sequences.

C.1.1 Owen Nested Scrambling

Let x_n = (x_n^(1), x_n^(2), ..., x_n^(s)) be a quasirandom point in [0, 1)^s, and let z_n = (z_n^(1), z_n^(2), ..., z_n^(s)) be the scrambled version of the point x_n. Suppose each x_n^(j) can be represented in base b as x_n^(j) = (0.x_{n1}^(j) x_{n2}^(j) ... x_{nK}^(j) ...)_b, with K being the number of digits to be scrambled. Then the nested scrambling proposed by Owen [11, 20] can be defined as follows:

    z_{n1}^(j) = π(x_{n1}^(j)),  and  z_{ni}^(j) = π_{x_{n1}^(j) x_{n2}^(j) ... x_{n,i−1}^(j)}(x_{ni}^(j))  for i ≥ 2,

where the permutation applied to the ith digit is chosen independently for each possible value of the preceding i − 1 digits. Of course, a (t, m, s)-net remains a (t, m, s)-net under nested scrambling. However, nested scrambling requires b^{i−1} permutations to scramble the ith digit. Owen scrambling (nested scrambling), which can be applied to all (t, s)-sequences, is powerful; however, from the implementation point of view, nested scrambling, with its so-called path-dependent permutations, requires a considerable amount of bookkeeping and leads to a more problematic implementation.

C.1.2 Tezuka Scrambling

Tezuka [17, 29] proposed two generator-matrix scrambling methods for (0, s)-sequences: GFaure and NFaure.

Generalized Faure Sequences: The idea behind GFaure is to use a random matrix to permute the generator matrix, thereby obtaining a stochastic family of Faure sequences. Source code for GFaure is available at [7]. GFaure's generator matrix for the ith coordinate is defined as C^(i) = A^(i) P^{i−1}, where P is the usual Pascal matrix and A^(i) is a random nonsingular lower triangular matrix. With a randomized generator matrix, I produce scrambled quasirandom numbers directly.
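To make the construction concrete, the following Python sketch builds a GFaure-style generator matrix C^(i) = A^(i) P^{i−1} over F_b and maps an index n to one scrambled coordinate. This is my own illustrative code, not the FINDER implementation; the truncation to K digits and all function names are assumptions.

```python
import random
from math import comb

def mat_mul(A, B, b):
    """Multiply two K x K matrices over F_b (b a prime)."""
    K = len(A)
    return [[sum(A[r][t] * B[t][c] for t in range(K)) % b for c in range(K)]
            for r in range(K)]

def pascal(K):
    """K x K upper triangular Pascal matrix: P[r][c] = C(c, r)."""
    return [[comb(c, r) for c in range(K)] for r in range(K)]

def identity(K):
    return [[1 if r == c else 0 for c in range(K)] for r in range(K)]

def random_nlt(K, b, rng):
    """Random nonsingular lower triangular matrix over F_b
    (nonzero diagonal, free entries below the diagonal)."""
    return [[rng.randrange(1, b) if r == c else rng.randrange(b) if r > c else 0
             for c in range(K)] for r in range(K)]

def gfaure_matrix(i, K, b, rng):
    """GFaure generator matrix C^(i) = A^(i) P^(i-1) mod b for coordinate i."""
    Pi = identity(K)
    for _ in range(i - 1):
        Pi = mat_mul(Pi, pascal(K), b)
    return mat_mul(random_nlt(K, b, rng), Pi, b)

def point_coord(n, C, b):
    """Coordinate of the nth point: C times the base-b digit vector of n,
    read back as the fraction 0.y_1 y_2 ... y_K in base b."""
    K = len(C)
    digits = []
    for _ in range(K):
        digits.append(n % b)
        n //= b
    y = [sum(C[r][c] * digits[c] for c in range(K)) % b for r in range(K)]
    return sum(d * b ** -(r + 1) for r, d in enumerate(y))
```

With A^(i) replaced by the identity matrix the sketch reduces to the unscrambled construction (for i = 1, the plain radical inverse), which is a convenient sanity check.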
If all the elements of the lower triangular matrix, A^(i), on and below the diagonal are non-zero, then GFaure sequences are a special case of Owen scrambling. This property comes from the fact that multiplication by a non-zero element of F_b is a particular permutation of the elements of F_b. If the matrices, A^(i), are randomly chosen, I obtain the approach to Tezuka and Owen scrambling suggested by Hong and Hickernell. GFaure, which is available in the FINDER [24, 25] software package, is based on carefully selected matrices, A^(i); FINDER fixes some well-chosen A^(i) in its implementation.

A New Generalization of the Faure Sequences (NFaure): Let b be a prime power with b ≥ s. For an arbitrary nonsingular upper triangular (NUT) infinite matrix, U, and arbitrary nonzero elements γ_i ∈ F_b (with 1 ≤ i ≤ s), define U^(i) by

    U^(i) = γ_i U.    (C.1)

Then the sequences produced by the generator matrices C^(i) = P^{i−1} U^(i) are (0, s)-sequences in the prime power base b ≥ s.

• These (0, s)-sequences are a new generalization of Faure sequences. Their generator matrices are NUT, while the generator matrices of GFaure are not triangular; only the A^(i)'s are.

• The matrix U and the numbers γ_i can be chosen randomly. However, for best results they should be chosen empirically in order to obtain a good quality generator.

C.1.3 Linear Scrambling

Since a straightforward implementation of nested scrambling is hard and inefficient in practice, Morohosi and Fushimi [30] gave a practical comparison of these scrambling methods with respect to implementation and error estimation. They pointed out that Owen scrambling is time-consuming in practice while offering the same performance in error estimation. Inspired by Tezuka's scrambling, several researchers [21, 48, 66] gave proofs and implementations in terms of linear scrambling for (t, s)-sequences. For the Halton sequence, the best way to break the correlations between dimensions is linear scrambling.
The most efficient way to scramble the Faure sequence is GFaure. Linear scrambling is therefore the simplest and most effective scrambling method for improving the quality of quasirandom sequences. This is why I focus on linear scrambling and search for the "best" scrambling in this linear space. In addition, linear permutations are easily implemented.

C.2 Derandomization

The main goal of derandomization is to reduce the size of the scrambling space, find a set of sequences of good quality, and then use them in QMC or in error estimation. Derandomization is not a new idea: a version of it was proposed when poor two-dimensional projections of the Halton sequence were first reported [1], although fixing those problems was not called derandomization at the time. More recently, FINDER [24, 25], a commercial software system that uses quasirandom sequences to solve problems in finance, is an example of the successful use of derandomization. Two types of quasirandom sequences are included in FINDER: GFaure and a modified Soboĺ sequence. Although the creators of FINDER pointed out that the major improvements in their modified Soboĺ sequence were achieved via optimized initial direction numbers for dimensions up to 360, the method used for this improvement was not revealed, and FINDER was patented. As for GFaure in FINDER, recall that GFaure has a generator matrix that can be expressed as C^(j) = A^(j) P^{j−1} (1 ≤ j ≤ s), where A^(j) is an arbitrary nonsingular lower triangular matrix over F_b and P is the Pascal matrix. FINDER empirically chooses the A^(j) to optimize simulation results. This is a typical example of applying derandomization. Thus one may think of derandomization as using some means, empirical or theoretical, to choose optimal parameters in a quasirandom number generator. Optimality is usually based on uniformity measures, but may also be related to performance in a particular application.
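As a concrete picture of the linear scrambling space that derandomization searches, here is a hypothetical Python sketch of a linearly scrambled Halton sequence, in which every base-p digit is permuted by d ↦ (w·d) mod p for some multiplier w in {1, ..., p − 1}; choosing good multipliers is precisely the kind of parameter optimization described above. The helper names are my own.

```python
def halton_digit_scrambled(n, p, w):
    """Radical inverse of n in base p, with the linear digit permutation
    d -> (w * d) mod p applied to every digit (w in 1..p-1)."""
    x, scale = 0.0, 1.0 / p
    while n > 0:
        x += ((w * (n % p)) % p) * scale
        n //= p
        scale /= p
    return x

PRIMES = [2, 3, 5, 7, 11, 13]  # bases for the first six dimensions

def scrambled_halton_point(n, multipliers):
    """One linearly scrambled Halton point; multipliers[j] scrambles dimension j."""
    return [halton_digit_scrambled(n + 1, p, w)  # skip n = 0 (the origin)
            for p, w in zip(PRIMES, multipliers)]
```

Setting every multiplier to 1 recovers the ordinary Halton sequence, since d ↦ d mod p is the identity permutation.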
C.2.1 Reasons to Derandomize

Quasirandom sequences are deterministically generated and constructed to be highly uniformly distributed. Although the use of quasirandom numbers in QMC leads to a faster convergence rate [9], it is by no means trivial to provide practical error estimates in QMC, because the only rigorous error bounds, provided via the Koksma-Hlawka inequality [8], are very hard to utilize in practice. In fact, the common MC procedure of using a predetermined error criterion as a deterministic termination condition is almost impossible to achieve in QMC without extra technology. The solution to this problem is to add randomness to quasirandom sequences by using various scrambling techniques.

Unlike pseudorandom numbers, there are only a few common choices for quasirandom number generators. Randomizing quasirandom sequences gives us a stochastic family of sequences. Finding an optimal quasirandom sequence within this family can be quite useful for enhancing the performance of ordinary QMC. In addition, derandomization seeks not only to find one optimal sequence within a scrambled family, but also to find a set of such optimal sequences. These optimal families are useful in automatic error estimation. Randomized QMC provides an elegant way to obtain error estimates for quasi-Monte Carlo, based on treating each scrambled sequence as a different and independent random sample. When error estimation is used in practice, one simply chooses several scrambled sequences at random from the whole scrambled family. The idea of derandomization is that one can instead find a set of optimal sequences within the scrambled family, and use sequences from this set for error estimation.
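A minimal sketch of this replicate-based error estimation follows, using Cranley-Patterson random shifts of a Halton point set as the simplest possible randomization in place of full scrambling; the function names and parameter defaults are illustrative assumptions, not the dissertation's implementation.

```python
import random
import statistics

def radical_inverse(n, p):
    """van der Corput radical inverse of n in base p."""
    x, scale = 0.0, 1.0 / p
    while n > 0:
        x += (n % p) * scale
        n //= p
        scale /= p
    return x

def shifted_halton(N, primes, shift):
    """N Halton points in [0,1)^s with a Cranley-Patterson shift (mod 1)."""
    return [[(radical_inverse(n + 1, p) + u) % 1.0
             for p, u in zip(primes, shift)]
            for n in range(N)]

def rqmc_estimate(f, s, N=1024, replicates=10, seed=0):
    """Mean and standard error over independently shifted QMC replicates."""
    rng = random.Random(seed)
    primes = [2, 3, 5, 7, 11, 13, 17, 19][:s]
    means = []
    for _ in range(replicates):
        shift = [rng.random() for _ in range(s)]
        pts = shifted_halton(N, primes, shift)
        means.append(sum(f(x) for x in pts) / N)
    m = statistics.fmean(means)
    se = statistics.stdev(means) / replicates ** 0.5
    return m, se
```

Each replicate is an unbiased estimate of the integral, so the spread across replicates yields a usable standard error, which is exactly the practical error estimate that a single deterministic quasirandom sequence cannot provide.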
While I have seen the utility of this for automatic error estimation in QMC, the process of searching for and specifying optimal quasirandom sequences that achieve theoretically and empirically optimal results is also an important problem in QMC.

C.2.2 Examples

The final goal of derandomization is to find optimal sequences within a family of scrambled sequences. Therefore, a criterion to measure this optimality has to be both computationally tractable and easy to implement. To better illustrate the ideas of derandomization, I use examples to demonstrate that the derandomization of Owen scrambling is impractical, while the derandomization of the GFaure and linear scrambling families is admissible.

Owen scrambling is theoretically powerful for any (t, s)-sequence. However, the derandomization of Owen scrambling is inadmissible for two reasons: (1) the huge nested space implicit in Owen scrambling, and (2) the previously mentioned lack of an effective measurement criterion. For each scrambled digit, there are p! possible permutations, where p is the base. If one wants to scramble K digits of each quasirandom point, on the order of p^K permutations (each drawn from the p! possibilities) have to be stored. Finally, Owen [20] pointed out that a direct implementation of his nested scrambling is not practical in high dimensions. Therefore, nested scramblings are only used for comparison in this dissertation.

The GFaure family provides a successful example of reducing the scrambling space and finding optimal sequences. Derandomization has been done successfully with GFaure, and now with Tezuka's i-binomial scrambling [26]. Thus, I present an example based on finding optimal Faure sequences from the GFaure family. As described in Chapter 4, i-binomial scrambling [58] is an algorithm that considers only a reduced number of sequences within the GFaure family, while maintaining the original overall quality of the Faure sequence.
Tezuka's i-binomial scrambling [26] is a special case of GFaure that reduces the scrambling space from roughly p^{K^2} to p^K choices, where K is the number of digits to be scrambled; it thus gives one a specific search criterion and a smaller space in which to find optimal GFaure sequences. Following this lead, I focus on finding optimal Faure sequences from among the i-binomial scramblings instead of the entire GFaure family. The GFaure family is a good example of finding optimal scramblings within the linear scrambling space, and linear scrambling is what is under consideration in this dissertation. We focus on finding optimal scramblings for the Halton, Faure and Soboĺ sequences in a linear scrambling space rather than in the whole scrambling space. Linear scrambling is easy to implement, and the size of this scrambling space is tractably small. The most important fact for my use of the linear scrambling space is that I have a theoretical criterion, the extreme discrepancy based on continued fraction expansions [56], to measure optimal scramblings in this context. Note that in many other cases, there are no criteria with which to measure optimal scramblings.

The advantage of linear scrambling is to reduce the permutations for each digit from a set of p! permutations to a set of only p. This reduction makes the linear scrambling space smaller and easier to search for optimal scramblings. In this dissertation, I present new algorithms for finding the optimal Halton, Faure and Soboĺ sequences within this linear scrambling space. Numerical results of the comparisons are presented in Figures D.1–D.12. From Table 3.2 and Figures D.1–D.12, it is easy to see that the linearly optimal sequences are stable with increasing dimension and number of samples, and have better performance than the original sequences. Also, the original Faure and Soboĺ sequences have their own favorable numbers of points [6], based on powers of their bases; in contrast, the optimal sequences do not have these limitations.
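One computationally tractable uniformity criterion of the kind discussed above is the L2 star discrepancy, which can be computed exactly in O(N²s) time via Warnock's formula [41]. The sketch below is my own illustrative code for evaluating it and ranking candidate scrambled point sets, not the dissertation's search procedure.

```python
def l2_star_discrepancy_sq(points):
    """Squared L2 star discrepancy via Warnock's formula:
    T^2 = 3^-s - (2/N) sum_i prod_k (1 - x_ik^2)/2
               + (1/N^2) sum_{i,j} prod_k (1 - max(x_ik, x_jk))."""
    N, s = len(points), len(points[0])
    term1 = 3.0 ** -s
    term2 = 0.0
    for x in points:
        prod = 1.0
        for xk in x:
            prod *= (1.0 - xk * xk) / 2.0
        term2 += prod
    term3 = 0.0
    for x in points:
        for y in points:
            prod = 1.0
            for xk, yk in zip(x, y):
                prod *= 1.0 - max(xk, yk)
            term3 += prod
    return term1 - (2.0 / N) * term2 + term3 / (N * N)

def best_candidate(candidate_point_sets):
    """Pick the candidate point set with the smallest L2 star discrepancy."""
    return min(candidate_point_sets, key=l2_star_discrepancy_sq)
```

As a sanity check, the single point {0.5} in one dimension has squared L2 star discrepancy exactly 1/12, which the formula reproduces.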
C.3 Conclusion

There are only a few types of commonly used quasirandom sequences. However, derandomization techniques allow us to find optimal quasirandom sequences with similar attributes. Scrambled quasirandom sequences form a large stochastic family of quasirandom sequences, and it is usually impossible to conduct an exhaustive search for an optimal quasirandom sequence within such a family. However, it is often useful to find a group of sequences having smaller L2-discrepancy or better uniformity, so in practice one tries to reduce the search space before undertaking such a computation. We found a criterion based on the extreme discrepancy with which to justify optimal scramblings from among the linear scrambling space. We use these optimal scramblings on a published set of test integrands described in the next appendix. These derandomized sequences are numerically tested and shown empirically to be far superior to the original sequences, as well as to randomly chosen scrambled sequences.

APPENDIX D

ADDITIONAL NUMERICAL EXPERIMENTS

The purpose of this appendix is twofold: to provide numerical experiments comparing different scrambling methods and to empirically verify optimality. Consider all scrambled Halton (Faure or Soboĺ) sequences as the comparison space of all possible sequences. Before one can randomly choose a scrambled Halton sequence, one has to choose a scrambling method. Linear scrambling is the main scrambling method used in this dissertation. In order to compare linear scrambling methods with the rest of the scrambling space, two other scrambling methods, randomized shifting and nested scrambling, are chosen for comparison; note that these two are not linear scrambling methods. For each type of sequence, I chose ten sequences, distributed over different dimensions and different integrand functions. Clearly, there are other parts of the scrambled Halton space that are not considered here.
However, without specific scrambling methods, I am not aware of how to "randomly" generate sequences in other parts of the scrambled Halton space; thus I restrict myself to comparisons with the scrambling methods above.

D.1 Test Functions

High-dimensional integration problems are always a good way to test the quality of quasirandom sequences, and a published set of test integrands [43, 99, 64, 100, 57] is a good way to test different scrambled sequences. Consider the following class of test functions:

    I1(f) = ∫_0^1 ... ∫_0^1 ∏_{i=1}^s (π/2) sin(π x_i) dx_1 ... dx_s = 1,    (D.1)

    I2(f) = ∫_0^1 ... ∫_0^1 ∏_{i=1}^s (|4x_i − 2| + a_i)/(1 + a_i) dx_1 ... dx_s = 1,    (D.2)

where the a_i are parameters. Such functions allow an automatic tuning of the relative importance of the variables, as well as of their interactions, by appropriate choices of the a_i. The effective dimension can be computed, and is tabulated in [57]; the effective dimension is closely related to sensitivity indices [101]. Three choices of the parameters will be considered:

1. a_i = 0, for 1 ≤ i ≤ s
2. a_i = i, for 1 ≤ i ≤ s
3. a_i = i^2, for 1 ≤ i ≤ s

For the first choice, a_i = 0, all variables are equally important, and the effective dimension is approximately the real dimension, s. For the last two choices, the importance of the successive variables decreases: the effective dimension is 10 for a_i = i and 5 for a_i = i^2. In general, as the a_i become bigger, the variables decrease in importance more quickly and the effective dimension becomes smaller.

D.2 Numerical Results

For comparison purposes, I present numerical results for the original, the linearly optimal, and two types of scrambled sequences, the latter randomly chosen from among the randomized shifting and nested scrambled families.

D.2.1 Halton Sequences

Besides the original Halton sequence and the optimal Halton sequence from the linear scrambling family, I choose randomized shifted Halton [4] and nested scrambled Halton sequences [3] for comparison.
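The test integrands I1 and I2 defined in (D.1) and (D.2) are straightforward to implement; the sketch below uses hypothetical helper names of my own choosing.

```python
import math

def f1(x):
    """Integrand of I1: prod over i of (pi/2) sin(pi x_i)."""
    prod = 1.0
    for xi in x:
        prod *= (math.pi / 2.0) * math.sin(math.pi * xi)
    return prod

def f2(x, a):
    """Integrand of I2: prod over i of (|4 x_i - 2| + a_i) / (1 + a_i)."""
    prod = 1.0
    for xi, ai in zip(x, a):
        prod *= (abs(4.0 * xi - 2.0) + ai) / (1.0 + ai)
    return prod
```

Each one-dimensional factor integrates to 1 over [0, 1], so the s-dimensional integrals equal 1 for any s and any parameters a_i, which is what makes the exact value a convenient reference for the comparisons that follow.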
The numerical results are listed in Figures D.1–D.4 and Tables D.1–D.2 for each integrand.

• Halton refers to the original Halton sequence provided by Fox [5].
• DHalton refers to the derandomized Halton sequence proposed in Section 3.7.
• S1Halton refers to a randomly chosen sequence from the random-start Halton sequences.
• S2Halton refers to a randomly chosen sequence from the permuted Halton sequences.

In Figures D.1 and D.2, estimated values obtained using the original Halton sequence in dimensions s = 25 and s = 40 are not included, since these values are too large compared to 1; all the estimated values are listed in Tables D.1 and D.2. Figure D.1 shows that the derandomized Halton sequence for s = 40 (DHalton) performs much better than the randomly chosen scrambled sequences, while in Figure D.2 DHalton performs the same or slightly better for s = 40. In Figures D.3 and D.4, I show only one plot, for s = 40, since the other cases perform similarly because the effective dimension of the integrand in (D.2) for a_i = i and a_i = i^2 is low. In general, the derandomized Halton sequence tends to perform better when the effective dimension is high.

D.2.2 Faure Sequences

Besides the original Faure sequence and the optimal Faure sequence from the linear scrambling family, I choose a scrambled sequence based on randomized shifting [66] and a nested scrambled Faure sequence [48]. The numerical results are listed in Figures D.5–D.8 for each integrand.

• Faure refers to the original Faure sequence provided by Fox [5].
• DFaure refers to the derandomized Faure sequence proposed in Section 4.2.
• S1Faure refers to a randomly chosen sequence from the randomized shifting family.
• S2Faure refers to a randomly chosen sequence from the nested scrambled Faure sequences.

Figure D.5 shows that the derandomized Faure sequence for s = 40 (DFaure) has a better convergence rate than the randomly chosen scrambled sequences.
DFaure is also stable in Figure D.6 for s = 40. In Figures D.7 and D.8, I show only one plot, for s = 40, since the other cases perform similarly because the effective dimension of the integrand in (D.2) for a_i = i and a_i = i^2 is low. In general, derandomized Faure sequences tend to have a faster convergence rate when the effective dimension is high.

D.2.3 Soboĺ Sequences

Besides the original Soboĺ sequence and the optimal Soboĺ sequence from the linear scrambling family, I choose one scrambled Soboĺ sequence from among the randomized shifts [66] and one from the nested scrambled Soboĺ sequences [102]. The numerical results are listed in Figures D.9–D.12 for each integrand.

• Sobol refers to the original Soboĺ sequence provided by Fox [6].
• DSobol refers to the derandomized Soboĺ sequence proposed in Section 5.3.
• S1Sobol refers to a randomly chosen sequence from the randomized shifted Soboĺ family.
• S2Sobol refers to a randomly chosen sequence from the nested scrambled Soboĺ sequences.

Figure D.9 shows that the derandomized Soboĺ sequence for s = 40 (DSobol) has the same convergence rate as the randomly chosen scrambled sequences, while in Figure D.10 the derandomized Soboĺ sequence appears to perform better than the other sequences. In Figures D.11 and D.12, I show only one plot, for s = 40, since the other cases perform similarly because the effective dimension of the integrand in (D.2) for a_i = i and a_i = i^2 is low. As a general rule, the derandomized Soboĺ sequence tends to have a faster convergence rate when the effective dimension is high.

D.3 Conclusion

Clearly, effective dimension plays an important role when analyzing the performance of various quasirandom sequences. When the effective dimension is low (below 10), I cannot see any advantage to the optimal sequences. However, whenever the effective dimension is high (greater than 20), the merit of the optimal sequences is relatively stable, and they provide faster convergence.

Table D.1.
Estimates of I1(f) in (D.1) by using Halton sequences.

  Generator   N        s = 13   s = 20   s = 25   s = 40
  Halton      500      0.785    0.548    0.346    0.239
  DHalton     500      0.873    0.656    0.489    0.481
  S1Halton    500      0.685    0.498    0.399    0.576
  S2Halton    500      0.890    0.687    0.434    0.493
  Halton      1000     0.893    0.634    0.273    0.149
  DHalton     1000     0.937    0.768    0.542    0.903
  S1Halton    1000     0.903    0.589    0.503    0.759
  S2Halton    1000     0.931    0.703    0.657    0.692
  Halton      7000     0.943    1.028    0.960    0.708
  DHalton     7000     0.977    1.058    1.095    0.815
  S1Halton    7000     0.935    0.993    0.967    0.678
  S2Halton    7000     0.967    0.965    1.210    0.756
  Halton      20,000   0.984    1.023    0.893    0.859
  DHalton     20,000   0.989    1.014    1.084    0.943
  S1Halton    20,000   0.967    0.967    1.201    0.830
  S2Halton    20,000   1.023    1.034    1.190    1.202
  Halton      40,000   0.988    0.971    0.955    3.5260
  DHalton     40,000   0.988    1.081    1.098    0.940
  S1Halton    40,000   0.976    0.992    1.094    0.760
  S2Halton    40,000   0.958    0.987    1.103    1.326
  Halton      100,000  0.996    0.982    0.984    2.03
  DHalton     100,000  0.995    1.013    1.055    0.978
  S1Halton    100,000  1.02     1.034    1.11     0.876
  S2Halton    100,000  1.09     1.092    1.003    1.230

Table D.2.
Estimates of I2(f) in (D.2) with parameters a_i = 0 by using Halton sequences.

  Generator   N        s = 13   s = 20   s = 25    s = 40
  Halton      500      1.303    3.129    68.549    1.363 × 10^6
  DHalton     500      0.799    0.786    0.498     0.432
  S1Halton    500      0.641    0.702    0.401     0.201
  S2Halton    500      0.578    0.698    0.456     0.445
  Halton      1000     1.171    2.324    34.513    6.814 × 10^5
  DHalton     1000     0.875    0.601    0.612     0.311
  S1Halton    1000     0.673    0.734    0.721     0.421
  S2Halton    1000     0.739    0.769    0.658     0.399
  Halton      7000     0.922    0.998    5.782     9.734 × 10^4
  DHalton     7000     0.942    1.216    1.742     0.489
  S1Halton    7000     0.953    0.934    0.711     0.789
  S2Halton    7000     0.902    0.899    0.659     0.403
  Halton      20,000   0.977    0.939    2.876     3.407 × 10^4
  DHalton     20,000   0.982    1.118    1.327     0.689
  S1Halton    20,000   0.956    0.923    0.765     0.578
  S2Halton    20,000   0.890    0.915    0.835     0.674
  Halton      40,000   0.974    0.889    1.796     1.704 × 10^4
  DHalton     40,000   1.014    1.183    1.380     0.732
  S1Halton    40,000   1.032    1.121    1.203     0.654
  S2Halton    40,000   0.941    0.984    1.120     1.309
  Halton      100,000  0.985    0.897    1.242     6.815 × 10^3
  DHalton     100,000  1.008    1.068    1.003     1.102
  S1Halton    100,000  0.953    0.972    0.923     0.674
  S2Halton    100,000  0.991    0.982    1.028     1.327

Figure D.1. Estimates of the integral I1(f) in (D.1) by using various Halton sequences. [Four panels, s = 13, 20, 25, 40: estimated value versus number of simulations for Halton, DHalton, S1Halton, S2Halton.]
Figure D.2. Estimates of the integral I2(f) in (D.2) with parameters a_i = 0 by using various Halton sequences. [Four panels, s = 13, 20, 25, 40: estimated value versus number of simulations for DHalton, S1Halton, S2Halton and the exact value; Halton is omitted for s = 25 and s = 40.]

Figure D.3. Estimates of the integral I2(f) in (D.2) with parameters a_i = i by using various Halton sequences. [One panel, s = 40.]

Figure D.4. Estimates of the integral I2(f) in (D.2) with parameters a_i = i^2 by using various Halton sequences. [One panel, s = 40.]
Figure D.5. Estimates of the integral I1(f) in (D.1) by using various Faure sequences. [Four panels, s = 13, 20, 25, 40: estimated value versus number of simulations for Faure, DFaure, S1Faure, S2Faure.]

Figure D.6. Estimates of the integral I2(f) in (D.2) with parameters a_i = 0 by using various Faure sequences. [Four panels, s = 13, 20, 25, 40.]

Figure D.7. Estimates of the integral I2(f) in (D.2) with parameters a_i = i by using various Faure sequences. [One panel, s = 40.]
Figure D.8. Estimates of the integral I2(f) in (D.2) with parameters a_i = i^2 by using various Faure sequences. [One panel, s = 40.]

Figure D.9. Estimates of the integral I1(f) in (D.1) by using various Soboĺ sequences. [Four panels, s = 13, 20, 25, 40: estimated value versus number of simulations for Sobol, DSobol, S1Sobol, S2Sobol.]

Figure D.10. Estimates of the integral I2(f) in (D.2) with parameters a_i = 0 by using various Soboĺ sequences. [Four panels, s = 13, 20, 25, 40.]
Figure D.11. Estimates of the integral I2(f) in (D.2) with parameters a_i = i by using various Soboĺ sequences. [One panel, s = 40.]

Figure D.12. Estimates of the integral I2(f) in (D.2) with parameters a_i = i^2 by using various Soboĺ sequences. [One panel, s = 40.]

REFERENCES

[1] E. Braaten and G. Weller. An improved low-discrepancy sequence for multidimensional quasi-Monte Carlo integration. Journal of Computational Physics, 33:249–258, 1979.

[2] W.J. Morokoff and R.E. Caflisch. Quasirandom sequences and their discrepancy. SIAM Journal on Scientific Computing, 15:1251–1279, 1994.

[3] L. Kocis and W. Whiten. Computational investigations of low discrepancy sequences. ACM Transactions on Mathematical Software, 23:266–294, 1997.

[4] X. Wang and F. Hickernell. Randomized Halton sequences. Mathematical and Computer Modelling, 32:887–899, 2000.

[5] B. Fox. Implementation and relative efficiency of quasirandom sequence generators. ACM Transactions on Mathematical Software, 12:362–376, 1986.

[6] P. Bratley and B. Fox. Algorithm 659: Implementing Soboĺ's quasirandom sequence generator. ACM Transactions on Mathematical Software, 14(1):88–100, 1988.

[7] S. Tezuka. Uniform Random Numbers: Theory and Practice. Kluwer Academic Publishers, 1995.

[8] H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia, 1992.

[9] J. Spanier and E. Maize. Quasirandom methods for estimating integrals using relatively small sampling. SIAM Review, 36:18–44, 1994.

[10] P. L'Ecuyer and C. Lemieux. Recent advances in randomized quasi-Monte Carlo methods. In Modelling Uncertainty: An Examination of Stochastic Theory, Methods, and Applications, 2002.

[11] A. Owen.
Randomly permuted (t, m, s)-nets and (t, s)-sequences. In Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, volume 106 of Lecture Notes in Statistics, pages 299–317, 1995.

[12] J. Halton. On the efficiency of certain quasirandom sequences of points in evaluating multidimensional integrals. Numerische Mathematik, 2:84–90, 1960.

[13] I. M. Soboĺ. Uniformly distributed sequences with additional uniformity properties. USSR Computational Mathematics and Mathematical Physics, 16:236–242, 1976.

[14] I. M. Soboĺ. Uniformly distributed sequences with additional uniformity properties. USSR Computational Mathematics and Mathematical Physics, 16:236–242, 1976.

[15] H. Faure. Discrepancy of sequences associated with a number system (in dimension s). Acta Arithmetica, 41(4):337–351, 1982. In French.

[16] H. Niederreiter. Low-discrepancy and low-dispersion sequences. Journal of Number Theory, 30:51–70, 1988.

[17] S. Tezuka. Polynomial arithmetic analogue of Halton sequences. ACM Transactions on Modeling and Computer Simulation, 3:99–107, 1993.

[18] R. Caflisch. Monte Carlo and quasi-Monte Carlo methods. Acta Numerica, 7:1–49, 1998.

[19] B. Tuffin. On the use of low-discrepancy sequences in Monte Carlo methods. Technical Report No. 1060, IRISA, Rennes, 1996.

[20] A. Owen. Variance and discrepancy with alternative scramblings. ACM Transactions on Computational Logic, V:1–16, 2002.

[21] J. Matousek. On the L2-discrepancy for anchored boxes. Journal of Complexity, 14:527–556, 1998.

[22] A.B. Owen. Scrambled net variance for integrals of smooth functions. Annals of Statistics, 25:1541–1562, 1997.

[23] N. Korobov. The approximate computation of multiple integrals. Dokl. Akad. Nauk SSSR, 124:1207–1210, 1959.

[24] S. H. Paskov and J. F. Traub. Faster valuation of financial derivatives. Journal of Portfolio Management, 22(1):113–120, Fall 1995.

[25] A. Papageorgiou and J. Traub. Beating Monte Carlo. RISK, 9:63–65, 1997.

[26] S. Tezuka. On randomization of generalized Faure sequences. Research Report RT0494, 15 pages, 2002.

[27] H. Faure.
Good permutations for extreme discrepancy. Journal of Number Theory, 41:47–56, 1992.

[28] E. Atanassov. On the discrepancy of the Halton sequences. Mathematica Balkanica, 2003.

[29] S. Tezuka. Quasi-Monte Carlo: discrepancy between theory and practice. In K.-T. Fang, F.J. Hickernell, and H. Niederreiter, editors, Monte Carlo and Quasi-Monte Carlo Methods 2000, pages 124–140.

[30] H. Morohosi and M. Fushimi. A practical approach to the error estimation of QMC integration. In Monte Carlo and Quasi-Monte Carlo Methods 1998, pages 156–189. Springer, 2000.

[31] A. Keller. A quasi-Monte Carlo algorithm for the global illumination problem in the radiosity setting. In H. Niederreiter and P. Shiue, editors, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, volume 106 of Lecture Notes in Statistics, pages 239–251. Springer-Verlag, New York, 1995.

[32] C. Lecot and A. Koudiraty. Quasi-random simulation of linear kinetic equations. Journal of Complexity, 17:795–814, 2001.

[33] M. Mascagni and A. Karaivanova. Matrix computations using quasirandom numbers. Lecture Notes in Computer Science, 1988:552–559, 2000.

[34] J. Cheng and M. Druzdzel. Computational investigation of low-discrepancy sequences in simulation algorithms for Bayesian networks. In Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000), pages 72–81. Morgan Kaufmann Publishers, San Francisco, CA, 2000.

[35] H. Chi, M. Mascagni, and T. Warnock. On the optimal Halton sequences. Submitted to Mathematics and Computers in Simulation, 2004.

[36] M. Mascagni and H. Chi. On the scrambled Halton sequence. Monte Carlo Methods and Applications, in press, 2004.

[37] M. Mascagni and H. Chi. Optimal quasi-Monte Carlo valuation of derivative securities. In M. Costantino and C. Brebbia, editors, Computational Finance and Its Applications, pages 177–185. WIT Press, Southampton, Boston, 2004.

[38] H. Chi and M. Mascagni.
Applications of optimal Faure sequences to the valuation of Asian options. In J. Guo and I. Duncan, editors, Applied Actuarial Research Conference, Orlando, March 8–9, 2004.

[39] M. Mascagni and H. Chi. A new algorithm for scrambling the Soboĺ sequence. In H. Niederreiter and D. Talay, editors, Monte Carlo and Quasi-Monte Carlo Methods, Juan-les-Pins, France, June 7–10, 2004.

[40] M. Mascagni and H. Chi. Parallel linear congruential generators with Sophie Germain moduli. Parallel Computing, to appear, 14 pages, 2004.

[41] T. Warnock. Computational investigations of low discrepancy point sets. In S.K. Zaremba, editor, Applications of Number Theory to Numerical Analysis, pages 319–343. Academic Press, New York, 1972.

[42] S. Heinrich. Efficient algorithms for computing the L2 discrepancy. Mathematics of Computation, 65:1621–1633, 1996.

[43] P. Davis and P. Rabinowitz. Methods of Numerical Integration. Academic Press, New York, 1984.

[44] C. Joy, P. Boyle, and K.S. Tan. Quasi-Monte Carlo methods in numerical finance. Management Science, 42(6):926–938, 1996.

[45] R. Bouckaert. A stratified simulation scheme for inference in Bayesian belief networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 110–117. Morgan Kaufmann Publishers, San Francisco, CA, 1994.

[46] J. Shaw. A quasirandom approach to integration in Bayesian statistics. Annals of Statistics, 16:895–914, 1988.

[47] J. Liu. Monte Carlo Strategies in Scientific Computing. Springer, New York, 2001.

[48] K. Tan and P. Boyle. Application of randomized low discrepancy sequences to the valuation of complex securities. Journal of Economic Dynamics and Control, 24:1747–1782, 2000.

[49] P. Dagum and M. Luby. Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60(1):141–153, 1993.

[50] M. Henrion. Propagating uncertainty by logic sampling in Bayesian networks. Uncertainty in Artificial Intelligence, 2:317–324, 1988.

[51] E. Castillo, J.
Gutierrez, and A. Hadi. Expert Systems and Probabilistic Network Models. Springer-Verlag, New York, 1996.

[52] G. Casella. Statistical Inference. Brooks/Cole Publishing Co., Pacific Grove, CA, 1990.

[53] J. Halton and G. Smith. Algorithm 247: Radical-inverse quasi-random point sequence. Communications of the ACM, 7:701–702, 1964.

[54] R. Lidl and H. Niederreiter. Introduction to Finite Fields and Their Applications. Cambridge University Press, Cambridge, 1994.

[55] D.E. Knuth. The Art of Computer Programming, vol. 2: Seminumerical Algorithms. Addison-Wesley, Reading, Massachusetts, 1997.

[56] H. Niederreiter. Quasi-Monte Carlo methods and pseudo-random numbers. Bulletin of the American Mathematical Society, 84:957–1041, 1978.

[57] X. Wang and K. Fang. The effective dimension and quasi-Monte Carlo integration. Journal of Complexity, 19(2):101–124, 2003.

[58] S. Tezuka and H. Faure. I-binomial scrambling of digital nets and sequences. Journal of Complexity, accepted, 14 pages, 2003.

[59] P. Boyle. New life forms on the option landscape. Journal of Financial Engineering, 2(3):217–252, 1992.

[60] F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81:637–659, 1973.

[61] J. Hull. Options, Futures and Other Derivative Securities. Prentice-Hall, New York, 2000.

[62] B. Moro. The full Monte. Risk, 8(2):57–58, February 1995.

[63] I. Antonov and V. Saleev. An economic method of computing LPτ-sequences. USSR Computational Mathematics and Mathematical Physics, 19:252–256, 1979.

[64] S. Joe and F. Kuo. Remark on Algorithm 659: Implementing Soboĺ's quasirandom sequence generator. ACM Transactions on Mathematical Software, 29(1):49–57, March 2003.

[65] P. Jäckel. Monte Carlo Methods in Finance. John Wiley and Sons, New York, 2002.

[66] H. Hong and F. Hickernell. Algorithm 823: Implementing scrambled digital sequences. ACM Transactions on Mathematical Software, 29(2):95–109, June 2003.

[67] M. Mascagni. Parallel linear congruential generators with prime moduli.
Parallel Computing, 24:923–936, 1998.

[68] G. Fishman and L. Moore. An exhaustive analysis of multiplicative congruential random number generators with modulus 2^31 - 1. SIAM Journal on Scientific and Statistical Computing, 7:24–45, 1986.

[69] F. Hickernell. Extensible lattice sequences for quasi-Monte Carlo quadrature. SIAM Journal on Scientific Computing, 22(3):1117–1138, 2000.

[70] L.K. Hua and Y. Wang. Applications of Number Theory to Numerical Analysis. Springer, Berlin, 1981.

[71] I. Sloan and S. Joe. Lattice Methods for Multiple Integration. Clarendon Press, Oxford, 1994.

[72] I. Sloan and H. Wozniakowski. Tractability of multivariate integration for weighted Korobov classes. Journal of Complexity, to appear, 2002.

[73] R. Cranley and T. Patterson. Randomization of number theoretic methods for multiple integration. SIAM Journal on Numerical Analysis, 13(6):904–914, 1976.

[74] S. Joe. An average L2 discrepancy for number-theoretic rules. SIAM Journal on Numerical Analysis, to appear, 2002.

[75] F. James, J. Hoogland, and R. Kleiss. Multidimensional sampling for simulation and integration: measures, discrepancies, and quasi-random numbers. Computer Physics Communications, 99:180–220, 1997.

[76] A.B. Owen. Monte Carlo variance of scrambled equidistribution quadrature. SIAM Journal on Numerical Analysis, 34(5):1884–1910, 1997.

[77] W. Snyder. Accuracy estimation for quasi-Monte Carlo simulations. Mathematics and Computers in Simulation, 54:131–143, 2000.

[78] A. Owen. Latin supercube sampling for very high dimensional simulations. ACM Transactions on Modeling and Computer Simulation, 8:71–102, 1998.

[79] B.C. Bromley. Quasirandom number generators for parallel Monte Carlo algorithms. Journal of Parallel and Distributed Computing, 38(1):101–104, 1996.

[80] W. Schmid and A. Uhl. Techniques for parallel quasi-Monte Carlo integration with digital sequences and associated problems. Mathematics and Computers in Simulation, 55(1-3):249–257, 2001.

[81] W. Schmid and A. Uhl. Parallel quasi-Monte Carlo integration using (t,s)-sequences. Lecture Notes in Computer Science, 1557:96–106, 1999.

[82] J. Li and G. Mullen. Parallel computing of a quasi-Monte Carlo algorithm for valuing derivatives. Parallel Computing, 26:641–653, 2000.

[83] G. Okten and A. Srinivasan. Parallel quasi-Monte Carlo methods on a heterogeneous cluster. In Monte Carlo and Quasi-Monte Carlo Methods 2000, volume 139 of Lecture Notes in Statistics, pages 407–421, 2002.

[84] Y. Li and M. Mascagni. Analysis of large-scale grid-based Monte Carlo applications. International Journal of High Performance Computing Applications, accepted for a special issue, 2003.

[85] M. Mascagni and A. Srinivasan. Algorithm 806: SPRNG: A scalable library for pseudorandom number generation. ACM Transactions on Mathematical Software, 26:436–461, 2000.

[86] F. Hickernell. Goodness-of-fit statistics, discrepancies and robust designs. Statistics and Probability Letters, 44:73–78, 1999.

[87] R. D'Agostino and M. Stephens. Goodness-of-Fit Techniques. Marcel Dekker, Inc., New York and Basel, 1986.

[88] J. Liang, K. Fang, F. Hickernell, and R. Li. Testing multivariate uniformity and its applications. Mathematics of Computation, 70:337–355, 2001.

[89] F. Hickernell. A generalized discrepancy and quadrature error bound. Mathematics of Computation, 67:299–322, 1998.

[90] SPRNG: Scalable Parallel Random Number Generators. http://sprng.cs.fsu.edu.

[91] S. Tezuka. A note on polynomial arithmetic analogue of Halton sequences. ACM Transactions on Modeling and Computer Simulation, 4:279–284, 1994.

[92] P.D. Coddington. Random number generators for parallel computers. The NHSE Review, 2, 1996. http://nhse.cs.rice.edu/nhsereview.

[93] A. DeMatteis and S. Pagnutti. Parallelization of random number generators and long-range correlations. Parallel Computing, 15:155–164, 1990.

[94] M. Matsumoto and T. Nishimura. Dynamic creation of pseudorandom number generators. In Monte Carlo and Quasi-Monte Carlo Methods 1998, pages 56–69. Springer, 2000.

[95] H. Leeb. pLab – a system for generating and testing random numbers. pLab Report No. 3,
http://random.mat.sbg.ac.at/team/, University of Salzburg, 1997.

[96] G. Marsaglia. The Diehard battery of tests of randomness. http://stat.fsu.edu/pub/diehard, 1987.

[97] B. Fox. Strategies for Quasi-Monte Carlo. Kluwer Academic, Boston, MA, 1999.

[98] I. Friedel and A. Keller. Fast generation of randomized low discrepancy point sets. In K. Fang, F. Hickernell, and H. Niederreiter, editors, Monte Carlo and Quasi-Monte Carlo Methods 2000, pages 257–273. Springer, 2002.

[99] A. Genz. The numerical evaluation of multiple integrals on parallel computers. In P. Keast and G. Fairweather, editors, Numerical Integration, pages 219–230. Dordrecht, Holland, 1987.

[100] I. Radovic, I. M. Soboĺ, and R. Tichy. Quasi-Monte Carlo methods for numerical integration: comparison of different low discrepancy sequences. Monte Carlo Methods and Applications, 2:1–14, 1996.

[101] I. M. Soboĺ. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55(1-3):271–280, 2001.

[102] A. Owen. Scrambling Soboĺ and Niederreiter-Xing points. Journal of Complexity, 14(4):466–489, 1998.

BIOGRAPHICAL SKETCH

Hongmei Chi

Hongmei Chi was born in Dalian, China, where she received her high school diploma in 1985. She earned a Bachelor of Science in Statistics from Nankai University in 1989 and a Master of Science in Computer Science from Dalian University of Technology in 1992. She received a Ph.D. in Computer Science from Florida State University in 2004. Her research interests span many areas related to scientific computing; her current research focuses on quasi-Monte Carlo methods, computational finance, and distributed and grid computing.