RANDOM NUMBERS, MONTE CARLO METHODS
MA/CSC 428, Spring 2017, Numerical Math II. Sauer: Chapter 9.

TOO MANY APPLICATIONS TO LIST
1. Simulation of natural, physical, and other complicated phenomena (particles, people, traffic, ...).
2. Sampling from an impractically large number of possible cases/scenarios to approximate "typical" or "average" behavior.
3. Numerical integration can be seen as a special case of this.
4. Computer science. For many (deterministic) algorithmic problems, randomized algorithms provide the fastest (and sometimes the only known efficient) solution. (Ex: testing primes.)
5. Secure communication, cryptography, privacy.
6. Computer-generated imagery (CGI).
7. Gambling and other recreation (duh).
Different applications utilize different properties of "random" numbers.

WHAT IS A RANDOM NUMBER?
Is 2.71947 a random number?
Not at all, but it was (seriously!) randomly generated (by me).
It's easier to meaningfully talk about a random set or sequence of numbers following a particular probability distribution, and about random generation of numbers.
• The precise math details are hairy (421); we won't need them.
Some intuitive interpretations of "random":
• Uniformity: each number or item (from some domain or set) is equally likely. More generally: the likelihood of different outcomes is fixed, but not the outcomes (bell curve, etc.).
• Unpredictability: even looking at the initial numbers in a sequence, we don't know what comes next.
• Probability theory/statistics terminology: INDEPENDENT, IDENTICALLY DISTRIBUTED (IID) random variables.

(PSEUDO-)RANDOM NUMBER SEQUENCES
What we usually need in numerical methods is a sequence of independent, identically distributed numbers from some domain, usually {0,1} or [0,1] or ℝ.
Computers rarely do truly random things, but for numerical math applications we are usually happy with "random looking", deterministic, pseudo-random sequences.
• Uniformity (or following a prescribed distribution) is key.
• True unpredictability is not so important, but at least apparent statistical independence is. (This is very different from some other applications, like cryptography, where strong unpredictability is essential!)
1. Uniformity is easy, independence is hard to achieve.
• (A quick puzzle: you have a biased coin. How can you generate unbiased heads/tails with it?)
2. True uniform random is often not "as uniform as it could be".
[Figure: scatter plot of uniform random points in the unit square.]

TWO SIMPLE RNGS
(Note: nobody uses these.)
The "middle-square" method (von Neumann)
• Let's say we'd like "random" 4-digit numbers.
1. Start with any 4-digit number (seed),
2. Square it, take the middle 4 digits as the next number,
3. Repeat.
• Example: seed = 2017. 2017^2 = 04068289 → 0682; 0682^2 = 00465124 → 4651; 4651^2 = 21631801 → 6318; etc.
• 682, 4651, 6318, 9171, 1072, 1491, 2230, 9729, 6534, 6931, 387, 1497, 2410, 8081, 3025, 1506, 2680, 1824, 3269, 6863, ...
• Looks random enough... (?)
• Their magnitudes are all over the place.
• 9 even and 11 odd numbers.

The linear congruential generator
• Let's say we'd like "random" integers from 0 to m−1.
• Uses two parameters, integers a and b.
• Start with any integer x_0 from 0 to m−1 (seed).
• x_i = a*x_{i-1} + b (mod m)
• Example: seed = 1, a = 201, b = 7, m = 2017 (a prime).
• 208, 1475, 2000, 624, 377, 1155, 207, 1274, 1939, 465, 690, 1541, 1147, 616, 786, 667, 952, 1761, 993, 1934, ...

Some common features/bugs they share with more advanced (pseudo-)random number generators:
• They are, of course, not random but completely deterministic,
• Require a seed (initialization),
• Are periodic, at least eventually.
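Both toy generators above fit in a few lines of code. A sketch in Python for illustration (the course itself uses Matlab); it reproduces the example sequences on these slides:

```python
def middle_square(seed, count):
    """Von Neumann's middle-square method for 4-digit numbers."""
    x, out = seed, []
    for _ in range(count):
        sq = str(x * x).zfill(8)  # square, zero-padded to 8 digits
        x = int(sq[2:6])          # the middle 4 digits become the next number
        out.append(x)
    return out

def lcg(seed, a, b, m, count):
    """Linear congruential generator: x_i = a*x_{i-1} + b (mod m)."""
    x, out = seed, []
    for _ in range(count):
        x = (a * x + b) % m
        out.append(x)
    return out

print(middle_square(2017, 5))   # [682, 4651, 6318, 9171, 1072]
print(lcg(1, 201, 7, 2017, 5))  # [208, 1475, 2000, 624, 377]
```

Middle-square sequences typically degenerate into short cycles (or get stuck at zero) fairly quickly, which is one reason nobody uses the method.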
• Can be easily modified to generate random bits or uniform random ("real") numbers from [0,1].
• These are the two most common applications; the next two, the normal and exponential distributions, are a bit more complicated to do.
• Q: Will they be identically distributed? Will they be independent?

MATLAB
• rng(seed) seeds the random number generator using the nonnegative integer seed so that rand, randi, and randn produce a predictable sequence of numbers.
• rand: uniform real in [0,1].
• randi(imax): uniform integer between 1 and imax.
• randn: standard normal (mean 0, variance 1).
• Alternative syntax is available to generate arrays of random numbers, including sparse arrays, random complex numbers, random permutations, etc. See the Matlab help (e.g., type "doc rand") for more details.
• Reading and exercises: Sec. 9.1, including Exercises and Computer Problems.

MONTE CARLO SIMULATION AND INTEGRATION
Simulation:
• Generate "representative" samples from a set of possible scenarios.
• Simplest case: every scenario is equally likely.
• Simplest definition of representative: choose them with any (pseudo-)RNG that covers the scenario space uniformly.
Application in integration:
• Remember this?
∫_X f(x) dx ≈ Σ_{i=1}^n w_i f(x_i)
• Let the x_i be uniform random, and w_i = (1/n) ∫_X 1 dx.

MONTE CARLO INTEGRATION
Our previous example: compute ∫_{-1}^{1} sin^2(10πx + 1) dx.
• Make sure everything in the code below makes sense:
f   = @(x)(sin(10*pi*x+1).^2);
n   = 100;             % number of samples
xs  = 2*rand(n,1)-1;   % uniform in [-1,1]
int = sum(f(xs))*(2/n) % each weight is 2/n
• Using n = 10, 100, 1000, ..., 10^8 (after rng(2017)), I got int = 0.9151, 1.1658, 1.0258, 0.9972, 1.0007, 1.0004, 0.9999, 0.9998.
Questions:
• The error is, obviously, random. Can we quantify it?
• Why is this better than the quadrature formulas from 427?

Why is this better than the quadrature formulas from 427?
1.
Generalizes effortlessly to multivariate integration and (with some work) to integration over complicated domains.
∫_X f(x) dx ≈ Σ_{i=1}^n w_i f(x_i),  w_i = (1/n) ∫_X 1 dx
There's nothing here about ℝ, except that we need to somehow know the area (measure) of the domain X.
• E.g., what is the area of the shaded region, defined by 1/3 ≤ x^2 + y^2 ≤ 1 and 1/3 ≤ x^2 + y^2 − xy ≤ 1?
[Figure: the shaded region inside the square [−1,1] × [−1,1].]

MONTE CARLO INTEGRATION
• What is the area of the shaded region?
1/3 ≤ x^2 + y^2 ≤ 1 and 1/3 ≤ x^2 + y^2 − xy ≤ 1
• Monte Carlo answer: generate a number of random samples from the enclosing square.
• shaded area ≈ (# pts in shaded) / (# of total points) × (area of the square)
• Note that this is also an RNG for the uniform distribution over the shaded area.
• This is called the acceptance-rejection method.

MONTE CARLO INTEGRATION
• Why random? Why not just use a grid?
1. Random samples are easier to scale and make adaptive (e.g., compare a 10x10 grid to an 11x11 grid).
2. With a full grid in higher dimensions, we quickly run out of points (we need ≥ 2^d of them)!
3. Repeated experiments provide statistical bounds on the error.

QUICK RECAP
I. Monte Carlo integration
• The world's simplest integration algorithm:
∫_X f(x) dx ≈ Σ_{i=1}^n w_i f(x_i),  where w_i = (1/n) ∫_X 1 dx
and the x_i are uniform random points from X.
• If X is complicated, we can generate these random points by generating uniform random points from a simpler set containing X (e.g., a box), and just drop the ones outside of X.
• With this method, as we are generating the x_i, we also get the estimate
∫_X 1 dx / ∫_box 1 dx ≈ (# pts in X) / (# points)
II. Good, common (pseudo-)random number generators
• Are not random, but deterministic.
• The sequence is a function of the "seed" number.
[rng()]
• Produce a periodic sequence of numbers (which is OK as long as the period is really high).
• Are "apparently" random (pass simple statistical tests).
"Good" random number generation is quite difficult. (We won't be able to discuss the details.)
• For simple experiments, a Linear Congruential Generator with a very long period is OK.
• But just use rand().

ERROR ESTIMATION IN MONTE CARLO
Monte Carlo is an application of the law of large numbers.
• Informally, "the sample average converges to the mean."
Monte Carlo is also an application of the central limit theorem.
• The average of iid samples approaches a normal distribution, regardless of the distribution of the samples. (Under mild assumptions.)
The same is applicable to the errors:
• The sample average of the errors converges to zero in a "predictable" way:
n^{1/2} · ((E_1 + ⋯ + E_n)/n − 0) → N(0, σ^2)
[(E_1 + ⋯ + E_n)/n is the average error, 0 is the mean error, and σ^2 is the variance of the E_i (problem dependent!): the "error of errors".]

The order of magnitude of the error
• Rearranging n^{1/2} · ((E_1 + ⋯ + E_n)/n − 0) ≈ const:
• The error of an n-sample estimate of the integral, (E_1 + ⋯ + E_n)/n, is proportional to n^{−1/2}.
• Rearranging further: achieving an error < ε requires a number of samples proportional to 1/ε^2.
• Q1: translate this to normal words?
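The n^{−1/2} rate is easy to observe numerically. A minimal Python sketch (an assumption for illustration: we integrate f(x) = x^2 over [0,1], whose exact value 1/3 lets us measure the error directly; the course examples use Matlab):

```python
import random

def mc_integral(f, n, rng):
    """Monte Carlo estimate of int_0^1 f(x) dx: the average of f at n uniform
    samples (each weight is 1/n, since [0,1] has measure 1)."""
    return sum(f(rng.random()) for _ in range(n)) / n

exact = 1.0 / 3.0               # int_0^1 x^2 dx
rng = random.Random(2017)       # fixed seed for reproducibility
for n in (100, 10_000, 1_000_000):
    est = mc_integral(lambda x: x * x, n, rng)
    print(n, abs(est - exact))  # each 100x increase in n buys roughly one more digit
```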
• Q2: is this a positive or negative result?
• Bad: won't yield tiny errors with few samples. Needs 100x more samples for a single extra correct digit!
• Good: the error is dimension-independent!

ERROR ESTIMATION IN MONTE CARLO
The sample average only gives a single number ("point estimate") for the approximate integral.
• This means we don't know how wrong or right the answer is.
A further application of the CLT and some statistics also yields confidence intervals on the integral. (Try to do that with grids!)
• Main idea: group the samples into batches, calculate a set of estimates, and based on their variability get a grip on the (unknown) variance σ^2.
• Take a deep breath...

Theorem. If X_1, ..., X_m ~ N(μ, σ^2) are iid normal, then the sample mean X̄ = (X_1 + ⋯ + X_m)/m ~ N(μ, σ^2/m) is also normal. Moreover, if the sample standard deviation is S = sqrt( (1/(m−1)) Σ_{i=1}^m (X_i − X̄)^2 ), then
T = (X̄ − μ) / (S/√m) ~ t_{m−1},
Student's t-distribution with m − 1 degrees of freedom.
[The point is, this is a concrete, exactly known distribution, unlike N(0, σ^2).]
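To make the theorem concrete, here is a Python sketch of a batch-means confidence interval for π, roughly what a script like MonteCarlo_pi.m presumably computes in Matlab. Two assumptions to note: the constant 2.262 is the tabulated t_{0.025,9} value for m = 10 batches, hard-coded to avoid a statistics-library dependency, and the batch means are Bernoulli averages, so they are only approximately normal:

```python
import math
import random

def pi_batch_ci(m=10, n=10_000, seed=2017, t_crit=2.262):
    """95% batch-means confidence interval for pi. Each batch estimates pi as
    4 * (fraction of n uniform points landing in the quarter disk x^2+y^2 <= 1).
    t_crit = t_{0.025, m-1} for m = 10 batches (standard t-table value)."""
    rng = random.Random(seed)
    batches = []
    for _ in range(m):
        hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))
        batches.append(4.0 * hits / n)
    xbar = sum(batches) / m                                   # point estimate
    s = math.sqrt(sum((b - xbar) ** 2 for b in batches) / (m - 1))  # sample std dev
    half = t_crit * s / math.sqrt(m)                          # interval half-width
    return xbar, half

xbar, half = pi_batch_ci()
print(f"pi is in {xbar:.4f} +/- {half:.4f} (95% confidence)")
```

With m·n = 100,000 total samples, the half-width comes out on the order of 0.01, consistent with the n^{−1/2} rate.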
Rearranging the final formula:
Corollary. A (1 − α)-confidence interval for μ is X̄ ± t_{α/2, m−1} · (S/√m), where t_{α/2, m−1} denotes the inverse CDF of the t-distribution with m − 1 degrees of freedom.
There is no nice formula for the t values, but code and tables of important values are available. Matlab: tinv(a/2, m-1).
[Figure: t-distribution quantiles. Source: https://onlinecourses.science.psu.edu/stat414/node/194]

Example. (Phew...!)
• A confidence interval for π.
• MonteCarlo_pi.m
[Figure: random points in the square [−1,1] × [−1,1].]

QUASI-MONTE CARLO (QMC)
Recall the pros and cons of (regular) grids:
• Cover a rectangular domain nicely. (Really "uniform".)
• Can't do adaptive/nested formulas easily.
• Scale horribly with the dimension.
Recall the pros and cons of Monte Carlo:
• Not really uniform, even if "uniform random".
• Can easily add any number of points.
• Error is independent of the dimension.
Quasi-Monte Carlo: a regular(ish) deterministic infinite sequence whose initial finite subsequences cover the domain evenly.
• Example in 1D (source: http://planning.cs.uiuc.edu/node196.html):
[Figure: the first points of the sequence in [0,1].]
• Quite regular and uniform.
• Can you see the pattern?
• Called the van der Corput sequence.

QMC: VAN DER CORPUT
[Figure: construction of the van der Corput sequence.]

QMC: HALTON SEQUENCE
• The van der Corput sequence only works in 1D.
• If we use a (perhaps randomly shifted) vdC sequence in each dimension, the coordinates are too obviously correlated. (Not uniform.)
• Halton: use a different base for vdC along each coordinate.
[Figure: 2D Halton points built from vdC in base 2 (x) and base 3 (y). Source: wiki]

MC VS QMC
[Figure: two point sets in the unit square, one pseudo-random and one quasi-random.]
Which one is which?
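The pattern behind the vdC sequence is digit reversal: write n in base b and mirror its digits across the radix point (the "radical inverse"). A Python sketch for illustration; Halton then just pairs radical inverses in coprime bases such as 2 and 3:

```python
def radical_inverse(n, base):
    """n-th van der Corput number in the given base: reflect the base-b digits
    of n about the radix point (e.g., 6 = 110 in base 2 -> 0.011 = 3/8)."""
    x, f = 0.0, 1.0 / base
    while n > 0:
        x += (n % base) * f  # least significant digit -> most significant fraction digit
        n //= base
        f /= base
    return x

def halton_2d(count):
    """First `count` points of the 2D Halton sequence (vdC in bases 2 and 3)."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, count + 1)]

print([radical_inverse(i, 2) for i in range(1, 8)])
# [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875], i.e. 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8
```

Note how the base-2 sequence fills [0,1] at ever finer dyadic levels: each block of 2^k new points lands exactly in the largest remaining gaps.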
QMC: THE THEORY
The vdC and Halton sequences are examples of what is called a low-discrepancy sequence.
• Informally, the discrepancy of a point set measures how uniformly the set covers the domain. The discrepancy of an infinite sequence x_1, x_2, ... measures the asymptotic discrepancy of the finite subsequences x_1, ..., x_n as n grows.
• Once again, the details are somewhat hairy. (See in a minute.)
• Short ref for the theory-inclined: https://www.uibk.ac.at/mathematik/personal/einkemmer/qmm.pdf

Discrepancy of a point set x_1, ..., x_n in [0,1]^d:
D(x_1, ..., x_n) = sup_{J ∈ E} | (# points x_i in J)/n − vol(J) |,
where E is the set of rectangular boxes in [0,1]^d.
• An intuitive measure of non-uniformity, but ultimately not so useful. Instead, we have this:
Star-discrepancy (D*): the same as above, but E is the set of rectangular boxes in [0,1]^d with 0 as a vertex.
• Example: try and convince yourself that D* of the vdC sequence is Θ(log(n)/n).
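In 1D the star-discrepancy of a finite set can be computed exactly from the sorted points, via the classical formula D* = max_i max(i/n − x_(i), x_(i) − (i−1)/n), so the Θ(log(n)/n) claim can be probed numerically. A Python sketch; note that at the special sizes n = 2^k − 1 the first n base-2 vdC points form an exact grid, so D* = 1/(n+1) there, and the log factor only shows up at generic n:

```python
def vdc(n, base=2):
    """n-th van der Corput number (base-b digit reversal)."""
    x, f = 0.0, 1.0 / base
    while n > 0:
        x, n, f = x + (n % base) * f, n // base, f / base
    return x

def star_discrepancy_1d(points):
    """Exact 1D star-discrepancy of a finite point set in [0,1], using the
    classical sorted-points formula D* = max_i max(i/n - x_(i), x_(i) - (i-1)/n)."""
    xs = sorted(points)
    n = len(xs)
    return max(max(i / n - x, x - (i - 1) / n)
               for i, x in enumerate(xs, start=1))

for n in (15, 63, 255):
    d = star_discrepancy_1d([vdc(i) for i in range(1, n + 1)])
    print(n, d)  # 0.0625, 0.015625, 0.00390625 -- i.e. exactly 1/(n+1) at these n
```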