RANDOM NUMBERS, MONTE CARLO METHODS
MA/CSC 428, SPRING 2017
NUMERICAL MATH II.
SAUER: CHAPTER 9.
TOO MANY APPLICATIONS TO LIST
1. Simulation of natural, physical, and other complicated phenomena (particles, people, traffic, …)
2. Sampling from an impractically large number of possible cases/scenarios to approximate "typical" or "average" behavior.
3. Numerical integration can be seen as a special case of this.
4. Computer science. For many (deterministic) algorithmic problems, randomized algorithms provide the fastest (and sometimes the only known efficient) solution. (Ex: testing primes.)
5. Secure communication, cryptography, privacy.
6. Computer-generated imagery (CGI).
7. Gambling and other recreation (duh).
Different applications utilize different properties of "random" numbers.
WHAT IS A RANDOM NUMBER?
Is 2.71947 a random number? Not at all, but it was (seriously!) randomly generated (by me).
It’s easier to meaningfully talk about a random set or sequence of
numbers following a particular probability distribution and about
random generation of numbers.
• The precise math details are hairy (421); we won’t need them.
Some intuitive interpretations of “random”:
• Uniformity: each number or item (from some domain or set) is
equally likely. More generally: the likelihood of different
outcomes is fixed, but not the outcomes (bell curve, etc.)
• Unpredictability: even after seeing the initial numbers of a sequence, we cannot tell what comes next.
• Probability theory/statistics terminology: INDEPENDENT,
IDENTICALLY DISTRIBUTED (or IID) random variables.
(PSEUDO-)RANDOM NUMBER SEQUENCES
What we usually need in numerical methods is
• a sequence of independent, and
• identically distributed
• numbers from some domain, usually {0,1} or [0,1] or ℝ.
Computers rarely do truly random things, but for numerical math applications we are usually happy with "random looking", deterministic, pseudo-random sequences.
• Uniformity (or following a prescribed distribution) is key.
• True unpredictability is not so important, but at least apparent statistical independence is. (This is very different from some other applications, like cryptography, where strong independence is essential!)
(PSEUDO-)RANDOM NUMBER SEQUENCES
1. Uniformity is easy, independence is hard to achieve.
• (A quick puzzle: you have a biased coin. How can you generate unbiased heads/tails with it? One standard answer is sketched after this list.)
2. True uniform random is often not "as uniform as it could be".
[Figure: uniform random points in the unit square; even truly uniform samples show visible clumps and gaps.]
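For reference, the usual answer to the puzzle is von Neumann's debiasing trick (the slides leave it open); a minimal MATLAB sketch, where flip() stands in for the biased coin:

% Sketch of von Neumann's trick for debiasing a coin.
% flip() is any function handle returning a biased 0/1 coin flip.
function b = unbiased_flip(flip)
    while true
        f1 = flip();  f2 = flip();   % two independent biased flips
        if f1 ~= f2                  % "10" and "01" are equally likely,
            b = f1;                  % so the first flip is now unbiased
            return
        end                          % "00"/"11": discard and try again
    end
end
% Example use: unbiased_flip(@() rand < 0.3)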
TWO SIMPLE RNGS
(Note: nobody uses these.)
The "middle-square" method (von Neumann)
• Let's say we'd like "random" 4-digit numbers.
1. Start with any 4-digit number (seed),
2. Square it, take the middle 4 digits as the next number,
3. Repeat.
• Example: seed = 2017. 2017² = 04068289; 0682² = 00465124; 4651² = 21631801; etc.
• 682, 4651, 6318, 9171, 1072, 1491, 2230, 9729, 6534, 6931, 387, 1497, 2410, 8081, 3025, 1506, 2680, 1824, 3269, 6863, …
• Looks random enough…(?)
• Their magnitudes are all over the place.
• 9 even and 11 odd numbers.
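A minimal MATLAB sketch of this method (the helper name middle_square is ours, not from the slides):

% Minimal sketch of the 4-digit middle-square method.
function x = middle_square(seed, n)
    x = zeros(n, 1);
    x(1) = seed;
    for k = 2:n
        s = sprintf('%08d', x(k-1)^2);   % square, zero-pad to 8 digits
        x(k) = str2double(s(3:6));       % keep the middle 4 digits
    end
end
% middle_square(2017, 5) returns 2017, 682, 4651, 6318, 9171.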
TWO SIMPLE RNGS
(Note: nobody uses these.)
The linear congruential generator
• Let's say we'd like "random" integers from 0 to m−1.
• Uses two parameters, integers a and b.
• Start with any integer $x_0$ from 0 to m−1 (seed).
$$x_i = a\,x_{i-1} + b \pmod{m}$$
• Example: seed = 1, $a = 201$, $b = 7$, $m = 2017$ (a prime)
• 208, 1475, 2000, 624, 377, 1155, 207, 1274, 1939, 465, 690, 1541, 1147, 616, 786, 667, 952, 1761, 993, 1934, …
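A few lines of MATLAB reproduce the example sequence (a minimal sketch):

% Minimal sketch of a linear congruential generator,
% with the slide's parameters: a = 201, b = 7, m = 2017, seed = 1.
a = 201; b = 7; m = 2017;
x = 1;                        % the seed x_0
seq = zeros(1, 20);
for i = 1:20
    x = mod(a*x + b, m);      % x_i = a*x_{i-1} + b (mod m)
    seq(i) = x;
end
disp(seq)                     % 208  1475  2000  624  377 ...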
TWO SIMPLE RNGS
Some common features/bugs they share with more advanced (pseudo-)random number generators:
• They are, of course, not random, but completely deterministic,
• require a seed (initialization),
• and are periodic, at least eventually.
• They can be easily modified to generate random bits or uniform random ("real") numbers from [0,1] (see the sketch after this list). These are the two most common needs; the next two most common, the normal and exponential distributions, are a bit more complicated to generate.
• Q: Will the numbers be identically distributed? Will they be independent?
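A minimal sketch of that modification, continuing the LCG example above (the scaling and bit-extraction choices are ours):

% Turning LCG integers into approximately uniform reals in [0,1) and bits.
a = 201; b = 7; m = 2017;
x = 1;                              % seed
u = zeros(1, 10);  bits = zeros(1, 10);
for i = 1:10
    x = mod(a*x + b, m);
    u(i)    = x / m;                % approximately uniform in [0,1)
    bits(i) = double(u(i) < 0.5);   % one crude way to extract a bit
end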
MATLAB
• rng(seed) seeds the random number generator using the nonnegative integer seed so that rand, randi, and randn produce a predictable sequence of numbers.
• rand: uniform real in [0,1].
• randi(imax): uniform integer between 1 and imax.
• randn: standard normal (mean 0, variance 1).
• Alternative syntaxes are available to generate arrays of random numbers, including sparse arrays, random complex numbers, random permutations, etc. See the Matlab help (e.g., type "doc rand") for more details.
• Reading and exercises: Sec. 9.1, including Exercises and Computer Problems.
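A short usage sketch (the seed value is arbitrary):

% Seeding makes the sequence reproducible.
rng(2017);           % seed value arbitrary
u = rand(3, 1);      % three uniform reals in [0,1]
k = randi(6, 3, 1);  % three uniform integers in 1..6 (dice rolls)
z = randn(3, 1);     % three standard normal samples
rng(2017);           % re-seed: the stream replays from the start,
v = rand(3, 1);      % so v is exactly equal to u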
MONTE CARLO SIMULATION AND INTEGRATION
Simulation:
• Generate "representative" samples from a set of possible scenarios.
• Simplest case: every scenario is equally likely.
• Simplest definition of representative: choose them with any (pseudo-)RNG that covers the scenario space uniformly.
Application in integration:
• Remember this?
$$\int_X f(x)\,dx \approx \sum_{i=1}^{n} w_i\, f(x_i)$$
• Let the $x_i$ be uniform random in $X$, and $w_i = \frac{1}{n}\int_X 1\,dx$.
MONTE CARLO INTEGRATION
Our previous example: compute $\int_{-1}^{1} \sin^2(10\pi x + 1)\,dx$.
• Make sure everything in the code below makes sense:
f   = @(x)(sin(10*pi*x+1).^2);
n   = 100;              % number of samples
xs  = 2*rand(n,1)-1;    % uniform in [-1,1]
int = sum(f(xs))*(2/n)  % each weight is 2/n
• Using n = 10, 100, 1000, …, 10^8 (after rng(2017)), I got
int = 0.9151, 1.1658, 1.0258, 0.9972, 1.0007, 1.0004, 0.9999, 0.9998.
Questions:
• The error is, obviously, random. Can we quantify it?
• Why is this better than the quadrature formulas from 427?
MONTE CARLO INTEGRATION
Why is this better than the quadrature formulas from 427?
1. Generalizes effortlessly to multivariate integration and (with some work) to integration over complicated domains.
$$\int_X f(x)\,dx \approx \sum_{i=1}^{n} w_i\, f(x_i), \qquad w_i = \frac{1}{n}\int_X 1\,dx$$
There's nothing here about ℝ, except that we need to somehow know the area (measure) of the domain X.
• E.g., what is the area of the shaded region, defined by
$$\tfrac{1}{3} \le x^2 + y^2 \le 1 \quad\text{and}\quad \tfrac{1}{3} \le x^2 + y^2 - xy \le 1\,?$$
[Figure: the shaded region inside the square [-1,1]×[-1,1].]
MONTE CARLO INTEGRATION
• What is the area of the shaded region
$$\tfrac{1}{3} \le x^2 + y^2 \le 1 \quad\text{and}\quad \tfrac{1}{3} \le x^2 + y^2 - xy \le 1\,?$$
• Monte Carlo answer: generate a number of random samples from the enclosing square and count how many land in the region:
$$\text{shaded area} \approx (\text{area of square}) \cdot \frac{\#\,\text{pts in shaded}}{\#\,\text{of total points}}$$
• Note that the accepted points are an RNG for the uniform distribution over the shaded area.
• This is called the acceptance-rejection method.
[Figure: random points in the square [-1,1]×[-1,1]; points inside the shaded region are accepted.]
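A minimal sketch of this computation (sample size and seed are our choices):

% Acceptance-rejection estimate of the shaded area.
% Sample uniformly from the enclosing square [-1,1]^2, which has area 4.
rng(2017);                          % seed chosen arbitrarily
n  = 1e6;
x  = 2*rand(n,1) - 1;
y  = 2*rand(n,1) - 1;
r2 = x.^2 + y.^2;
s  = r2 - x.*y;
in = (1/3 <= r2 & r2 <= 1) & (1/3 <= s & s <= 1);   % accepted points
area = 4 * sum(in)/n                % area of square times acceptance ratio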
MONTE CARLO INTEGRATION
• Why random? Why not just use a grid?
1. Random samples are easier to scale and make adaptive (e.g., compare a 10x10 grid to an 11x11 grid).
2. With a full grid in higher dimensions, we quickly run out of points (a grid needs at least 2^d points in d dimensions)!
3. Repeated experiments provide statistical bounds on the error.
[Figure: grid points vs. random points in the square [-1,1]×[-1,1].]
QUICK RECAP I.
Monte Carlo integration
• The world's simplest integration algorithm:
$$\int_X f(x)\,dx \approx \sum_{i=1}^{n} w_i\, f(x_i), \qquad w_i = \frac{1}{n}\int_X 1\,dx,$$
where the $x_i$ are uniform random points from X.
• If X is complicated, we can generate these random points by generating uniform random points from a simpler set containing X (e.g., a box), and just drop the ones outside of X.
• With this method, as we are generating the $x_i$, we also get the estimate
$$\frac{\int_X 1\,dx}{\int_{\text{box}} 1\,dx} \approx \frac{\#\,\text{pts in } X}{\#\,\text{points}}$$
QUICK RECAP II.
Good, common (pseudo-)random number generators
• are not random, but deterministic;
• produce a sequence that is a function of the "seed" number [rng()];
• produce a periodic sequence of numbers (which is OK as long as the period is really long);
• are "apparently" random (pass simple statistical tests).
"Good" random number generation is quite difficult. (We won't be able to discuss the details.)
• For simple experiments, a Linear Congruential Generator with a very long period is OK.
• But just use rand().
ERROR ESTIMATION IN MONTE CARLO
Monte Carlo is an application of the law of large numbers.
• Informally, "the sample average converges to the mean."
Monte Carlo is also an application of the central limit theorem.
• The average of iid samples approaches a normal distribution, regardless of the distribution of the samples. (Under mild assumptions.)
The same is applicable to the errors:
• The sample average of the error converges to zero in a "predictable" way:
$$\sqrt{n}\left(\frac{E_1 + \cdots + E_n}{n} - 0\right) \;\to\; N(0, \sigma^2),$$
where $(E_1 + \cdots + E_n)/n$ is the average error, $0$ is the mean error, $\sigma^2$ is the variance of the $E_i$ (problem dependent!), and the normal limit quantifies the "error of errors".
ERROR ESTIMATION IN MONTE CARLO
The order of magnitude of the error
• Rearranging $\sqrt{n}\left(\frac{E_1 + \cdots + E_n}{n} - 0\right) \approx \text{const}$:
• The error of an n-sample estimate of the integral, $\frac{E_1 + \cdots + E_n}{n}$, is proportional to $n^{-1/2}$.
• Rearranging further: achieving an error < ε requires a number of samples proportional to $1/\varepsilon^2$.
• Q1: translate this to normal words?
• Q2: is this a positive or negative result? :)
• Bad: won't yield tiny errors with few samples. Needs 100x more samples for a single extra correct digit!
• Good: the error is dimension-independent!
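A quick numerical check of the $n^{-1/2}$ rate on the earlier integral, whose exact value is 1 (a sketch; the seed and sample sizes are our choices):

% Error of the Monte Carlo estimate of int_{-1}^{1} sin^2(10*pi*x+1) dx = 1
% for growing n; the errors should shrink roughly like n^(-1/2).
f = @(x)(sin(10*pi*x+1).^2);
rng(2017);
for n = 10.^(2:6)
    xs  = 2*rand(n,1) - 1;              % uniform in [-1,1]
    est = sum(f(xs)) * (2/n);           % each weight is 2/n
    fprintf('n = %8d   error = %.2e\n', n, abs(est - 1));
end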
ERROR ESTIMATION IN MONTE CARLO
The sample average only gives a single number (a "point estimate") for the approximate integral.
• This means we don't know how wrong or right the answer is.
A further application of the CLT and some statistics also yields confidence intervals on the integral. (Try to do that with grids!)
• Main idea: group the samples into batches, calculate a set of estimates, and based on their variability get a grip on the (unknown) variance σ².
• Take a deep breath…
ERROR ESTIMATION IN MONTE CARLO
Theorem. If $X_1, \ldots, X_m \sim N(\mu, \sigma^2)$ are iid normal, then the sample mean $\bar{X} = \frac{X_1 + \cdots + X_m}{m} \sim N\!\left(\mu, \frac{\sigma^2}{m}\right)$ is also normal. Moreover, if the sample standard deviation is $S = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}\left(X_i - \bar{X}\right)^2}$, then
$$T = \frac{\bar{X} - \mu}{S/\sqrt{m}} \;\sim\; t_{m-1},$$
Student's t-distribution with $m - 1$ degrees of freedom.
[The point is, this is a concrete, exactly known distribution, unlike $N(0, \sigma^2)$.]
ERROR ESTIMATION IN MONTE CARLO
Rearranging the final formula:
Corollary. A $(1 - \alpha)$-confidence interval on $\mu$ is
$$\bar{X} \pm t_{\alpha/2,\,m-1}\,(S/\sqrt{m}),$$
where $t_{\alpha/2,\,m-1}$ denotes the inverse CDF of the t distribution with $m - 1$ degrees of freedom.
There is no nice formula for the t values, but code and tables of important values are available. Matlab: tinv(a/2,m-1)
[Figure: plot of t critical values; source: https://onlinecourses.science.psu.edu/stat414/node/194]
ERROR ESTIMATION IN MONTE CARLO
Example. (Phew…!)
• A confidence interval for π.
• MonteCarlo_pi.m
[Figure: illustration over the square [-1,1]×[-1,1].]
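The script itself is not reproduced in the transcript; here is a minimal sketch of what such a batch-based computation might look like (the batch count m, sample size n, and seed are our choices; tinv requires the Statistics Toolbox):

% Hedged sketch: a batch-based confidence interval for pi.
% (The course's MonteCarlo_pi.m is not shown in the transcript.)
rng(2017);
m = 30;  n = 1e5;                 % m batches of n samples each
est = zeros(m, 1);
for j = 1:m
    x = 2*rand(n,1) - 1;
    y = 2*rand(n,1) - 1;
    est(j) = 4 * mean(x.^2 + y.^2 <= 1);   % 4 * (points in disk / points)
end
Xbar  = mean(est);
S     = std(est);                 % sample standard deviation (1/(m-1) form)
alpha = 0.05;
t = tinv(1 - alpha/2, m-1);       % t critical value (Statistics Toolbox)
CI = [Xbar - t*S/sqrt(m), Xbar + t*S/sqrt(m)]   % 95% confidence interval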
QUASI-MONTE CARLO (QMC)
Recall the pros and cons of (regular) grids:
• They cover a rectangular domain nicely. (Really "uniform".)
• Can't do adaptive/nested formulas easily.
• Scale horribly with the dimension.
Recall the pros and cons of Monte Carlo:
• Not really uniform, even if "uniform random".
• Can easily add any number of points.
• Error is independent of the dimension.
Quasi-Monte Carlo: a regular(ish) deterministic infinite sequence whose initial finite subsequences cover the domain evenly.
QUASI-MONTE CARLO (QMC)
Quasi-Monte Carlo: a regular(ish) deterministic infinite sequence whose initial finite subsequences cover the domain evenly.
• Example in 1D (source: http://planning.cs.uiuc.edu/node196.html):
• Quite regular and uniform.
• Can you see the pattern?
• This is called the van der Corput sequence.
QMC: VAN DER CORPUT
[The slide's figure did not survive extraction. The pattern: the n-th term is the base-2 radical inverse of n, i.e., write n in binary and mirror its digits about the radix point: 1 → 0.1₂ = 1/2; 2 = 10₂ → 0.01₂ = 1/4; 3 = 11₂ → 0.11₂ = 3/4; 4 = 100₂ → 0.001₂ = 1/8; …]
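A minimal MATLAB sketch of this radical-inverse construction (the routine is the standard one, not from the slides):

% Base-b van der Corput value for index n: mirror the base-b digits of n
% about the radix point.
function v = van_der_corput(n, b)
    v = 0;
    f = 1/b;
    while n > 0
        v = v + f * mod(n, b);   % place the lowest digit of n after the point
        n = floor(n / b);
        f = f / b;
    end
end
% E.g., van_der_corput(3, 2) returns 0.75; indices 1..4 in base 2
% give 1/2, 1/4, 3/4, 1/8.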
QMC: HALTON SEQUENCE
• The van der Corput sequence only works in 1D.
• If we use a (perhaps randomly shifted) vdC sequence in each dimension, the coordinates are too obviously correlated. (Not uniform.)
• Halton: use a different base for vdC along each coordinate.
[Figure: the first points of the 2-D Halton sequence; x-coordinates come from the base-2 vdC sequence (1/2, 1/4, 3/4, 1/8, …), y-coordinates from the base-3 sequence (1/3, 2/3, 1/9, 4/9, …). Source: Wikipedia.]
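Using the vdC helper sketched earlier, the 2-D Halton points are then simply (a sketch; MATLAB's Statistics Toolbox also offers haltonset for comparison):

% First 8 points of the 2-D Halton sequence, bases 2 and 3.
% (Assumes van_der_corput.m from the earlier sketch.)
pts = zeros(8, 2);
for n = 1:8
    pts(n, :) = [van_der_corput(n, 2), van_der_corput(n, 3)];
end
disp(pts)   % rows: (1/2,1/3), (1/4,2/3), (3/4,1/9), (1/8,4/9), ...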
MC VS QMC
[Figure: two point sets in the unit square, one pseudorandom (Monte Carlo) and one low-discrepancy (quasi-Monte Carlo).]
Which one is which?
QMC: THE THEORY
The vdC and Halton sequences are examples of what is called a low-discrepancy sequence.
• Informally, the discrepancy of a point set measures how uniformly the set covers the domain.
• The discrepancy of an infinite sequence $x_1, x_2, \ldots$ measures the asymptotic discrepancy of the finite subsequences $x_1, \ldots, x_n$ as n grows.
• Once again, the details are somewhat hairy. (See in a minute.)
Short ref for the theory-inclined:
• https://www.uibk.ac.at/mathematik/personal/einkemmer/qmm.pdf
QMC: THE THEORY
Discrepancy of a point set $x_1, \ldots, x_n$ in $[0,1]^d$:
$$D(x_1, \ldots, x_n) = \sup_{J \in E}\, \left|\, \frac{\#\{\text{points } x_k \text{ in } J\}}{n} - \mathrm{vol}(J) \,\right|,$$
where E is the set of rectangular boxes in $[0,1]^d$.
• Intuitive measure of non-uniformity, but ultimately not so useful. Instead, we have this:
Star-discrepancy ($D^*$): the same as above, but E is the set of rectangular boxes in $[0,1]^d$ with 0 as a vertex.
• Example: try and convince yourself that $D^*$ of the vdC sequence is $\Theta\!\left(\frac{\log n}{n}\right)$.
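In 1D the star-discrepancy of a finite point set has a simple closed form over the sorted points (a standard result; the sketch below is ours, not from the slides), which makes the Θ(log n / n) claim easy to check numerically:

% 1-D star-discrepancy of points x in [0,1]:
% D* = max_i max( i/n - x_(i),  x_(i) - (i-1)/n )  over the sorted points.
function d = star_discrepancy_1d(x)
    n = numel(x);
    x = sort(x(:)).';                         % sorted row vector
    i = 1:n;
    d = max(max(i/n - x, x - (i-1)/n));
end
% Compare, e.g., star_discrepancy_1d(rand(1000,1)) with the first 1000
% van der Corput points: the vdC value should be much smaller.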