2 Simulation

2.1 Introduction and basic examples
When a probability distribution is difficult to compute we may be able to simulate it on a
computer. This means we use the computer to generate random values that have the distribution of interest. These random values can be used to estimate many of the quantities that
we have previously computed exactly using the distribution table, including the distribution
table itself.
The starting point for simulating a random variable is almost always the uniform distribution
on (0, 1). If Z is uniformly distributed on (0, 1), then the key property of Z is that
P (a ≤ Z ≤ b) = b − a
for any 0 ≤ a ≤ b ≤ 1.
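As a quick sanity check (a sketch only; the endpoints 0.2 and 0.7 are arbitrary), we can verify this property empirically using Octave's rand, which produces uniform draws on (0, 1):
%% Draw many uniform values and check P(0.2 <= Z <= 0.7).
Z = rand(100000,1);
%% The empirical proportion; should be close to 0.7 - 0.2 = 0.5.
mean(Z >= 0.2 & Z <= 0.7)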
The uniform distribution is not very interesting in itself, but it can be transformed into many
other distributions. Some examples follow.
Example 2.1.1 (Continuous uniform distribution on (a,b)) If Z is uniform on (0, 1) then
(b − a)Z + a is uniform on (a, b). In Octave we have:
a = 5;
b = 8;
%% A single uniform draw on (a,b).
X = (b-a)*rand(1,1) + a;
%% A vector of 10000 uniform draws on (a,b).
X = (b-a)*rand(10000,1) + a;
Example 2.1.2 (Bernoulli trials) Given 0 ≤ p ≤ 1, a Bernoulli trial is a random variable
X such that P (X = 1) = p and P (X = 0) = 1 − p. The outcome X = 1 is called a success,
while the outcome X = 0 is called a failure. The parameter p is called the success probability.
A Bernoulli trial can be simulated by taking a uniform draw U and setting X = 1 if U < p
and X = 0 if U ≥ p. In Octave we have:
%% The success probability.
p = 0.4;
%% A Bernoulli trial with success probability p.
X = (rand(1,1) < p);
%% A vector of 1000 iid Bernoulli trials with success probability p.
X = (rand(1000,1) < p);
As we have seen, the probability of a specific sequence of length n having k successes and
n − k failures is p^k (1 − p)^(n−k).
The probability of a sequence having k successes and n − k failures (with unspecified order)
is

(n choose k) p^k (1 − p)^(n−k).
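As a sketch (the values of n, k, and p here are arbitrary), we can check this formula against a simulation estimate:
%% Exact probability of k successes in n trials.
n = 20; k = 7; p = 0.4;
pk = nchoosek(n,k) * p^k * (1-p)^(n-k);
%% Simulation estimate: success counts from 10000 sequences of n trials.
S = sum(rand(10000,n) < p, 2);
%% The two values should be close.
[pk, mean(S == k)]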
Example 2.1.3 (Sampling with replacement, order significant) Suppose we want to sample
n times uniformly from the grid {1, 2, . . . , m}. This can be accomplished by rescaling and
rounding a continuous uniform draw.
%% Generate this many independent draws.
n = 50;
%% Generate uniform draws on the set 1, ..., m.
m = 80;
%% These are the draws.
Z = ceil(m*rand(n,1));
There are m^n distinct possible values for the sequence of n draws. Each of these possible
values is equally likely to occur, so the probability of the entire sequence is 1/m^n. To
reiterate: each element of Z is uniform on a set of size m, while Z itself is uniform on a set
of size m^n.
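As a sketch (the sample size of 100000 is arbitrary), we can check that each grid point occurs with frequency close to 1/m:
%% Tabulate a large number of draws; each count should be near 100000/m.
Z = ceil(m*rand(100000,1));
C = zeros(m,1);
for k=1:m
  C(k) = sum(Z == k);
end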
Example 2.1.4 (Uniform distributions on an arbitrary finite set) Suppose we have a set
{v_1, . . . , v_m} containing m values. We can simulate uniformly from these values as follows.
%% We want to sample uniformly from this set.
V = [1, -1, 4, 54, 91, -3.4];
%% Generate a random index.
ix = ceil(6*rand(1,1));
%% This is the realization.
Z = V(ix);
%% This is a vector of 500 iid realizations.
U = V(ceil(6*rand(500,1)));
Example 2.1.5 (Non-uniform sampling with replacement, order significant)
Suppose we wish to sample with replacement from the set {v_1, v_2, . . . , v_m}, but with differing
probabilities of drawing the different objects. Specifically, we wish to draw object k with
probability p_k. This is a generalization of Example 2.1.4, where p_k = 1/m for every k.
Suppose that U is uniform on (0, 1), and let F_k = p_1 + · · · + p_k be the cumulative distribution.
Then by the basic property of the uniform distribution given above,

P(F_{k−1} ≤ U ≤ F_k) = F_k − F_{k−1} = p_k.

Thus if we let Z = v_k whenever F_{k−1} ≤ U ≤ F_k (and let Z = v_1 if U < F_1), then Z will take
value v_k with probability p_k, and this is true for any k.
To streamline the sampling, note that F_{k−1} ≤ U ≤ F_k if and only if U is greater than or
equal to exactly k − 1 of the F's. Thus if k = sum(U >= F) + 1, then the k-th point of the
sample space is drawn.
%% The sample space.
V = [.5, 1, 2, 3, 9, 16, 28, 45];
%% The probabilities.
P = [3,1,5,2,1,1,2,1] / 16;
%% The cumulative probabilities.
F = P;
for k=2:8
F(k) = P(k) + F(k-1);
end
%% Generate a sequence of 10000 draws.
Z = zeros(10000,1);
for k=1:10000
U = rand(1,1);
ix = sum(U >= F) + 1;
Z(k) = V(ix);
end
The probability of observing a sequence Z_1, Z_2, . . . , Z_n is p_{Z_1} × p_{Z_2} × · · · × p_{Z_n}. This is not
uniform unless p_1 = p_2 = · · · = p_m.
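As a check (a sketch reusing V, P, and Z from the code above), the empirical frequency of each value should be close to its assigned probability:
%% Compare assigned and empirical probabilities for each point.
for k=1:length(V)
printf("%8.4f %8.4f\n", P(k), mean(Z == V(k)));
end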
Example 2.1.6 (Multinomial sampling)
Suppose we are sampling with replacement, but we are not concerned with the order in
which the objects are drawn, so for example, 1, 4, 5 and 1, 5, 4 are considered to be the same
outcome. This is called a multinomial distribution.
To simulate a multinomial draw, begin by simulating Z as in Example 2.1.5. Next, since the
order is insignificant, we can simply count how many times each value occurs in Z, giving a
vector C_1, C_2, . . . , C_m, where C_k is the number of times that point v_k was drawn.
%% C(k) will be the number of times that value V(k) occurs in Z.
m = length(V);
C = zeros(m,1);
for k=1:m
C(k) = sum(Z == V(k));
end
The probabilities in multinomial sampling are not the same as the probabilities in 2.1.5. For
example, suppose that we generate a multinomial sample of size two from {1, 2, 3}, with
equal probabilities. The probability of drawing 2, 2 is 1/9 (as in 2.1.5). But the probability
of drawing 1, 3 is 2/9 (double the probability in 2.1.5).
In general, to calculate the probability of observing C_1, . . . , C_m in multinomial sampling,
first note that any specific sequence in which v_k is observed C_k times has probability

p_1^(C_1) × p_2^(C_2) × · · · × p_m^(C_m).
This is the same probability that you get in sampling with replacement. Next we need to
consider how many different ways the values can be observed in sequence. For example,
if we observe C_1 = 3 and C_2 = 1, there are four distinct sequences: (2, 1, 1, 1), (1, 2, 1, 1),
(1, 1, 2, 1), and (1, 1, 1, 2). Thus the probability is 4 p_1^3 p_2.
Now we need a general formula to count how many distinct ways an observed list of counts
C_1, . . . , C_m can be arranged in sequence. There are n! ways to permute the n values, but
this is overcounting, since some of these permutations are not distinguishable (i.e. in the
example above we counted 4 distinct arrangements but n! = 24). Thus we need to correct
for the fact that all points with a common value can be permuted without effect. There are
C_1! ways to permute the v_1's, C_2! ways to permute the v_2's, and so on. In total, there are
C_1! × · · · × C_m! ways to permute points with common values. Thus we must divide n! by this
product to get the number of distinct arrangements. This is called a multinomial coefficient:
(n choose C_1, C_2, . . . , C_m) = n! / (C_1! × · · · × C_m!).
In the example above, (n choose C_1, C_2) = 4!/(3! 1!) = 4, which agrees with the value that
we got by counting.
A note about calculating factorials and multinomial coefficients: Factorials grow very quickly
and can easily overflow on a computer. This can happen even if the value of interest (e.g. a
multinomial coefficient) is of moderate magnitude, due to cancellation of large factors. Thus
it is critical to calculate multinomial coefficients on the log scale. The basic fact we use is
that
log k! = log 2 + log 3 + · · · + log k.
Thus we have the following code for calculating the log of a multinomial coefficient.
%% The vector of counts.
C = [3, 0, 2, 1, 5];
%% The number of draws.
n = sum(C);
%% The log of the numerator of the multinomial coefficient: log(n!).
M = sum(log(2:n));
%% Subtract the log of the denominator, log(C(j)!), for each j.
%% When C(j) <= 1 the range 2:C(j) is empty and contributes zero.
for j=1:length(C)
M = M - sum(log(2:C(j)));
end
Another approach is to use the gamma function Γ(x), a continuous function satisfying
Γ(n + 1) = n! for integer values of n. The function lgamma (log Γ) is built into many
computer languages, including Octave. Thus, the following code is an equivalent way to
calculate the log of the multinomial coefficient.
%% The log of the numerator of the multinomial coefficient: log(n!).
M = lgamma(n+1);
%% Subtract the log of the denominator.
for j=1:length(C)
M = M - lgamma(C(j)+1);
end
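For the counts above, the coefficient itself is of moderate size, so as a usage check we can safely exponentiate the result:
%% The multinomial coefficient itself: 11!/(3! 0! 2! 1! 5!) = 27720.
round(exp(M))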
Example 2.1.7 (Random permutations) A permutation is a reordering of a sequence of
values. For example, 11, 23, 19 is a permutation of 23, 19, 11, and *#! is a permutation of
#*!. For simplicity we will only talk about permutations of {1, 2, . . . , n}.
A random permutation is a permutation that arises at random from some distribution. There
are n! distinct permutations of n objects, so under a uniform distribution each has probability
1/n!. For example, if n = 3, there are 6 permutations, each with probability 1/6:
(1, 2, 3) (1, 3, 2) (2, 1, 3) (2, 3, 1) (3, 1, 2) (3, 2, 1).
A good way to simulate a uniform random permutation is to take advantage of the fact that
in a sequence of n independent, uniform random variables, there are n! distinct orderings of
the values and each ordering is equally likely to occur. Thus the index vector that sorts the
elements of a uniform random vector with elements in (0, 1) is a uniform permutation. If
n = 3, for example, uniform permutations arise as follows:
(0.752754, 0.090529, 0.859640) → (2, 1, 3)
(0.546550, 0.859756, 0.862484) → (1, 2, 3)
(0.368080, 0.633485, 0.420060) → (1, 3, 2)
(0.884041, 0.357624, 0.889262) → (2, 1, 3)
(0.124281, 0.540093, 0.483464) → (1, 3, 2)
A random permutation on n objects is therefore equivalent to the index vector that sorts a
vector of n uniform draws. This index vector can be obtained using the sort() function in
Octave.
v = [6, 1, 2, 9, 7];
%% Assigns [1, 2, 6, 7, 9] to vs and [2, 3, 1, 5, 4] to ix.
[vs, ix] = sort(v);
Using this approach, a random permutation can be obtained in Octave as follows.
%% The number of objects to be permuted.
n = 12;
%% A uniform vector of length 12.
U = rand(n,1);
%% Sort the uniform vector. The indices that give the sorted result
%% are a random permutation.
[U,ix] = sort(U);
The one-line version:
%% ix is a random permutation of size 12.
[U,ix] = sort(rand(12,1));
It is sometimes useful to permute the rows or columns of a matrix:
%% A large matrix filled with normal values.
M = randn(1000,100);
%% A column permutation of M.
[u,ic] = sort(rand(100,1));
MC = M(:,ic);
%% A row permutation of M.
[u,ir] = sort(rand(1000,1));
MR = M(ir,:);
Example 2.1.8 (Sampling without replacement, order significant) Suppose we have m distinguishable objects, and we sample n of them according to the following sampling procedure:
(i) draw an object, (ii) record its value, (iii) set the object aside. Note that n must be no
larger than m or you will run out of objects to draw. The difference between this situation
and Examples 2.1.5/2.1.6 is that in this case no object can appear more than once in the
sample.
There are
m(m − 1) · · · (m − n + 1)
distinct outcomes. Since all outcomes have the same probability, the probability of any given
outcome is
1/m(m − 1) · · · (m − n + 1).
A trick that makes it easy to simulate sampling without replacement is to randomly permute
the sequence 1, 2, . . . , m, then take the first n elements in the permuted sequence as the
sample.
%% Number of objects in the collection.
m = 12;
%% Size of the sample.
n = 5;
%% Generate a random permutation, as above.
[U,ix] = sort(rand(m,1));
%% The objects in the sample.
S = ix(1:n);
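Octave also provides the built-in function randperm(), which returns a uniform random permutation of 1, 2, . . . , m directly, so the same sample can be drawn in one step:
%% Equivalent sample using randperm().
perm = randperm(m);
S = perm(1:n);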
Example 2.1.9 (Sampling without replacement, order not significant) If the n draws in 2.1.8
are not considered to be ordered, there are n! ways to reorder a given sample, all of which
are considered to give the same outcome. Therefore there are
m(m − 1) · · · (m − n + 1)/n!
distinct outcomes, each with probability
n!/m(m − 1) · · · (m − n + 1).
For example if m = 5 and n = 3 then there are 10 outcomes:

123  124  125  134  135
145  234  235  245  345
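As a check, this count is the binomial coefficient, which Octave computes with nchoosek(); a sample itself (drawn as in Example 2.1.8) can be stored in a canonical unordered form by sorting it:
%% The number of unordered samples of size 3 from 5 objects: 5*4*3/3! = 10.
nchoosek(5,3)
%% Draw such a sample and store it canonically by sorting.
[U,ix] = sort(rand(5,1));
S = sort(ix(1:3));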
Example 2.1.10 (Geometric distribution) Suppose we have a sequence of independent
Bernoulli trials X_1, X_2, . . .. Let Z denote the first index k such that X_k = 1. For example,
if the sequence of Bernoulli trials is

0, 0, 0, 1, 1, 0, 1, 0, 1, 1, . . .

then Z = 4.
This is the definition of the geometric distribution. One way to simulate from a geometric
distribution is to simulate Bernoulli trials until a success occurs (we’ll see a much better way
to do this later).
%% The success probability.
p = .03;
%% Start counting the trials at 1.
Z = 1;
%% Simulate Bernoulli trials until a success occurs.
while (1)
%% A single Bernoulli trial.
X = (rand(1,1) < p);
%% Test if a success has occurred.
if (X == 1)
break;
end
%% Otherwise add one to the number of trials.
Z = Z+1;
end
Note that the only way that Z = k can occur is if X_1 = · · · = X_{k−1} = 0 and X_k = 1. Put
another way, we have

P(Z = k) = P(X_1 = 0 and X_2 = 0 and · · · and X_{k−1} = 0 and X_k = 1).

Since the Bernoulli trials are independent, this is equal to

P(X_1 = 0) P(X_2 = 0) · · · P(X_{k−1} = 0) P(X_k = 1),

which simplifies to P(Z = k) = (1 − p)^(k−1) p.
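As a sketch (here p is chosen larger than in the code above so that small values of k are well represented), we can check this formula by repeating the simulation many times:
%% Simulate 10000 geometric draws by running Bernoulli trials to success.
p = 0.3;
Z = zeros(10000,1);
for j=1:10000
k = 1;
while (rand(1,1) >= p)
k = k + 1;
end
Z(j) = k;
end
%% Compare at k = 3: the exact value is (1-p)^2 * p = 0.147.
[(1-p)^2 * p, mean(Z == 3)]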
2.2 Inversion method
Suppose F (t) is the cumulative distribution function (CDF) of a random variable X:
F (t) = P (X ≤ t)
and U is uniform on (0, 1). Then for any 0 < t < 1 we have P(U < t) = t. Substituting
F(t) for t gives

P(U < F(t)) = F(t),

and since U < F(t) exactly when F^(-1)(U) < t, it follows that

P(F^(-1)(U) < t) = F(t).

That is, F^(-1)(U) has CDF F. Recall that F^(-1) is defined to satisfy F^(-1)(F(t)) = t. If you
have an expression for F(t), to find F^(-1) set y = F(t) and solve for t as a function of y.
For example, if X has sample space (0, 1) and the CDF is F(t) = t^(3/2), then F^(-1)(t) = t^(2/3).
Thus if we can easily calculate F^(-1) for a given distribution, simulation is trivial.
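For the example just given, a minimal sketch of the resulting simulation:
%% Simulate from the CDF F(t) = t^(3/2) on (0,1) by inversion:
%% X = F^(-1)(U) = U^(2/3).
U = rand(10000,1);
X = U.^(2/3);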
Example 2.2.1 (exponential distribution) The CDF of a standard exponential distribution
is F (t) = 1 − exp(−t) on sample space (0, ∞). We can solve for the inverse:
F^(-1)(U) = − log(1 − U).
An extra simplification follows by observing that 1 − U and U have the same distribution (a
peculiar property of the uniform distribution). Thus − log(U ) is standard exponential if U
is uniform.
More generally the CDF of an exponential distribution with mean λ is F (t) = 1−exp(−t/λ).
Therefore the distribution of −λ log(U ) is exponential with mean λ.
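A minimal sketch (the value of λ is arbitrary):
%% 10000 exponential draws with mean lambda, via inversion.
lambda = 2.5;
X = -lambda*log(rand(10000,1));
%% The sample mean should be close to lambda.
mean(X)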
Example 2.2.2 (logistic distribution) The logistic distribution has CDF F(x) = e^x/(1 + e^x)
on sample space (−∞, ∞), which has inverse F^(-1)(x) = log(x/(1 − x)). Thus log(U/(1 − U))
has a logistic distribution if U is uniform.
Example 2.2.3 (Cauchy distribution) The Cauchy distribution has CDF F(t) = 1/2 + arctan(t)/π
on sample space (−∞, ∞). Thus tan(π(U − 1/2)) has a Cauchy distribution.
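Sketches of both transformations in Octave (sample sizes arbitrary):
%% Logistic draws via inversion.
U = rand(10000,1);
X = log(U./(1-U));
%% Cauchy draws via inversion.
Y = tan(pi*(rand(10000,1) - 0.5));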
Example 2.2.4 (geometric distribution) Suppose G has a geometric distribution, so P(G =
g) = (1 − p)^(g−1) p for g = 1, 2, . . .. The CDF is

F(g) = sum_{k=1}^{g} (1 − p)^(k−1) p
     = sum_{k=1}^{∞} p(1 − p)^(k−1) − sum_{k=g+1}^{∞} p(1 − p)^(k−1)
     = 1 − (1 − p)^g,
and the inverse CDF is
F^(-1)(t) = log(1 − t)/log(1 − p).
Since

P(G = g) = P(G ≤ g) − P(G ≤ g − 1)
         = F(g) − F(g − 1)
         = P(F(g − 1) ≤ U ≤ F(g))
         = P(g − 1 ≤ log(1 − U)/log(1 − p) ≤ g),
a geometric random variable can be simulated as ⌈log(U)/log(1 − p)⌉ (as above, we may
replace 1 − U with U since they have the same distribution).
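This gives the much better simulation method promised in Example 2.1.10; a sketch:
%% 10000 geometric draws with success probability p, via inversion.
p = 0.03;
Z = ceil(log(rand(10000,1))/log(1-p));
%% The sample mean should be close to 1/p.
mean(Z)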