2 Simulation

2.1 Introduction and basic examples

When a probability distribution is difficult to compute we may be able to simulate it on a computer. This means we use the computer to generate random values that have the distribution of interest. These random values can be used to estimate many of the quantities that we have previously computed exactly using the distribution table, including the distribution table itself.

The starting point for simulating a random variable is almost always the uniform distribution on $(0, 1)$. If $Z$ is uniformly distributed on $(0, 1)$, then the key property of $Z$ is that $P(a \le Z \le b) = b - a$ for any $0 \le a \le b \le 1$. The uniform distribution is not very interesting in itself, but it can be transformed into many other distributions. Some examples follow.

Example 2.1.1 (Continuous uniform distribution on (a, b)) If $Z$ is uniform on $(0, 1)$ then $(b - a)Z + a$ is uniform on $(a, b)$. In Octave we have:

    a = 5;
    b = 8;

    %% A single uniform draw on (a,b).
    X = (b-a)*rand(1,1) + a;

    %% A vector of 10000 uniform draws on (a,b).
    X = (b-a)*rand(10000,1) + a;

Example 2.1.2 (Bernoulli trials) Given $0 \le p \le 1$, a Bernoulli trial is a random variable $X$ such that $P(X = 1) = p$ and $P(X = 0) = 1 - p$. The outcome $X = 1$ is called a success, while the outcome $X = 0$ is called a failure. The parameter $p$ is called the success probability. A Bernoulli trial can be simulated by taking a uniform draw $U$ and setting $X = 1$ if $U < p$ and $X = 0$ if $U \ge p$. In Octave we have:

    %% The success probability.
    p = 0.4;

    %% A Bernoulli trial with success probability p.
    X = (rand(1,1) < p);

    %% A vector of 1000 iid Bernoulli trials with success probability p.
    X = (rand(1000,1) < p);

As we have seen, the probability of a specific sequence of length $n$ having $k$ successes and $n - k$ failures is $p^k (1 - p)^{n-k}$. The probability of a sequence having $k$ successes and $n - k$ failures (with unspecified order) is

$$\binom{n}{k} p^k (1 - p)^{n-k}.$$

Example 2.1.3 (Sampling with replacement, order significant) Suppose we want to sample $n$ times uniformly from the grid $\{1, 2, \ldots, m\}$. This can be accomplished by rescaling and rounding a continuous uniform draw.

    %% Generate this many independent draws.
    n = 50;

    %% Generate uniform draws on the set 1, ..., m.
    m = 80;

    %% These are the draws.
    Z = ceil(m*rand(n,1));

There are $m^n$ distinct possible values for the sequence of $n$ draws. Each of these possible values is equally likely to occur, so the probability of the entire sequence is $1/m^n$. To reiterate, each element of Z is uniform on a set of size $m$, and Z itself is uniform on a set of size $m^n$.

Example 2.1.4 (Uniform distributions on an arbitrary finite set) Suppose we have a set $\{v_1, \ldots, v_m\}$ containing $m$ values. We can simulate uniformly from these values as follows.

    %% We want to sample uniformly from this set.
    V = [1, -1, 4, 54, 91, -3.4];

    %% Generate a random index.
    ix = ceil(length(V)*rand(1,1));

    %% This is the realization.
    Z = V(ix);

    %% This is a vector of 500 iid realizations.
    U = V(ceil(length(V)*rand(500,1)));

Example 2.1.5 (Non-uniform sampling with replacement, order significant) Suppose we wish to sample with replacement from the set $\{v_1, v_2, \ldots, v_m\}$, but with differing probabilities of drawing the different objects. Specifically, we wish to draw object $k$ with probability $p_k$. This is a generalization of Example 2.1.4, where $p_k = 1/m$ for every $k$.

Suppose that $U$ is uniform on $(0, 1)$, and let $F_k = p_1 + \cdots + p_k$ be the cumulative distribution, with $F_0 = 0$. Then by the basic property of the uniform distribution given above,

$$P(F_{k-1} \le U \le F_k) = F_k - F_{k-1} = p_k.$$

Thus if we let $Z = v_k$ whenever $F_{k-1} \le U < F_k$ (so that $Z = v_1$ if $U < F_1$), then $Z$ will take value $v_k$ with probability $p_k$, and this is true for any $k$. Whether the endpoints are included makes no difference, since $U$ equals any given value with probability zero.

To streamline the sampling, note that $F_{k-1} \le U < F_k$ if and only if $U$ is greater than or equal to exactly $k - 1$ of the $F$'s. Thus if k = sum(U >= F) + 1 then the $k$th point of the sample space is drawn.

    %% The sample space.
    V = [.5, 1, 2, 3, 9, 16, 28, 45];

    %% The probabilities.
    P = [3,1,5,2,1,1,2,1] / 16;

    %% The cumulative probabilities.
    F = P;
    for k=2:8
      F(k) = P(k) + F(k-1);
    end

    %% Generate a sequence of 10000 draws.
    for k=1:10000
      U = rand(1,1);
      m = sum(U >= F) + 1;
      Z(k) = V(m);
    end

The probability of observing a sequence $Z_1, Z_2, \ldots, Z_n$ is $p_{Z_1} \times p_{Z_2} \times \cdots \times p_{Z_n}$, where $p_{Z_i}$ denotes the probability of the value taken by $Z_i$. This is not uniform unless $p_1 = p_2 = \cdots = p_m$.

Example 2.1.6 (Multinomial sampling) Suppose we are sampling with replacement, but we are not concerned with the order in which the objects are drawn, so for example, 1, 4, 5 and 1, 5, 4 are considered to be the same outcome. This is called a multinomial distribution.

To simulate a multinomial draw, begin by simulating Z as in Example 2.1.5. Next, since the order is insignificant, we can simply count how many times each value occurs in Z, giving a vector $C_1, C_2, \ldots, C_m$, where $C_k$ is the number of times that point $v_k$ was drawn. Since Z holds the drawn values rather than their indices, the counts are obtained by comparing Z to each element of V.

    %% C(j) will be the number of times that the value V(j) occurs in Z.
    C = zeros(length(V),1);
    for j=1:length(V)
      C(j) = sum(Z == V(j));
    end

The probabilities in multinomial sampling are not the same as the probabilities in 2.1.5. For example, suppose that we generate a multinomial sample of size two from $\{1, 2, 3\}$, with equal probabilities. The probability of drawing 2, 2 is 1/9 (as in 2.1.5). But the probability of drawing 1, 3 is 2/9 (double the probability in 2.1.5).

In general, to calculate the probability of observing $C_1, \ldots, C_m$ in multinomial sampling, first note that any specific sequence in which $v_k$ is observed $C_k$ times has probability

$$p_1^{C_1} \times p_2^{C_2} \times \cdots \times p_m^{C_m}.$$

This is the same probability that you get in sampling with replacement. Next we need to consider how many different ways the values can be observed in sequence. For example, if we observe $C_1 = 3$ and $C_2 = 1$, there are four distinct sequences: (2, 1, 1, 1), (1, 2, 1, 1), (1, 1, 2, 1), and (1, 1, 1, 2). Thus the probability is $4 p_1^3 p_2$.
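The count of four arrangements in the example above can be confirmed by brute force. The following is a minimal sketch rather than part of the original notes: it enumerates all $2^4$ sequences of length four with entries in $\{1, 2\}$ and counts those containing three 1's and one 2. The variable names (code, s, count, prob) and the values p1 = 0.7 and p2 = 0.3 are assumptions chosen for illustration.

```octave
%% Length of the sequence.
n = 4;

%% Illustrative (assumed) probabilities for the values 1 and 2.
p1 = 0.7;
p2 = 0.3;

%% Count the sequences with three 1's and one 2.
count = 0;
for code=0:(2^n-1)
  %% Read off the n binary digits of code and shift them to {1, 2}.
  s = mod(floor(code ./ 2.^(0:n-1)), 2) + 1;
  if (sum(s == 1) == 3)
    count = count + 1;
  end
end

%% The probability of observing C1 = 3 and C2 = 1.
prob = count * p1^3 * p2;
```

Enumeration of this kind is only practical for very short sequences, since the number of sequences grows as $m^n$.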
Now we need a general formula to count how many distinct ways an observed list of counts $C_1, \ldots, C_m$ can be arranged in sequence. There are $n!$ ways to permute the $n$ values, but this is overcounting, since some of these permutations are not distinguishable (i.e. in the example above we counted 4 distinct arrangements but $n! = 24$). Thus we need to correct for the fact that all points with a common value can be permuted without effect. There are $C_1!$ ways to permute the $v_1$'s, $C_2!$ ways to permute the $v_2$'s, and so on. In total, there are $C_1! \times \cdots \times C_m!$ ways to permute points with common values. Thus we must divide $n!$ by this product to get the number of distinct arrangements. This is called a multinomial coefficient:

$$\binom{n}{C_1, C_2, \ldots, C_m} = \frac{n!}{C_1! \times \cdots \times C_m!}.$$

In the example above, $\binom{4}{3, 1} = 4!/(3!\,1!) = 4$, which agrees with the value that we got by counting.

A note about calculating factorials and multinomial coefficients: Factorials grow very quickly and can easily overflow on a computer. This can happen even if the value of interest (e.g. a multinomial coefficient) is of moderate magnitude, because the large factors cancel only after the factorials have been computed. Thus it is critical to calculate multinomial coefficients on the log scale. The basic fact we use is that $\log k! = \log 2 + \log 3 + \cdots + \log k$. Thus we have the following code for calculating the log of a multinomial coefficient.

    %% The vector of counts.
    C = [3, 0, 2, 1, 5];

    %% The number of draws (n in the text).
    m = sum(C);

    %% The log of the numerator of the multinomial coefficient.
    M = sum(log([2:m]));

    %% Subtract the log of the denominator of the multinomial coefficient.
    %% You may get some warnings about empty matrices here that
    %% you can ignore.
    for j=1:length(C)
      M = M - sum(log([2:C(j)]));
    end

The result M is the log of the multinomial coefficient, so exp(M) recovers the coefficient itself when it is small enough to be represented.

Another approach is to use the gamma function $\Gamma(x)$, which is a continuous function satisfying $\Gamma(m + 1) = m!$ for nonnegative integer values of $m$. The function lgamma ($\log \Gamma$) is built into many computer languages, including Octave.
Thus, the following code is an equivalent way to calculate the multinomial coefficient on the log scale.

    %% The log of the numerator of the multinomial coefficient.
    M = lgamma(m+1);

    %% Subtract the log of the denominator of the multinomial coefficient.
    for j=1:length(C)
      M = M - lgamma(C(j)+1);
    end

Example 2.1.7 (Random permutations) A permutation is a reordering of a sequence of values. For example, 11, 23, 19 is a permutation of 23, 19, 11, and *#! is a permutation of #*!. For simplicity we will only talk about permutations of $\{1, 2, \ldots, n\}$.

A random permutation is a permutation that arises at random from some distribution. There are $n!$ distinct permutations of $n$ objects, so under a uniform distribution each has probability $1/n!$. For example, if $n = 3$, there are 6 permutations, each with probability 1/6:

    (1, 2, 3)  (1, 3, 2)  (2, 1, 3)  (2, 3, 1)  (3, 1, 2)  (3, 2, 1).

A good way to simulate a uniform random permutation is to take advantage of the fact that in a sequence of $n$ independent, uniform random variables, there are $n!$ distinct orderings of the values and each ordering is equally likely to occur. Thus the index vector that sorts the elements of a uniform random vector with elements in $(0, 1)$ is a uniform permutation. If $n = 3$, for example, uniform permutations arise as follows:

    (0.752754, 0.090529, 0.859640)  →  (2, 1, 3)
    (0.546550, 0.859756, 0.862484)  →  (1, 2, 3)
    (0.368080, 0.633485, 0.420060)  →  (1, 3, 2)
    (0.884041, 0.357624, 0.889262)  →  (2, 1, 3)
    (0.124281, 0.540093, 0.483464)  →  (1, 3, 2)

A random permutation on $n$ objects is therefore equivalent to the index vector that sorts a vector of $n$ uniform draws. This index vector can be obtained using the sort() function in Octave.

    v = [6, 1, 2, 9, 7];

    %% Assigns [1, 2, 6, 7, 9] to vs and [2, 3, 1, 5, 4] to ix.
    [vs, ix] = sort(v);

Using this approach, a random permutation can be obtained in Octave as follows.

    %% The number of objects to be permuted.
    n = 12;

    %% A uniform vector of length 12.
    U = rand(n,1);

    %% Sort the uniform vector. The indices that give the sorted result
    %% are a random permutation.
    [U,ix] = sort(U);

The one-line version:

    %% ix is a random permutation of size 12.
    [U,ix] = sort(rand(12,1));

It is sometimes useful to permute the rows or columns of a matrix:

    %% A large matrix filled with normal values.
    M = randn(1000,100);

    %% A column permutation of M.
    [u,ic] = sort(rand(100,1));
    MC = M(:,ic);

    %% A row permutation of M.
    [u,ir] = sort(rand(1000,1));
    MR = M(ir,:);

Example 2.1.8 (Sampling without replacement, order significant) Suppose we have $m$ distinguishable objects, and we sample $n$ of them according to the following sampling procedure: (i) draw an object, (ii) record its value, (iii) set the object aside. Note that $n$ must be no larger than $m$ or you will run out of objects to draw. The difference between this situation and Examples 2.1.5/2.1.6 is that in this case no object can appear more than once in the sample.

There are $m(m - 1) \cdots (m - n + 1)$ distinct outcomes. Since all outcomes have the same probability, the probability of any given outcome is $1/[m(m - 1) \cdots (m - n + 1)]$.

A trick that makes it easy to simulate sampling without replacement is to randomly permute the sequence $1, 2, \ldots, m$, then take the first $n$ elements in the permuted sequence as the sample.

    %% Number of objects in the collection.
    m = 12;

    %% Size of the sample.
    n = 5;

    %% Generate a random permutation, as above.
    [U,ix] = sort(rand(m,1));

    %% The objects in the sample.
    S = ix(1:n);

Example 2.1.9 (Sampling without replacement, order not significant) If the $n$ draws in 2.1.8 are not considered to be ordered, there are $n!$ ways to reorder a given sample, all of which are considered to give the same outcome. Therefore there are $m(m - 1) \cdots (m - n + 1)/n!$ distinct outcomes, each with probability $n!/[m(m - 1) \cdots (m - n + 1)]$.
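An unordered sample of this kind can be simulated by drawing an ordered sample as in 2.1.8 and then discarding the order. The following is a minimal sketch, not part of the original notes; it reuses the illustrative sizes from 2.1.8, sorts the sample so that reorderings of the same objects give the same result, and uses Octave's nchoosek function to evaluate the number of distinct outcomes. The variable name K is an assumption.

```octave
%% Number of objects in the collection and size of the sample.
m = 12;
n = 5;

%% Generate a random permutation and keep the first n indices.
[U,ix] = sort(rand(m,1));

%% Sorting the sample discards the order in which the objects were drawn.
S = sort(ix(1:n));

%% The number of distinct unordered outcomes, m(m-1)...(m-n+1)/n!.
K = nchoosek(m, n);
```

Each unordered outcome then has probability 1/K.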
For example, if $m = 5$ and $n = 3$ then there are 10 outcomes:

    123  124  125  134  135  145  234  235  245  345

Example 2.1.10 (Geometric distribution) Suppose we have a sequence of independent Bernoulli trials $X_1, X_2, \ldots$. Let $Z$ denote the first index $k$ such that $X_k = 1$. For example, if the sequence of Bernoulli trials is 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, . . . then $Z = 4$. This is the definition of the geometric distribution.

One way to simulate from a geometric distribution is to simulate Bernoulli trials until a success occurs (we'll see a much better way to do this later).

    %% The success probability.
    p = .03;

    %% Start counting the trials at 1.
    Z = 1;

    %% Simulate Bernoulli trials until a success occurs.
    while (1)
      %% A single Bernoulli trial.
      X = (rand(1,1) < p);

      %% Test if a success has occurred.
      if (X == 1)
        break;
      end

      %% Otherwise add one to the number of trials.
      Z = Z+1;
    end

Note that the only way that $Z = k$ can occur is if $X_1 = \cdots = X_{k-1} = 0$ and $X_k = 1$. Put another way, we have

$$P(Z = k) = P(X_1 = 0 \text{ and } X_2 = 0 \text{ and } \cdots \text{ and } X_{k-1} = 0 \text{ and } X_k = 1).$$

Since the Bernoulli trials are independent, this is equal to

$$P(X_1 = 0) P(X_2 = 0) \cdots P(X_{k-1} = 0) P(X_k = 1),$$

which simplifies to $P(Z = k) = (1 - p)^{k-1} p$.

2.2 Inversion method

Suppose $F(t)$ is the cumulative distribution function (CDF) of a random variable $X$, so that $F(t) = P(X \le t)$, and $U$ is uniform on $(0, 1)$. Then

$$\begin{aligned}
t &= P(U < t) \\
F(t) &= P(U < F(t)) \\
F(t) &= P(F^{-1}(U) < t).
\end{aligned}$$

The first line holds for any $0 \le t \le 1$, the second substitutes $F(t)$ for $t$, and the third applies the increasing function $F^{-1}$ to both sides of the inequality. The last line says that $F^{-1}(U)$ has CDF $F$, so $F^{-1}(U)$ has the same distribution as $X$.

Recall that $F^{-1}$ is defined to satisfy $F^{-1}(F(t)) = t$. If you have an expression for $F(t)$, to find $F^{-1}$ set $y = F(t)$ and solve for $t$ as a function of $y$. For example, if $X$ has sample space $(0, 1)$ and the CDF is $F(t) = t^{3/2}$, then $F^{-1}(t) = t^{2/3}$. Thus if we can easily calculate $F^{-1}$ for a given distribution, simulation is trivial.

Example 2.2.1 (exponential distribution) The CDF of a standard exponential distribution is $F(t) = 1 - \exp(-t)$ on sample space $(0, \infty)$. We can solve for the inverse: $F^{-1}(U) = -\log(1 - U)$.
An extra simplification follows by observing that $1 - U$ and $U$ have the same distribution (a peculiar property of the uniform distribution). Thus $-\log(U)$ is standard exponential if $U$ is uniform.

More generally the CDF of an exponential distribution with mean $\lambda$ is $F(t) = 1 - \exp(-t/\lambda)$. Therefore the distribution of $-\lambda \log(U)$ is exponential with mean $\lambda$.

Example 2.2.2 (logistic distribution) The logistic distribution has CDF $F(x) = e^x/(1 + e^x)$ on sample space $(-\infty, \infty)$, which has inverse $F^{-1}(x) = \log(x/(1 - x))$. Thus $\log(U/(1 - U))$ has a logistic distribution if $U$ is uniform.

Example 2.2.3 (Cauchy distribution) The Cauchy distribution has CDF $1/2 + \operatorname{atan}(t)/\pi$ on sample space $(-\infty, \infty)$. Thus $\tan(\pi(U - 1/2))$ has a Cauchy distribution.

Example 2.2.4 (geometric distribution) Suppose $G$ has a geometric distribution, so $P(G = g) = (1 - p)^{g-1} p$ for $g = 1, 2, \ldots$. The CDF is

$$\begin{aligned}
F(g) &= \sum_{k=1}^{g} (1 - p)^{k-1} p \\
     &= \sum_{k=1}^{\infty} p(1 - p)^{k-1} - \sum_{k=g+1}^{\infty} p(1 - p)^{k-1} \\
     &= 1 - (1 - p)^g,
\end{aligned}$$

and the inverse CDF is $F^{-1}(t) = \log(1 - t)/\log(1 - p)$. Since

$$\begin{aligned}
P(G = g) &= P(G \le g) - P(G \le g - 1) \\
         &= F(g) - F(g - 1) \\
         &= P(F(g - 1) \le U \le F(g)) \\
         &= P(g - 1 \le \log(1 - U)/\log(1 - p) \le g),
\end{aligned}$$

a geometric random variable can be simulated using $\lceil \log(U)/\log(1 - p) \rceil$ (as above we may replace $1 - U$ with $U$ since they have the same distribution).
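
The four inversion formulas in this section translate directly into Octave. The following is a minimal sketch in the style of the earlier examples, not code from the original notes; the number of draws N, the mean lambda = 2, the success probability p = .03, and the output names E, L, Y and G are assumptions chosen for illustration. A fresh uniform vector is drawn for each distribution so that the four samples are independent of one another.

```octave
%% The number of draws (an illustrative choice).
N = 10000;

%% Exponential draws with mean lambda (Example 2.2.1).
lambda = 2;
E = -lambda*log(rand(N,1));

%% Logistic draws (Example 2.2.2).
U = rand(N,1);
L = log(U ./ (1 - U));

%% Cauchy draws (Example 2.2.3).
Y = tan(pi*(rand(N,1) - 0.5));

%% Geometric draws with success probability p (Example 2.2.4).
p = .03;
G = ceil(log(rand(N,1)) / log(1 - p));
```

As a rough check, mean(E) should be close to lambda and mean(G) should be close to 1/p. The sample mean of Y does not settle down as N grows, since the Cauchy distribution has no mean.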