Lecture Notes #4, based on ES 205 Lecture Notes #7: Elements of Simulation
Version 3.1, Date: 2003-8-10

Major points of LN #7
Modeling aspect of simulation: GSMP vs. simulation
Computer science aspects of simulation: language, interface, programming
Statistical aspects of simulation:
1. generation of random numbers and variables; discrete r.v.'s and the alias method
2. output analysis - the CLT, confidence intervals, and estimation
3. fundamental limitations
Advanced topics: order statistics, regeneration, transients, warm-up period, correlations

ES 205 LECTURE NOTES #7
ELEMENTS OF SIMULATION (Ch. 10 of CL99)

• Simulation is the electronic equivalent of a “pilot plant or laboratory mockup”. We are literally employing the trial-and-error method plus some statistical sophistication. Thus there are two aspects:
(i) the laboratory aspects - software, which includes general-purpose algorithms and interfaces, e.g., the GSMP model and the GUI object-oriented features;
(ii) the statistical aspects - analysis of the output data as a statistical experiment.

• The event scheduling approach: the diagram (Fig. 10.1, p. 595 of CL99).

Copyright by Yu-Chi Ho

Note that time steps forward from event to event in this approach, in contradistinction to the integration of differential equations in CVDS, where time marches on in small increments Δt.
Ingredients needed:
registers for the state, the time, and the scheduled (future) event list;
routines for initialization, state transition, time update, statistics gathering, output reporting, and random variable generation;
a main program which models the DEDS (user written);
modern features, e.g., animation, object-oriented programming, etc.
The example of the EXTEND software and the G/G/1 queue demo.

2. RANDOM NUMBER AND RANDOM VARIABLE GENERATION

THE LINEAR CONGRUENTIAL METHOD:
x_{n+1} = (a·x_n + b) mod M,   u_{n+1} = x_{n+1}/M,   (1)
where the "mod M" operation gives the remainder of (a·x_n + b)/M, so that u_n is (approximately) uniformly distributed on [0, 1).
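The recursion (1) can be sketched in a few lines of Python (the helper name `lcg` is ours, not from the notes); the parameters a = 2, b = 1, M = 16 used for illustration are those of Example 1 below.

```python
def lcg(a, b, M, x0, n):
    """First n iterates of x_{k+1} = (a*x_k + b) mod M, together with
    the rescaled values u_k = x_k / M in [0, 1)."""
    xs, x = [], x0
    for _ in range(n):
        x = (a * x + b) % M
        xs.append(x)
    return xs, [xk / M for xk in xs]

# Parameters of Example 1 below: every seed eventually gets stuck at 15.
xs, us = lcg(2, 1, 16, 1, 6)
print(xs)   # -> [3, 7, 15, 15, 15, 15]
```

Real generators use a much larger modulus M (e.g., 2^31 - 1), but the structure is exactly this.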
Based on a uniformly distributed random variable, the Inverse Transform Method below can then generate random numbers with an arbitrary distribution.

Example 1. Let a = 2, b = 1, and M = 16. Using Eq. (1) and various x0's, we get

x0 =  1   2   4   6   8  10  12  13  14
x1 =  3   5   9  13   1   5   9  11  13
x2 =  7  11   3  11   3  11   3   7  11
x3 = 15   7   7   7   7   7   7  15   7
x4 = 15  15  15  15  15  15  15  15  15
x5 = 15  15  15  15  15  15  15  15  15

All sequences get stuck at 15 after the initial transients!

Example 2. Let a = 3, b = 0, and M = 16. Similarly, we find that, depending on the initial seed, the sequences get into cycles with different periods. But none of the sequences produce the maximal period of 0-15.

Example 3. Let a = 1, b = 3, and M = 16. This time, starting with any seed, we get the maximal period and the sequence [..., 1, 4, 7, 10, 13, 0, 3, 6, 9, 12, 15, 2, 5, 8, 11, 14, 1, ...]. This is nice. However, a plot of the sequence vs. time shows high correlation among successive numbers in the sequence, as illustrated in Fig. 2. Thus the numbers in the sequence are not at all independent.

Fig. 2 Plot of Pseudo Random Sequence

Example 4. Let a = 5, b = 3, and M = 16. Once again we get a sequence of maximal period with any seed, [..., 1, 8, 11, 10, 5, 12, 15, 14, 9, 0, 3, 2, 13, 4, 7, 6, 1, ...]. A similar plot as in Fig. 2 shows a reasonably random-looking sequence. Thus, both periodicity and correlation are important. For the last word, see S. Tezuka's book on random number generation (Kluwer 1997).

METHOD OF INVERSE TRANSFORM

The reference for this part, the Method of Inverse Transform and the Method of Rejection, is Chapter 10.6 of CL99.

Fig. 4 Inverse Transform Method for Generating F(x)-Distributed Random Variables.

To see that x = F^{-1}(u) has distribution F when u is uniform on [0, 1], consider the probability
P(x ≤ a) = P(F^{-1}(u) ≤ a) = P(u ≤ F(a)) = F(a),
where the last equality is by virtue of the uniform distribution of the random variable u.
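The argument above translates directly into code. A minimal sketch, using a made-up example distribution F(x) = x^2 on [0, 1] (so F^{-1}(u) = sqrt(u)); the function name `inverse_transform` is ours:

```python
import math
import random

def inverse_transform(F_inv, u=None):
    """Draw x = F^{-1}(u) with u uniform on [0,1); by the argument above,
    P(x <= a) = P(u <= F(a)) = F(a), so x has distribution F."""
    if u is None:
        u = random.random()
    return F_inv(u)

# Example distribution F(x) = x^2 on [0,1], so F^{-1}(u) = sqrt(u).
random.seed(4)
xs = [inverse_transform(math.sqrt) for _ in range(50000)]
frac = sum(x <= 0.5 for x in xs) / len(xs)   # should be close to F(0.5) = 0.25
```

The empirical fraction of samples below 0.5 approximates F(0.5) = 0.25, as the proof predicts.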
Example: Exponential distribution. u = F(x) = 1 - exp(-λx) ==> 1 - u = exp(-λx), or x = -(1/λ)·ln(1 - u).

METHOD OF REJECTION

3. Sampling, the Central Limit Theorem, and Confidence Intervals

Estimating the mean of a random variable L, i.e., J = E[L(·)]. Consider
est1 = any single sample of L  =>  E[est1] = J  ==>  unbiasedness.
Consider
est2 = (1/N) Σ_{i=1}^{N} L_i,  with  lim_{N→∞} est2 = J  ==>  consistency, due to the law of large numbers.
However, Var[est1] = σ^2, while Var[est2] = (1/N^2)·Nσ^2 = σ^2/N --> 0 as N --> ∞. Note that 1/N^{1/2} is a slowly decreasing function. Moral: we need many replications of L, or a long simulation.

Consider independently and identically distributed (i.i.d.) random variables x1, x2, ..., xn, with E(xi) = μ and Var(xi) = σ^2. Define
Mn = [(x1 + x2 + ... + xn) - nμ] / (nσ^2)^{1/2}.
As n→∞, the distribution of Mn converges to N(0,1), i.e., the normal (Gaussian) distribution with mean zero and unit variance. This is known as the Central Limit Theorem (CLT).

The significance of the CLT for experimental work lies in the fact that it enables us to predict the error of sampling. For example, suppose we take n samples of a random variable with mean μ. We may use x̄ = (x1 + x2 + ... + xn)/n as an estimate for the unknown mean μ. Then Mn is the normalized error of the estimate. For large n, we can use standard tables for Gaussian random variables to calculate P(-t < Mn < t), which is the probability that the normalized error of the estimate for μ lies in [-t, t]. For example, if t = 1.96, we get P = 0.95; i.e., we are 95% confident that the interval [x̄ - t(σ^2/n)^{1/2}, x̄ + t(σ^2/n)^{1/2}] contains the unknown mean μ. For a specified confidence and interval size, we can calculate how many trials of the experiment are needed.

In other words, with probability 0.95 the interval
x̄ ± 1.96·σ/n^{1/2},  where x̄ = (1/n) Σ_{i=1}^{n} xi,
contains the unknown mean μ, since
Σ_{i=1}^{n} xi ~ N(nμ, nσ^2),  or  y = [Σ_{i=1}^{n} xi - nμ] / (nσ^2)^{1/2} ~ N(0,1).

The above confidence interval formula suffers from two drawbacks. First, it requires knowledge of the variance of the random variable, σ^2.
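The confidence-interval recipe can be sketched as follows, here applied to exponential samples drawn by the inverse transform above. The helper name `mean_ci`, the rate λ = 2, and the use of the (n-1)-form sample variance in place of the unknown σ^2 are our illustrative choices, not prescriptions from the notes:

```python
import math
import random

def mean_ci(samples, z=1.96):
    """Approximate 95% CLT confidence interval for the mean, using the
    sample variance s^2 (n-1 form) in place of the unknown sigma^2."""
    n = len(samples)
    xbar = sum(samples) / n
    s2 = sum((x - xbar) ** 2 for x in samples) / (n - 1)
    half = z * math.sqrt(s2 / n)
    return xbar - half, xbar + half

random.seed(0)
lam = 2.0
# Exponential samples via the inverse transform x = -(1/lam)*ln(1 - u).
xs = [-math.log(1.0 - random.random()) / lam for _ in range(10000)]
lo, hi = mean_ci(xs)   # should bracket the true mean 1/lam = 0.5 about 95% of the time
```

Note the slow 1/n^{1/2} shrinkage: with n = 10,000 samples the interval half-width is still about 1% of the mean.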
It hardly seems reasonable that we can know the value of σ when not even the mean is known. The common practice is to replace σ^2 by the sample variance,
s^2 = (1/n) Σ_{i=1}^{n} (xi - x̄)^2   or   s^2 = (1/(n-1)) Σ_{i=1}^{n} (xi - x̄)^2,
in which case the formula is only approximate. However, if we do know that the random variable in question is Gaussian, an exact formula for the confidence interval can be stated in terms of the Student-t distribution, using the sample mean, x̄, and sample variance, s^2 [Bratley, Fox, and Schrage 1987]. There is a multivariate version of the Central Limit Theorem which replaces μ and σ^2 by their multidimensional versions, the mean vector μ and the covariance matrix Σ, and the denominator of Mn by its matrix analog (nΣ)^{1/2}.

The second drawback is more serious. Because of the 1/n^{1/2} factor, for every one order of magnitude decrease in the standard deviation (confidence interval) we need two orders of magnitude increase in sampling cost. Often this is not tolerable.

4. Nonparametric Analysis & Order Statistics

Suppose you take n i.i.d. samples of an arbitrary random variable. Now order the samples by magnitude into x[1] < x[2] < x[3] < ... < x[n]. The theory of order statistics says that these order statistics on the average divide the population into n+1 parts, each containing an equal share of the probability mass. Furthermore, we can calculate the probability that a given percentage of the population is contained below, above, or between any one or two order statistics. This is something one gets for free in any statistical experiment, including simulation.

5. Additional problems of simulating DEDS: each run constitutes only one sample of the random variable you are trying to measure! In some cases, we must judge whether the system has reached steady state; otherwise the measurement describes transient rather than steady-state behavior. Another issue is the correlation among the random variables. Since we generate the random variables through the Linear Congruential Method, the sequence of numbers is actually deterministic, not stochastic.
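This determinism, and the repetition once a run outlasts the generator's period, can be seen directly. A small sketch (the helper name `lcg_stream` is ours; a = 5, b = 3, M = 16 are the parameters of Example 4, whose period is 16):

```python
def lcg_stream(a, b, M, x0, n):
    """n successive values of the linear congruential recursion."""
    out, x = [], x0
    for _ in range(n):
        x = (a * x + b) % M
        out.append(x)
    return out

# Two "runs" from the same seed are identical: the sequence is deterministic.
run1 = lcg_stream(5, 3, 16, 1, 20)
run2 = lcg_stream(5, 3, 16, 1, 20)

# And past the period (16 here) the values repeat exactly: x_{k+16} = x_k.
print(run1[:4], run1[16:20])   # -> [8, 11, 10, 5] [8, 11, 10, 5]
```

With M = 16 the repetition is immediate; a production generator with period near 2^31 merely postpones the same phenomenon.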
Given the parameters of the Linear Congruential Method, the sequence of numbers has a period. When the length of a run goes beyond this period, the "random" numbers generated begin to repeat. Thus the correlation of the random numbers must be considered, and overcome, when the simulation time is very long. (The frequency of the problems caused by repetition can be estimated from the number of random variables used by the simulation per unit time and the period of the linear congruential generator. Generally, the more numbers used per unit of simulated time, and the shorter the generator's period, the more frequent the problems caused by this repetition phenomenon.)

6. Advanced simulation topics: antithetic random variables; regenerative cycles to shorten simulation runs; common random variables; use of a warm-up period; separate batches.

7. The Alias Method of Choosing Event Types (skim only)

[Bratley et al. 1987] in this part refers to: Paul Bratley, Bennett L. Fox, and Linus E. Schrage, A Guide to Simulation, 2nd edition, New York: Springer-Verlag, 1987.

Another standard problem encountered in simulation is the generation of discrete random variables according to arbitrary distributions. This can become time consuming when the domain of the random variable is large. One efficient way of obtaining a random variable distributed over the integers 1, 2, ..., n with probabilities p(i), i = 1, 2, ..., n, is the alias method [Bratley et al. 1987]. This method can further reduce the computational effort, e.g., in the standard clock simulation approach, in determining the event type at every transition instant (to be covered in later Lecture Notes). The method requires only one uniformly distributed variable, one comparison, and at most two memory references per sample. It is thus independent of the size of the possible event list, an important advantage in the simulation of large systems via the standard clock approach.
However, this method requires pre-computing two tables of length n, which is a one-time effort. The alias method uses two tables, R(i) and A(i), i = 1, 2, ..., n, where 0 ≤ R(i) ≤ 1 and A(i) is a mapping from the set {1, 2, ..., n} to itself. The description of the algorithm generating a random variable with distribution p(i) below essentially follows [Bratley et al. 1987].

(1) Generate a uniformly distributed random variable u ∈ [0, 1).
(2) Let v = nu (v is uniform on [0, n)).
(3) Set I = ⌊v⌋ + 1, the smallest integer bigger than v. (I is uniform on the integers 1, 2, ..., n.)
(4) Set w = I - v (note that w is uniform on (0, 1] and independent of I).
(5) If w ≤ R(I), then output e = I; otherwise, output e = A(I).

[Flowchart: generate I uniform on {1, ..., n} from u ~ U(0, 1]; get w = I - v; if w ≤ R(I), accept I (probability R(I)/n ≤ 1/n); otherwise accept the alias A(I).]

In the algorithm, we first generate a uniformly distributed integer I on 1, 2, ..., n; then we adjust the probabilities by replacing the number I ∈ {1, 2, ..., n} by its "alias" A(I) with a certain "aliasing" probability 1 - R(I). If we choose the aliases and the aliasing probabilities properly, then the random variable e generated by this algorithm has the desired distribution. From the algorithm, we have
P(w ≤ R(I), I = i) = R(i)/n,
which says that the probability of getting "i" without aliasing is no bigger than 1/n. On the other hand,
P(w > R(I), A(I) = i) = (1/n) Σ_{j: A(j)=i} (1 - R(j)).
Summing the probabilities of the mutually exclusive ways to get e = i, we obtain
P(e = i) = R(i)/n + (1/n) Σ_{j: A(j)=i} (1 - R(j)),
which provides a means of increasing the probability of getting "i" to above 1/n. Thus, if we choose A(i) and R(i), i = 1, 2, ..., n, such that the above quantity equals p(i), then the random variable e has distribution p(i), i = 1, 2, ..., n. It is worthwhile to note that the above relation does not uniquely specify the values of A(i) and R(i).
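Steps (1)-(5) can be sketched as follows. The tables R = [1.0, 1.0, 0.5] with A(3) = 1, valid for p = (1/2, 1/3, 1/6), are a hand-worked example of ours, not from the notes; the function name `alias_sample` is also ours:

```python
import random

def alias_sample(R, A, rng=random):
    """One draw using alias tables R (keep-probabilities) and A (aliases),
    conceptually 1-indexed, stored as ordinary lists.  One uniform, one
    comparison, at most two table lookups."""
    n = len(R)
    v = n * rng.random()        # uniform on [0, n)
    I = int(v) + 1              # uniform on {1, ..., n}
    w = I - v                   # uniform on (0, 1], independent of I
    return I if w <= R[I - 1] else A[I - 1]

# Hand-built tables for p = (1/2, 1/3, 1/6): R = [1.0, 1.0, 0.5], A(3) = 1.
R, A = [1.0, 1.0, 0.5], [1, 2, 1]
random.seed(5)
N = 60000
counts = [0, 0, 0]
for _ in range(N):
    counts[alias_sample(R, A) - 1] += 1
freqs = [c / N for c in counts]   # approximately [1/2, 1/3, 1/6]
```

Here P(e=1) = R(1)/3 + (1 - R(3))/3 = 1/3 + 1/6 = 1/2, matching the summation formula above.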
(Two sets of numbers must satisfy one set of equations.) There may be many tables which can serve as the aliases and aliasing probabilities. The following algorithm generates a proper set of A(i) and R(i), i = 1, 2, ..., n.

(1) Set H = ∅ and L = ∅. (∅ is the null set.)
(2) For i = 1 to n:
(a) set R(i) = np(i);
(b) if R(i) > 1, then add i to the set H;
(c) if R(i) < 1, then add i to the set L.
(3) (a) If H = ∅, stop;
(b) otherwise select an index j from L and an index k from H.
(4) (a) Set A(j) = k. (A(j) and R(j) are now finalized.)
(b) Set R(k) = R(k) + R(j) - 1.
(c) If R(k) ≤ 1, remove k from H. (Question: what if R(k) > 1?)
(d) If R(k) < 1, add k to L.
(e) Remove j from L.
(5) Go to step (3).

This algorithm runs in O(n) time because at each iteration the number of indices in the union of H and L goes down by at least one. Also, the total "excess" probability in H always equals the total "shortage" in L; i.e.,
Σ_{i∈H} (R(i) - 1) = Σ_{i∈L} (1 - R(i)).
This shows that in step (3)(a), if H = ∅, then L = ∅ as well. At the last iteration, there is just one element in each of H and L. Hence R(j) - 1 = 1 - R(k) at step (3)(b), and at step (4)(b) R(k) becomes 1; this leaves both H and L empty. The proof of the algorithm is left as an exercise below. (Note that R(i) = 1 means there is no alias for this i.)

Exercise (p. 242): Prove that in the algorithm generating A(i) and R(i), we have
(a) if np(i) < 1, then {j: A(j) = i} = ∅;
(b) if np(i) > 1, then R(i) + Σ_{j: A(j)=i} (1 - R(j)) = np(i).

Answer:
(a) Proof: Going over the algorithm on p. 240, we find that the only chance for an index i to be added to H is at step (2)(b). So if R(i) = np(i) < 1, then i is never added to H, and since aliases are only assigned from H, no j can have A(j) = i. Hence, if np(i) < 1, then {j: A(j) = i} = ∅.
(b) Proof: Let's study the algorithm. From step (2), we know that if R(i) = np(i) > 1, then i ∈ H.
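A sketch of the table-building procedure above; the tolerance `tol` is our addition to guard the comparisons against floating-point round-off (the original algorithm works in exact arithmetic), and the test distribution p is an arbitrary choice:

```python
def alias_tables(p, tol=1e-12):
    """Build alias tables R, A for distribution p by the O(n) procedure
    in the notes.  `tol` protects the >1 / <1 tests from round-off."""
    n = len(p)
    R = [n * pi for pi in p]
    A = list(range(1, n + 1))          # default: no alias (used when R(i) = 1)
    H = [i for i in range(1, n + 1) if R[i - 1] > 1 + tol]
    L = [i for i in range(1, n + 1) if R[i - 1] < 1 - tol]
    while H and L:
        j, k = L.pop(), H.pop()        # j has a shortage, k an excess
        A[j - 1] = k                   # A(j) and R(j) are now final
        R[k - 1] += R[j - 1] - 1       # k gives j the mass j lacks
        if R[k - 1] > 1 + tol:
            H.append(k)                # k still has excess
        elif R[k - 1] < 1 - tol:
            L.append(k)                # k now has a shortage itself
    return R, A

# Verify the identity p(i) = R(i)/n + (1/n) * sum of (1 - R(j)) over j with A(j) = i.
p = [0.4, 0.3, 0.2, 0.1]
R, A = alias_tables(p)
n = len(p)
recovered = [R[i - 1] / n
             + sum(1 - R[j - 1] for j in range(1, n + 1) if A[j - 1] == i) / n
             for i in range(1, n + 1)]
```

Each pass finalizes one index from L, so the loop runs at most n - 1 times, giving the O(n) bound stated above.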
From steps (4)(a) and (4)(b), each time some j ∈ L is paired with k = i, the value R(i) decreases from its current value to R(i) + R(j) - 1, and by (4)(e) that j then leaves L; this procedure repeats until R(i) ≤ 1. So we have
R(i) = np(i) + Σ_{j: A(j)=i} (R(j) - 1),
and hence
R(i) + Σ_{j: A(j)=i} (1 - R(j)) = np(i).

(Optional) An alternative to the alias method is the METROPOLIS ALGORITHM for sampling from a finite set X with distribution π(x). We want to pick x according to π when π is known only up to proportionality. Consider a Markov chain with transition kernel K(x, y) and the ratio A(x, y) = π(y)K(y, x) / [π(x)K(x, y)]. At x we pick y from K(x, ·); if A(x, y) ≥ 1 we go to y, else we flip a coin with heads probability A(x, y) and go to y on heads, else stay at x. This is the basis of Simulated Annealing.
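The Metropolis step can be sketched on a small finite set. We assume a symmetric nearest-neighbor proposal K on a ring (so K(x, y) = K(y, x) and the ratio A(x, y) reduces to π(y)/π(x)); the weight vector is an arbitrary example, and the function name `metropolis` is ours:

```python
import random

def metropolis(weights, steps, x0=0, seed=3):
    """Metropolis sampling on {0, ..., m-1} with target proportional to
    `weights`, using a symmetric random-walk proposal on a ring, so the
    acceptance ratio A(x,y) reduces to weights[y]/weights[x]."""
    rng = random.Random(seed)
    m, x = len(weights), x0
    counts = [0] * m
    for _ in range(steps):
        y = (x + rng.choice([-1, 1])) % m        # symmetric proposal K
        # Accept if A(x,y) >= 1, else with probability A(x,y).
        if weights[y] >= weights[x] or rng.random() < weights[y] / weights[x]:
            x = y
        counts[x] += 1
    return [c / steps for c in counts]

freqs = metropolis([1.0, 2.0, 3.0, 2.0], 200000)
# freqs is approximately the normalized target [1/8, 2/8, 3/8, 2/8].
```

Note that only the ratio of weights enters, which is exactly why π need be known only up to a constant of proportionality.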