Lecture Notes #4
Based on ES 205 Lecture Notes #7 Elements of Simulation
Version 3.1
Date: 2003-8-10
Major points of LN #7
• Modeling aspects of simulation: GSMP vs. simulation
• Computer science aspects of simulation: language, interface, programming
• Statistical aspects of simulation:
1. generation of random numbers and variables; discrete r.v.'s and the alias method
2. output analysis: the CLT, confidence intervals, and estimation
3. fundamental limitations
• Advanced topics: order statistics, regeneration, transients, warm-up period, correlations
ES 205 LECTURE NOTES #7 ELEMENTS OF SIMULATION (Ch. 10 of CL99)
• Simulation is the electronic equivalent of a "pilot plant or laboratory mockup". We are
literally employing the trial-and-error method plus some statistical sophistication. Thus there
are two aspects:
(i) the laboratory aspects - the software, which includes general purpose algorithms and
interfaces, e.g., the GSMP model and the GUI object-oriented features;
(ii) the statistical aspects - the analysis of output data as a statistical experiment.
• The event scheduling approach: the diagram (Fig. 10.1, p. 595 of CL99).
Note that time steps forward from event to event in this approach, in contradistinction to the
integration of differential equations in CVDS, where time marches on in small
increments of Δt.
Ingredients needed:
registers for state, time, and the scheduled (future) event list
routines for initialization, state transition, time update, statistics gathering,
output reporting, and random variable generation
a main program which models the DEDS (user written)
modern features, e.g., animation, object-oriented programming, etc.
The example of the EXTEND software and the G/G/1 queue demo.
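As an illustration of these ingredients, here is a minimal event-scheduling sketch in Python for a G/G/1 queue. This is our own sketch, not EXTEND's interface; the sampler arguments interarrival and service (and all other names) are assumptions for illustration.

import heapq
import random

def gg1_simulation(t_end, interarrival, service):
    """Minimal event-scheduling loop for a G/G/1 queue.
    fel is the scheduled (future) event list; time jumps from event to event."""
    fel = [(interarrival(), "arrival")]     # schedule the first arrival
    t, queue, busy, served = 0.0, 0, False, 0
    while fel and t < t_end:
        t, event = heapq.heappop(fel)       # state transition at the next event time
        if event == "arrival":
            heapq.heappush(fel, (t + interarrival(), "arrival"))
            if busy:
                queue += 1                  # customer waits in line
            else:
                busy = True
                heapq.heappush(fel, (t + service(), "departure"))
        else:                               # departure
            served += 1
            if queue > 0:
                queue -= 1
                heapq.heappush(fel, (t + service(), "departure"))
            else:
                busy = False
    return served

# e.g., exponential arrivals and services (an M/M/1 special case):
# gg1_simulation(1000.0, lambda: random.expovariate(1.0), lambda: random.expovariate(1.2))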
2. RANDOM NUMBER AND VARIABLE GENERATION; THE LINEAR CONGRUENTIAL
METHOD: x_{n+1} = (a x_n + b) mod M,  u_{n+1} = x_{n+1}/M,    (1)
where "mod M" gives the remainder of (a x_n + b)/M, so that the u_n are (approximately)
uniformly distributed on [0, 1). Starting from such uniformly distributed random variables,
the Inverse Transform Method described below lets us generate random numbers with an
arbitrary distribution.
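A minimal Python sketch of Eq. (1) (the generator function and its name are ours):

def lcg(a, b, M, x0):
    """Linear congruential generator: x_{n+1} = (a*x_n + b) mod M.
    Yields the normalized values u_n = x_n / M in [0, 1)."""
    x = x0
    while True:
        x = (a * x + b) % M
        yield x / M

# Example 1 below uses a = 2, b = 1, M = 16; e.g.:
# g = lcg(2, 1, 16, 1); [round(16 * next(g)) for _ in range(5)] gives [3, 7, 15, 15, 15]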
Example 1 Let a = 2, b = 1, and M = 16. Using Eq. (1) and various x0's, we get

x0 =  1   2   4   6   8  10  12  13  14
x1 =  3   5   9  13   1   5   9  11  13
x2 =  7  11   3  11   3  11   3   7  11
x3 = 15   7   7   7   7   7   7  15   7
x4 = 15  15  15  15  15  15  15  15  15
x5 = 15  15  15  15  15  15  15  15  15
x6 =  •   •   •   •   •   •   •   •   •

All sequences get stuck at 15 after the initial transients!
Example 2 Let a = 3, b = 0, and M = 16. Similarly, depending on the initial seed, the
sequences get into cycles with different periods, but none of them produces the maximal
period of 16 covering 0-15.
Example 3 Let a = 1, b = 3, and M = 16. This time starting with any seed, we get the
maximal period and the sequence [ . . . , 1, 4, 7, 10, 13, 0, 3, 6, 9, 12, 15, 2, 5, 8, 11, 14,
1, . . . . ]. This is nice. However, a plot of the sequence vs. time shows high correlation
among successive numbers in the sequence as illustrated in Fig. 2. Thus the numbers in the
sequence are not at all independent.
Fig. 2 Plot of Pseudo Random Sequence
Example 4 Let a = 5, b = 3, and M = 16. Once again we get a sequence of maximal period
with any seed, [ . . . , 1, 8, 11, 10, 5, 12, 15, 14, 9, 0, 3, 2, 13, 4, 7, 6, 1, . . . ]. A plot similar
to Fig. 2 shows a reasonably random-looking sequence.
Thus, periodicity and correlation are both important. For the last word, see S. Tezuka's book
on random number generation (Kluwer 1997).
METHOD OF INVERSE TRANSFORM
The reference for this part, the Method of Inverse Transform and the Method of Rejection, is
Chapter 10.6 of CL99.
Fig. 4 Inverse Transform Method for Generating F(x)-Distributed Random Variables.
To see this, consider the probability
P(x ≤ a) = P(F⁻¹(u) ≤ a) = P(u ≤ F(a)) = F(a),
where the last equality is by virtue of the uniform distribution of the random variable u.
Ex. Exponential distribution: u = F(x) = 1 - exp(-λx) ==> x = -(1/λ) ln(1 - u).
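A minimal Python sketch of the inverse transform for this exponential case (the function name is ours):

import math
import random

def exp_inverse_transform(lam, rng=random):
    """Draw X ~ Exp(lam) by inverting F(x) = 1 - exp(-lam*x):
    X = -(1/lam) * ln(1 - U) with U ~ U[0, 1)."""
    u = rng.random()
    return -math.log(1.0 - u) / lam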
METHOD OF REJECTION
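The details are in Chapter 10.6 of the reference above. As a minimal illustrative sketch of the idea (all names, the example density, and the envelope constant are our assumptions): draw Y from an easy-to-sample proposal density g with f(x) ≤ c·g(x) for all x, and accept Y with probability f(Y)/(c·g(Y)), retrying otherwise.

import random

def rejection_sample(f, g, sample_g, c, rng=random):
    """Accept/reject sampling from density f, assuming f(x) <= c*g(x) for all x."""
    while True:
        y = sample_g()
        if rng.random() <= f(y) / (c * g(y)):
            return y

# Illustrative use: f(x) = 6x(1-x) on [0,1] (max value 1.5) from a uniform
# proposal g(x) = 1 with envelope constant c = 1.5:
# x = rejection_sample(lambda x: 6*x*(1 - x), lambda x: 1.0, random.random, 1.5)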
3. Sampling, the Central Limit Theorem, and Confidence Intervals
Estimating the mean of a random variable, L, i.e., J = E[L(θ, ξ)]. Consider
[est1] = any one sample of L => E[est1] = J ==> unbiasedness.
Consider
est2 ≡ (1/N) Σ_{i=1}^{N} L(θ, ξ_i)
with
lim_{N→∞} est2 = J
==> consistency, due to the law of large numbers.
However, Var[est1] = σ² while Var[est2] = (1/N²)·Nσ² = σ²/N --> 0 as N --> ∞. Note that
1/N^{1/2} is a slowly decreasing function.
Moral: we need many replications of L, or a long simulation run.
Consider independent and identically distributed (i.i.d.) random variables x1, x2, . . . , xn,
with E(x_i) = μ and Var(x_i) = σ². Define Mn = [(x1 + x2 + . . . + xn) - nμ]/(nσ²)^{1/2}. As n → ∞,
the distribution of Mn converges to N(0,1), i.e., the normal (Gaussian) distribution with mean
zero and unit variance. This is known as the Central Limit Theorem (CLT). The
significance of the CLT for experimental work lies in the fact that it enables us to predict the
error of sampling. For example, suppose we take n samples of a random variable with mean
μ. We may use x̄ = (x1 + x2 + . . . + xn)/n as an estimate for the unknown mean μ. Then Mn is
the normalized error of the estimate. For large n, we can use standard tables for Gaussian
random variables to calculate P(-t < Mn < t), which is the probability that the error of the
estimate for μ lies in [-t(σ²/n)^{1/2}, +t(σ²/n)^{1/2}]. For example, if t = 1.96, we get P = 0.95;
i.e., we are 95% confident that the interval [x̄ - t(σ²/n)^{1/2}, x̄ + t(σ²/n)^{1/2}] contains the
unknown mean μ. For a specified confidence level and interval size, we can calculate how many
trials of the experiment are needed.
With probability 0.95, the interval
x̄ ± 1.96·(σ²/n)^{1/2},  where x̄ = (1/n) Σ_{i=1}^{n} x_i,
contains the unknown mean μ, because
Σ_{i=1}^{n} (x_i - μ) ~ N(0, nσ²),  or  y = [Σ_{i=1}^{n} x_i - nμ]/(nσ²)^{1/2} ~ N(0, 1).
The above confidence interval formula suffers from two drawbacks. First, it requires
knowledge of the variance of the random variable, σ². It hardly seems reasonable that we
could know the value of σ when not even the mean μ is known. The common practice is to
replace σ² by the sample variance,
s² = (1/n) Σ_{i=1}^{n} (x_i - x̄)²   or   s² = (1/(n-1)) Σ_{i=1}^{n} (x_i - x̄)²,
in which case the formula is only approximate. However, if we do know that the random
variable in question is Gaussian, an exact formula for the confidence interval can be stated in
terms of the Student-t distribution using the sample mean, x̄, and the sample variance, s²
[Bratley, Fox, and Schrage 1987].
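A minimal Python sketch of this recipe (the helper name is ours; 1.96 is the two-sided 95% point of the standard Gaussian):

import math
from statistics import mean, stdev   # stdev uses the 1/(n-1) form of s^2

def ci95(samples):
    """Approximate 95% confidence interval: x_bar +/- 1.96 * s / sqrt(n)."""
    n = len(samples)
    x_bar, s = mean(samples), stdev(samples)
    half = 1.96 * s / math.sqrt(n)
    return (x_bar - half, x_bar + half)

# Exact version for Gaussian data: replace 1.96 by the Student-t quantile
# with n-1 degrees of freedom, e.g. scipy.stats.t.ppf(0.975, df=n-1).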
There is a multivariate version of the Central Limit Theorem in which μ and σ² are replaced
by their multidimensional versions, the mean vector and the covariance matrix Σ, with Mn
normalized accordingly.
The second drawback is more serious. Because of the 1/n^{1/2} factor, for every one
order of magnitude decrease in the standard deviation (i.e., in the confidence interval width)
we need two orders of magnitude increase in sampling cost. Often this is not tolerable.
4. Nonparametric Analysis & Order Statistics
Suppose you take n i.i.d. samples of an arbitrary random variable and order the samples
by magnitude into x[1] < x[2] < x[3] < . . . < x[n]. The theory of order statistics says that these
order statistics on average divide the population into n+1 parts of equal probability mass.
Furthermore, we can calculate the probability that a given percentage of the population is
contained below, above, or between any one or two order statistics. This is something one
gets for free in any statistical experiment, including simulation.
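As a quick empirical check (a sketch; the choices n = 9, k = 3, and a U[0,1] population are illustrative), the k-th order statistic of n uniform samples has expected value k/(n+1):

import random
from statistics import mean

n, k, reps = 9, 3, 100_000
kth = [sorted(random.random() for _ in range(n))[k - 1] for _ in range(reps)]
print(mean(kth), "vs. theory:", k / (n + 1))   # both approximately 0.3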
5. Additional problems of simulating DEDS: each run constitutes only one
sample of the random variable you are trying to measure! In some cases, we must first
judge whether the system has reached steady state; otherwise, the measurement describes
the transient rather than the steady-state behavior. Another issue is the
correlation among the random variables. Since we generate the random variables
through the Linear Congruential Method, the sequence of numbers is actually
deterministic, not stochastic. Given the parameters of the Linear Congruential Method,
the sequence of numbers has a fixed period. When the simulation consumes more numbers
than this period, the "random" numbers generated begin to repeat. Thus the correlation of the
random numbers must be considered and overcome when the simulation time is very
long. (The frequency of the problems caused by this repetition can be estimated from the
number of random variables used by the simulation per unit time and the period of the linear
congruential method. Generally, the more numbers used per unit of simulated time, and
the shorter the period of the random number generator, the more frequent the problems caused
by this repetition phenomenon.)
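The period in question can be measured directly for small generators; here is a minimal Python sketch (the function name is ours) that returns the length of the cycle eventually entered from a given seed:

def lcg_period(a, b, M, x0):
    """Cycle length eventually entered by x_{n+1} = (a*x_n + b) mod M from seed x0."""
    first_seen = {}
    x, n = x0, 0
    while x not in first_seen:
        first_seen[x] = n
        x = (a * x + b) % M
        n += 1
    return n - first_seen[x]

# From the earlier examples: lcg_period(1, 3, 16, 1) gives 16 (maximal),
# while lcg_period(2, 1, 16, 1) gives 1 (every seed gets stuck at 15).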
6. Advanced simulation topics:
Antithetic random variables; regenerative cycles (which shorten simulation runs); common
random variables; use of a warm-up period; separate batches.
7. The Alias Method of Choosing Event Types (skim only)
[Bratley et al. 1987] in this part refers to Paul Bratley, Bennett L. Fox, and Linus E. Schrage,
A Guide to Simulation, 2nd ed., Springer-Verlag, New York, 1987.
Another standard problem encountered in simulation is the generation of discrete
random variables according to arbitrary distributions. This can become time
consuming when the domain of the random variable is large. One efficient way of
obtaining a random variable distributed over the integers 1,2,...,n with probabilities
p(i), i=1,2,...,n, is the alias method [Bratley et al. 1987]. This method can be used to
further reduce the computation effort, e.g., in the standard clock simulation approach
in determining the event type at every transition instant (to be covered in later Lecture
Notes). The method requires only one uniformly distributed variable, one comparison,
and at most two memory references per sample. It is thus independent of the size of
the possible event list, an important advantage in the simulation of large systems via
the standard clock approach. However, this method requires pre-computing two
tables of length n, which is a one-time effort.
The alias method uses two tables, R(i) and A(i), i = 1, 2, ..., n, where 0 ≤ R(i) ≤ 1 and A(i)
is a mapping from the set {1, 2, ..., n} to itself. The description below of the algorithm
generating the random variable with distribution p(i) essentially follows [Bratley et al.
1987].
(1) Generate a uniformly distributed random variable u ∈ [0, 1).
(2) Let v = nu (v is uniform on [0, n)).
(3) Set I to be the smallest integer bigger than v. (I is uniform on the
integers 1, 2, ..., n.)
(4) Set w = I - v (note that w is uniform on [0, 1) and independent of I).
(5) If w ≤ R(I), then output e = I; otherwise, output e = A(I).
[Flowchart: from u ~ U(0, 1), get I = ⌈nu⌉ (uniform on 1, ..., n) and w = I - nu ~ U(0, 1);
if w ≤ R(I), accept I (this path has probability R(I)/n < 1/n); otherwise accept the alias
A(I), which adds probability Σ_{j: A(j)=i} (1 - R(j))/n to each i.]
In the algorithm, we first generate a uniformly distributed integer I on 1, 2, ..., n; then
we adjust the probabilities by replacing the number I ∈ {1, 2, ..., n} by its "alias" A(I),
with a certain "aliasing" probability 1 - R(I). If we choose the aliases and the aliasing
probabilities properly, then the random variable generated by this algorithm, e, has the
desired distribution. From the algorithm, we have
P(w ≤ R(I), I = i) = R(i)/n,
which says that the probability of getting "i" without aliasing is smaller than 1/n. On
the other hand,
P(w > R(I), I = j) = (1 - R(j))/n.
Summing the probabilities of the mutually exclusive ways to get e = i, we obtain
P(e = i) = R(i)/n + Σ_{j: A(j)=i} (1 - R(j))/n,
which provides a means of increasing the probability of getting "i" to above 1/n. Thus,
if we choose A(i) and R(i), i = 1, 2, ..., n, such that the above quantity equals p(i), then the
random variable e has the distribution p(i), i = 1, 2, ..., n. It is worthwhile to note that the
above relation does not uniquely specify the values of A(i) and R(i) (two sets of
numbers have to satisfy only one set of equations); there may be many tables which can
be used as the aliases and aliasing probabilities. The following is an algorithm generating
a proper set of A(i) and R(i), i = 1, 2, ..., n.
(1) Set H = ∅ and L = ∅. (∅ is the null set.)
(2) For i = 1 to n:
(a) set R(i) = np(i);
(b) if R(i) > 1, then add i to the set H;
(c) if R(i) < 1, then add i to the set L.
(3) (a) if H = ∅, stop;
(b) otherwise select an index j from L and an index k from H.
(4) (a) Set A(j) = k; (A(j) and R(j) are now finalized.)
(b) Set R(k) = R(k) + R(j) - 1;
(c) if R(k) ≤ 1, remove k from H; (question: what if R(k) > 1?)
(d) if R(k) < 1, add k to L;
(e) remove j from L.
(5) Go to step (3).
This algorithm runs in O(n) time because at each iteration the number of indices in the
union of H and L goes down by at least one. Also, the total "excess" probability in H
always equals the total "shortage" in L; i.e.,
Σ_{i∈H} (R(i) - 1) = Σ_{i∈L} (1 - R(i)).
This shows that at step (3)(a), if H = ∅, then L = ∅. At the last iteration, there is just
one element in each of H and L. Hence R(j) - 1 = 1 - R(k) at step (3)(b), and at step (4)(b)
R(k) = 1; this leaves both H and L empty. The proof of the algorithm is left as an
exercise below. (Note that R(i) = 1 means there is no alias for this i.)
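Putting the two pieces together, here is a minimal Python sketch of the alias method, combining the sampling steps (1)-(5) above with the table-construction algorithm. It is 0-indexed, so it draws from {0, ..., n-1}; all names are ours, and the loop condition "while H and L" guards against floating-point residue.

import random

def build_alias_tables(p):
    """Construct R and A for the alias method from probabilities p(i)."""
    n = len(p)
    R = [n * pi for pi in p]
    A = list(range(n))              # R(i) = 1 cases keep A(i) = i (no alias)
    H = [i for i in range(n) if R[i] > 1]
    L = [i for i in range(n) if R[i] < 1]
    while H and L:
        j, k = L.pop(), H[-1]
        A[j] = k                    # A(j) and R(j) are now finalized
        R[k] += R[j] - 1            # move j's "shortage" onto k's "excess"
        if R[k] <= 1:               # k is no longer in excess
            H.pop()
            if R[k] < 1:
                L.append(k)
    return R, A

def alias_draw(R, A, rng=random):
    """One sample: one uniform, one comparison, at most two table references."""
    n = len(R)
    v = n * rng.random()            # uniform on [0, n)
    i = int(v)                      # uniform on {0, ..., n-1}
    w = (i + 1) - v                 # uniform on (0, 1], independent of i
    return i if w <= R[i] else A[i]

# e.g.: R, A = build_alias_tables([0.5, 0.3, 0.2]); repeated calls to
# alias_draw(R, A) return 0, 1, 2 with frequencies near 0.5, 0.3, 0.2.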
Exercise: (p. 242) Prove that in the algorithm generating A(i) and R(i), we have
(a) If np(i) < 1, then {j: A(j) = i} = ∅.
(b) If np(i) > 1, then R(i) + Σ_{j: A(j)=i} (1 - R(j)) = np(i).
Answer:
(a) Proof: Going over the algorithm on p. 240, we find that the only chance for an
index i to be added to H is at step (2)(b). So, if R(i) = np(i) < 1, then i will never be
added to H; since every alias target k is selected from H at step (3)(b), no j can have
A(j) = i. Hence, if np(i) < 1, then {j: A(j) = i} = ∅.
(b) Proof: From step (2), we know that if R(i) = np(i) > 1, then i ∈ H.
From steps (4)(a) and (4)(b), we know that each time some j ∈ L is assigned the
alias A(j) = i, the value R(i) decreases to R(i) + R(j) - 1. By (4)(c), such a j leaves
L, and this procedure repeats until R(i) ≤ 1. So we have
R(i) = np(i) + Σ_{j: A(j)=i} (R(j) - 1).
Hence
R(i) + Σ_{j: A(j)=i} (1 - R(j)) = np(i).
(Optional) An alternative to the alias method is the METROPOLIS ALGORITHM for sampling
from a finite set X with distribution π(x). We want to pick x from π when π is known only up
to proportionality. Consider a Markov chain with proposal kernel K(x, y) and the ratio
A(x, y) ≡ π(y)K(y, x)/(π(x)K(x, y)). At x we pick y from K(x, ·); if A(x, y) ≥ 1 we move to y,
else we flip a coin with heads probability A(x, y) and move to y on heads, otherwise stay at x.
This is the basis of Simulated Annealing.
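A minimal Python sketch of this recipe (all names and the random-walk proposal are illustrative assumptions; pi_ need only be proportional to the target):

import random

def metropolis_step(x, pi_, K, sample_K, rng=random):
    """One Metropolis(-Hastings) transition on a finite set: propose y ~ K(x, .),
    move with probability min(1, A(x, y)), A = pi(y)K(y, x) / (pi(x)K(x, y))."""
    y = sample_K(x)
    A = (pi_(y) * K(y, x)) / (pi_(x) * K(x, y))
    return y if rng.random() < min(1.0, A) else x

# Illustrative use on X = {0, ..., 9} with a symmetric +/-1 wrap-around proposal
# (K(x, y) = K(y, x), so the kernel ratio cancels) and target proportional to x + 1:
pi_ = lambda x: x + 1.0
K = lambda x, y: 0.5
sample_K = lambda x: (x + random.choice([-1, 1])) % 10
x = 0
for _ in range(10_000):
    x = metropolis_step(x, pi_, K, sample_K)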
Copyright by Yu-Chi Ho