Math 345 - Stochastic processes - Spring 2020

1 Bernoulli processes

1.1 Random processes

Definition 1.1. A random (or stochastic) process is an infinite collection of rv's defined on a common probability model.

If the process contains countably many rv's, then they can be indexed by the positive integers, $X_1, X_2, \dots$, and the process is called a discrete-time random process. If there are continuum many rv's, then they can be indexed by a nonnegative real number $t$, $\{X_t;\ t \ge 0\}$, and the process is called a continuous-time random process.

A discrete-time random process assigns a sequence of numbers to every outcome $\omega \in \Omega$,
$$\omega \mapsto (X_1(\omega), X_2(\omega), \dots, X_n(\omega), \dots),$$
while a continuous-time random process assigns a function defined on the half-line $[0, +\infty)$,
$$\omega \mapsto X_t(\omega) : [0, \infty) \to \mathbb{R}.$$
Of course, the sequence can also be thought of as a function defined on the set of natural numbers $\mathbb{N}$. The value of the random process on a particular outcome $\omega$ is called a sample path of the process. When studying a random process one may choose to abstract to a probability model in which the outcomes are the sample paths, and the events are sets of sample paths.

Some examples of phenomena that can be modeled by random processes are repeated experiments, arrivals or departures (of customers, orders, signals, packets, etc.) and random walks (on a line, in a plane, in 3D space).

1.2 Bernoulli processes

One can make a simple nontrivial random process by considering a sequence of IID binary rv's.

Definition 1.2. A Bernoulli process is a sequence $Z_1, Z_2, \dots$ of IID binary rv's.

The independence here is understood in the sense that for any $n > 0$, the rv's $\{Z_1, Z_2, \dots, Z_n\}$ are independent. Let $p = \Pr(\{Z = 1\})$ and $q = 1 - p = \Pr(\{Z = 0\})$. One can think of the Bernoulli process as a model describing the arrival of customers at discrete times $n = 1, 2, \dots$.
Then at a specific time $n = j$ a customer arrives with probability $p$ ($Z_j = 1$), and no customer arrives with probability $q$ ($Z_j = 0$). Here we are assuming that at most one arrival occurs at each discrete time instance.

Instead of tracking the arrivals of the customers, one can track the interarrival times in this process. The first interarrival time $X_1$ is the time it takes the first customer to arrive. So
$$X_1 = \begin{cases}
1 & \text{if } Z_1 = 1 \quad (\text{prob} = p), \\
2 & \text{if } Z_1 = 0,\ Z_2 = 1 \quad (\text{prob} = p(1-p)), \\
3 & \text{if } Z_1 = 0,\ Z_2 = 0,\ Z_3 = 1 \quad (\text{prob} = p(1-p)^2), \\
\ \vdots \\
m & \text{if } Z_1 = 0, \dots, Z_{m-1} = 0,\ Z_m = 1 \quad (\text{prob} = p(1-p)^{m-1}), \\
\ \vdots
\end{cases}$$
As we see, $X_1$ has the geometric PMF $P_{X_1}(m) = p(1-p)^{m-1}$, $m \ge 1$.

The second interarrival time $X_2$ is the time between the first and second arrivals, and similarly we define $X_j$ to be the interarrival time between the $(j-1)$th and $j$th arrivals. As the arrival rv's are IID, $X_2$ has the same probability distribution as $X_1$: the process can be restarted right after the first arrival, and the second interarrival time can then be computed from there exactly as above. Using induction to generalize this argument, we see that the interarrival times $X_1, X_2, \dots$ are IID geometric rv's, and the Bernoulli process can be characterized by this sequence instead of the sequence of binary IID arrival rv's $\{Z_j\}_{j=1}^{\infty}$.

An example of a sample path of the Bernoulli process in terms of the arrival rv's and the interarrival times is given below:
$$\begin{aligned}
\{Z_j\} &: 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, \dots \\
\{X_j\} &: 2, 1, 3, 2, 4, 1, 1, \dots
\end{aligned} \tag{1}$$

In addition to the arrival rv's $Z_j$ and the interarrival time rv's $X_j$, one may also be interested in tracking the aggregate number of arrivals up to the time $t = n$. These aggregate numbers of arrivals are given by the rv's
$$S_n = Z_1 + Z_2 + \dots + Z_n, \quad n \ge 1.$$
$S_n$ takes nonnegative integer values between $0$ and $n$, and $S_n = k$ means that in the first $n$ discrete time instances arrivals occurred in exactly $k$ of them.
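The relation between the three descriptions of a sample path (arrivals $Z_j$, interarrival times $X_j$, aggregate arrivals $S_n$) can be sketched in a few lines of Python. This is only an illustration on a finite prefix of the sample path (1); the function names `interarrival_times` and `aggregate_arrivals` are hypothetical and not part of the notes.

```python
from itertools import accumulate


def interarrival_times(z):
    """Interarrival times X_1, X_2, ... read off a finite 0/1 arrival sequence z."""
    times = []
    last_arrival = 0  # time of the previous arrival (0 before the first one)
    for n, zn in enumerate(z, start=1):  # discrete times n = 1, 2, ...
        if zn == 1:
            times.append(n - last_arrival)
            last_arrival = n
    return times


def aggregate_arrivals(z):
    """Partial sums S_n = Z_1 + ... + Z_n for n = 1, ..., len(z)."""
    return list(accumulate(z))


# Finite prefix of the sample path in (1).
z = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1]
x = interarrival_times(z)
s = aggregate_arrivals(z)
print("X_j:", x)
print("S_n:", s)
```

Note that the sum of the first $k$ interarrival times is the time of the $k$th arrival, so `sum(x)` here equals the time of the last arrival in the prefix.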
So the probability that $S_n = k$ is
$$\Pr(\{S_n = k\}) = P_{S_n}(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n,$$
where the binomial coefficient $\binom{n}{k}$ gives the number of ways in which $k$ time instances can be chosen from $n$, $p^k$ is the probability that arrivals occur in those $k$ instances, and $(1-p)^{n-k}$ is the probability that no arrival occurs in the remaining $n - k$ instances. Here we are, of course, relying on the independence of the arrival rv's $Z_j$, $j = 1, 2, \dots$. So the aggregate number of arrivals has the binomial distribution.

The rv's $\{S_n, n \ge 1\}$ having binomial distributions is not enough for the process with these aggregate arrivals to be a Bernoulli process, since the arrival rv's $\{Z_j, j \ge 1\}$ may fail to be independent. Indeed, one can for example give a joint PMF for $\{Z_1, Z_2, Z_3\}$ under which they are not independent, while $S_j$, $j = 1, 2, 3$, are binomial. Then taking $\{Z_j, j \ge 4\}$ to be IID and independent of $\{Z_1, Z_2, Z_3\}$ guarantees that the rest of the aggregate arrival rv's are binomial as well, but the independence of the $Z_j$'s no longer holds. For a particular example, let $P_Z(1) = P_Z(0) = 1/2$, and define the joint PMF $P_{Z_1 Z_2 Z_3}$ as follows:
$$\begin{aligned}
P_{Z_1 Z_2 Z_3}(0,0,0) = P_{Z_1 Z_2 Z_3}(1,1,1) = P_{Z_1 Z_2 Z_3}(0,0,1) = P_{Z_1 Z_2 Z_3}(1,1,0) &= 1/8, \\
P_{Z_1 Z_2 Z_3}(1,0,1) = P_{Z_1 Z_2 Z_3}(0,1,0) &= 1/4, \\
P_{Z_1 Z_2 Z_3}(1,0,0) = P_{Z_1 Z_2 Z_3}(0,1,1) &= 0.
\end{aligned}$$
Then one can compute the PMF's of $S_j$, $j = 1, 2, 3$, to see that they are binomial, but $\{Z_1, Z_2, Z_3\}$ are not independent: for instance, $P_{Z_1 Z_2 Z_3}(1,0,0) = 0$, whereas independence would require the value $1/8$.

1.3 Asymptotics of the binomial distribution

As we saw above, the sum of IID binary rv's has a binomial PMF. We can use this known PMF of a sum of IID rv's as an illustrative example of the Chernoff bound that we discussed earlier. But first, we give the asymptotics of the binomial PMF based on Stirling's bounds for the factorial $n!$,
$$\sqrt{2\pi n}\left(\frac{n}{e}\right)^n \le n! \le \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{\frac{1}{12n}}.$$

To state the asymptotics for the binomial PMF, we first denote $\tilde{p} = k/n$, which will be interpreted as the relative frequency of 1's in the $n$-tuple $Z_1, Z_2, \dots, Z_n$. We also define the Kullback-Leibler divergence (relative entropy) by
$$D(\tilde{p} \,\|\, p) = \tilde{p} \ln\frac{\tilde{p}}{p} + (1 - \tilde{p}) \ln\frac{1 - \tilde{p}}{1 - p}.$$
If we treat $D(\tilde{p} \,\|\, p)$ as a function of $\tilde{p}$ with a fixed $p$, then observe that
$$D(p \,\|\, p) = 0, \qquad \frac{d}{d\tilde{p}} D(\tilde{p} \,\|\, p)\Big|_{\tilde{p} = p} = 0, \qquad \text{while} \qquad \frac{d^2}{d\tilde{p}^2} D(\tilde{p} \,\|\, p) = \frac{1}{\tilde{p}} + \frac{1}{1 - \tilde{p}} > 0.$$
So $D(\tilde{p} \,\|\, p)$ is a convex function of $\tilde{p}$ that takes its minimum value at $\tilde{p} = p$; thus $D(\tilde{p} \,\|\, p) \ge 0$, with equality only when $\tilde{p} = p$. The Kullback-Leibler divergence can be thought of as a measure of how different $p$ and $\tilde{p}$ are. With this notation, we have the following bounds.

Theorem 1.3. Let $P_{S_n}(k)$ be the PMF of the binomial distribution for an underlying binary PMF $P_Z(1) = p > 0$, $P_Z(0) = 1 - p > 0$. Then for each integer $\tilde{p}n$, $1 \le \tilde{p}n \le n - 1$,
$$P_{S_n}(\tilde{p}n) < \sqrt{\frac{1}{2\pi n \tilde{p}(1 - \tilde{p})}}\; e^{-nD(\tilde{p} \| p)},$$
$$P_{S_n}(\tilde{p}n) > \left(1 - \frac{1}{12 n \tilde{p}(1 - \tilde{p})}\right) \sqrt{\frac{1}{2\pi n \tilde{p}(1 - \tilde{p})}}\; e^{-nD(\tilde{p} \| p)}.$$

Notice that for fixed $p$, $\tilde{p}$ the ratio of the lower and upper bounds in the theorem tends to 1 as $n \to \infty$, so the bounds are asymptotically tight, and we have
$$P_{S_n}(\tilde{p}n) \sim \sqrt{\frac{1}{2\pi n \tilde{p}(1 - \tilde{p})}}\; e^{-nD(\tilde{p} \| p)}, \quad \text{as } n \to \infty, \text{ uniformly for } 0 < \tilde{p} < 1. \tag{2}$$
The uniformity of the bounds is important here, since one cannot choose an arbitrary $\tilde{p}$ for a given $n$: $\tilde{p}n$ must be an integer, because $S_n$ is integer-valued. The bound itself says that if $\tilde{p} \ne p$, then the probability of having a relative frequency $\tilde{p}$ of arrivals within $n$ discrete time instances (or, equivalently, having $k = \tilde{p}n$ arrivals) decreases exponentially in $n$.

Let us now apply the Chernoff bound to the binomial rv $S_n$ as the sum of $n$ IID binary rv's. Notice that for the IID binary rv's $Z_1, Z_2, \dots$ with the PMF $P_Z(1) = p$, $P_Z(0) = q = 1 - p$ defined above, the mean is $E[Z] = p$ and the MGF is
$$g_Z(r) = E[e^{Zr}] = p e^{1 \cdot r} + q e^{0 \cdot r} = q + p e^r, \quad \text{for } -\infty < r < \infty.$$
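Before continuing with the Chernoff bound, the bounds of Theorem 1.3 can be checked numerically. The following is a minimal sketch, assuming the illustrative values $p = 0.3$ and $\tilde{p} = 1/2$; the helper names `kl_divergence`, `binomial_pmf` and `theorem_bounds` are hypothetical. It compares the exact binomial PMF with the lower and upper bounds of the theorem for a few values of $n$.

```python
from math import comb, exp, log, pi, sqrt


def kl_divergence(pt, p):
    """D(pt || p) for binary distributions, with 0 < pt < 1 and 0 < p < 1."""
    return pt * log(pt / p) + (1 - pt) * log((1 - pt) / (1 - p))


def binomial_pmf(n, k, p):
    """Exact binomial PMF P_{S_n}(k); fine for the moderate n used here."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def theorem_bounds(n, k, p):
    """Lower and upper bounds of Theorem 1.3 at k = pt * n, for 1 <= k <= n - 1."""
    pt = k / n
    upper = sqrt(1 / (2 * pi * n * pt * (1 - pt))) * exp(-n * kl_divergence(pt, p))
    lower = (1 - 1 / (12 * n * pt * (1 - pt))) * upper
    return lower, upper


p = 0.3  # illustrative arrival probability (an assumption, not from the notes)
for n in (10, 100, 400):
    k = n // 2  # relative frequency pt = 1/2, different from p
    lower, upper = theorem_bounds(n, k, p)
    print(n, lower, binomial_pmf(n, k, p), upper)
```

In each printed row the exact PMF value should sit between the two bounds, and the relative gap between the bounds shrinks as $n$ grows while all three values decay exponentially.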
The semi-invariant MGF of $Z$ is then $\gamma_Z(r) = \ln g_Z(r) = \ln(q + p e^r)$. If we take $a = \tilde{p}$ in the Chernoff bound, then the optimal exponent for this value is
$$\mu_Z(\tilde{p}) = \inf_{r \ge 0}\, [\gamma_Z(r) - \tilde{p} r].$$
Since $\gamma_Z$ is convex, the minimum of the expression $\gamma_Z(r) - \tilde{p} r$ is attained where
$$\frac{d}{dr} \gamma_Z(r) = \tilde{p}.$$
This happens if
$$\frac{p e^r}{q + p e^r} = \tilde{p}, \qquad \text{or} \qquad e^r = \frac{\tilde{p} q}{p \tilde{q}},$$
where $\tilde{q} = 1 - \tilde{p}$. Observe that for $\tilde{p} > p$, for which we expect the Chernoff bound to hold, the optimal $r$ satisfies
$$e^r = \frac{\tilde{p} q}{p \tilde{q}} = \frac{\tilde{p}(1 - p)}{p(1 - \tilde{p})} > 1,$$
hence $r > 0$, as expected. Substituting this value of $e^r$ back into $\gamma_Z(r) - \tilde{p} r$, using $q + p e^r = q/\tilde{q}$, and performing some algebraic simplifications, one gets
$$\mu_Z(\tilde{p}) = \tilde{p} \ln\frac{p}{\tilde{p}} + \tilde{q} \ln\frac{q}{\tilde{q}} = -D(\tilde{p} \,\|\, p).$$
Substituting this into the Chernoff bound, we get
$$\Pr(\{S_n \ge n\tilde{p}\}) \le e^{n \mu_Z(\tilde{p})} = e^{-nD(\tilde{p} \| p)}.$$
But $\Pr(\{S_n \ge n\tilde{p}\}) \ge \Pr(\{S_n = n\tilde{p}\})$, and comparing the Chernoff bound to the above asymptotic bounds for the binomial PMF, we see that $\Pr(\{S_n \ge n\tilde{p}\})$ is bounded from above and below by quantities with the same exponential decay rate. We can record this fact as
$$\lim_{n \to \infty} \frac{\ln \Pr(\{S_n \ge n\tilde{p}\})}{n} = -D(\tilde{p} \,\|\, p), \quad \tilde{p} > p.$$
So the optimized Chernoff bound (over $r \in I(Z)$, $r > 0$) is exponentially tight for sums of binary IID rv's, in the sense that no better (faster decaying) exponential bound can exist. This turns out to be true for general IID rv's.

1.4 Central Limit Theorem for binary IID rv's

Using the asymptotic bounds discussed in the previous section one can prove the Central Limit Theorem (CLT) in the case of binary IID rv's. Recall that the CLT says that the normalized sum of IID rv's converges in distribution to a Gaussian rv. In the case of binary IID rv's one can restate the CLT in the following form.

Theorem 1.4. Let $\{Z_j, j \ge 1\}$ be a sequence of binary IID rv's with $p = P_Z(1) > 0$ and $q = 1 - p = P_Z(0) > 0$. Let $S_n = Z_1 + \dots + Z_n$ for any $n \ge 1$, and let $\alpha$ be a fixed constant satisfying $\frac{1}{2} < \alpha < \frac{2}{3}$. Then there are constants $C$, $n_0$ such that for any $k$ satisfying $|k - np| \le n^{\alpha}$ one has
$$P_{S_n}(k) = \frac{1 \pm C n^{3\alpha - 2}}{\sqrt{2\pi n p q}}\; e^{-\frac{(k - np)^2}{2npq}}, \quad \text{for } n \ge n_0,$$
where the equality is to be interpreted as an upper/lower bound depending on the sign in the $\pm$.

Note that $n^{3\alpha - 2} \to 0$ as $n \to \infty$ for $\alpha < \frac{2}{3}$, so the ratio of the upper and lower bounds approaches 1 with rate $n^{3\alpha - 2}$, uniformly in $k$ in the range $|k - np| \le n^{\alpha}$.

The above theorem is for the PMF of $S_n$ and not for its distribution function, as the CLT requires, but using the bounds established by the theorem one can show via a Riemann sum approximation to the integral that
$$\lim_{n \to \infty} \Pr\left(\left\{ z' \le \frac{S_n - np}{\sqrt{pq}\,\sqrt{n}} \le z \right\}\right) = \int_{z'}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\, dy,$$
which finishes the proof of the CLT for binary IID rv's.
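The local approximation in Theorem 1.4 can be explored numerically. The sketch below is only an illustration under assumed values $p = 0.3$ and $\alpha = 0.6$ (neither is taken from the notes), with hypothetical helper names; it computes the largest relative deviation of the exact binomial PMF from the Gaussian expression over the window $|k - np| \le n^{\alpha}$. The PMF is evaluated in log-space via `math.lgamma` to avoid overflow for large $n$.

```python
from math import exp, lgamma, log, pi, sqrt


def binomial_pmf(n, k, p):
    """Binomial PMF P_{S_n}(k), computed in log-space for numerical stability."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def gaussian_approx(n, k, p):
    """Gaussian expression of Theorem 1.4 without the (1 +/- C n^(3a-2)) factor."""
    q = 1 - p
    return exp(-((k - n * p) ** 2) / (2 * n * p * q)) / sqrt(2 * pi * n * p * q)


def max_relative_error(n, p, alpha):
    """Largest |PMF / Gaussian - 1| over all integers k with |k - np| <= n^alpha."""
    width = n**alpha
    ks = [k for k in range(n + 1) if abs(k - n * p) <= width]
    return max(abs(binomial_pmf(n, k, p) / gaussian_approx(n, k, p) - 1) for k in ks)


p, alpha = 0.3, 0.6  # illustrative choices with 1/2 < alpha < 2/3
for n in (100, 1600, 25600):
    print(n, max_relative_error(n, p, alpha))
```

With $\alpha = 0.6$ the theorem only promises a rate of $n^{3\alpha - 2} = n^{-0.2}$, so the printed errors should decrease slowly; near the center of the window ($k \approx np$) the agreement is much closer than at its edges.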