Math 345 - Stochastic processes - Spring 2020
1 Bernoulli processes

1.1 Random processes
Definition 1.1. A random or stochastic process is an infinite collection of rv’s defined on a
common probability model. If the process contains countably many rv’s, then they can be indexed
by positive integers, X1, X2, . . . , and the process is called a discrete-time random process. If there
are continuum many rv’s, then they can be indexed by a nonnegative real number t, {Xt ; t ≥ 0},
and the process is called a continuous-time random process.
A discrete-time random process assigns a sequence of numbers to every outcome ω ∈ Ω,
ω 7→ (X1 (ω), X2 (ω), . . . , Xn (ω), . . . ),
while a continuous-time random process assigns a function defined on the half-line [0, +∞),
ω 7→ Xt (ω) : [0, ∞) → R.
Of course the sequence can be thought of as a function defined on the set of natural numbers N.
The value of the random process on a particular outcome ω is called a sample path of the process.
When studying the random process one may choose to abstract to a probability model in which
the outcomes are the sample paths, and events are sets of sample paths.
Some examples of processes that can be modeled by random processes are repeated experiments,
arrivals or departures (of customers, orders, signals, packets, etc.) and random walks (over a line,
in a plane, in a 3D space).
1.2 Bernoulli processes
One can make a simple nontrivial random process by considering a sequence of IID binary rv’s.
Definition 1.2. A Bernoulli process is a sequence Z1 , Z2 , . . . of IID binary rv’s. The independence
here is understood in the sense that for any n > 0, the rv’s {Z1 , Z2 , . . . , Zn } are independent.
Let p = Pr({Z = 1}) and q = 1 − p = Pr({Z = 0}). One can think of the Bernoulli process
as a model describing the arrival of customers at discrete times n = 1, 2, . . . . Then at a specific
time n = j a customer will arrive with probability p (Zj = 1), and no customer will arrive with
probability q (Zj = 0). Here we are assuming that at most one arrival occurs at each discrete time
instance. Instead of tracking the arrivals of the customers, one can track the interarrival times in
this process. The first interarrival time X1 will be the time it takes the first customer to arrive. So


X1 =  1   if Z1 = 1;                                prob = p,
      2   if Z1 = 0, Z2 = 1;                        prob = p(1 − p),
      3   if Z1 = 0, Z2 = 0, Z3 = 1;                prob = p(1 − p)^2,
      ...
      m   if Z1 = 0, . . . , Z_{m−1} = 0, Zm = 1;   prob = p(1 − p)^(m−1),
      ...

As we see, X1 has the geometric PMF

    PX1(m) = p(1 − p)^(m−1),    m ≥ 1.
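The geometric PMF can be sanity-checked by brute force: enumerate every arrival pattern of a fixed length M, accumulate the probability that the first 1 occurs at position m, and compare with p(1 − p)^(m−1). A minimal sketch (the function name is ours, not from the notes):

```python
from itertools import product

def first_arrival_pmf(M, p):
    """P(X1 = m) for m = 1..M, by enumerating all length-M arrival patterns."""
    pmf = {m: 0.0 for m in range(1, M + 1)}
    for zs in product([0, 1], repeat=M):
        prob = 1.0
        for z in zs:
            prob *= p if z == 1 else (1 - p)
        if 1 in zs:
            pmf[zs.index(1) + 1] += prob  # first arrival at index+1
    return pmf

p, M = 0.3, 8
pmf = first_arrival_pmf(M, p)
for m in range(1, M + 1):
    # agrees with the geometric formula p(1-p)^(m-1)
    assert abs(pmf[m] - p * (1 - p)**(m - 1)) < 1e-12
```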
The second interarrival time X2 is the time between the first and second arrivals, and similarly
we define Xj to be the interarrival time between the (j − 1)th and j th arrivals. As the arrival rv’s
are IID, clearly X2 will have the same probability distribution as X1 , since the entire process can be
chopped off at the first arrival time, and the second interarrival time can be computed from there
similar to the above. Using induction to generalize this argument, we see that the interarrival times
are IID geometric rv’s X1, X2, . . . , and the Bernoulli process can be characterized by this sequence
instead of the sequence of binary IID arrival rv’s {Zj}_{j=1}^∞. An example of a sample path of the
Bernoulli process in terms of the arrival rv’s and the interarrival times is given below
    {Zj} : 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, . . .
    {Xj} : 2, 1, 3, 2, 4, 1, 1, . . .                          (1)
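The passage from the arrival sequence {Zj} to the interarrival sequence {Xj} is mechanical; a small sketch (plain Python, the function name is ours) that recovers the interarrival times from an arrival sequence:

```python
def interarrival_times(arrivals):
    """Convert a 0/1 arrival sequence into its list of interarrival times.

    The j-th entry is the gap between the (j-1)-th and j-th arrivals,
    counting the arrival slot itself, so consecutive arrivals give gap 1.
    """
    times, gap = [], 0
    for z in arrivals:
        gap += 1
        if z == 1:
            times.append(gap)
            gap = 0
    return times

# the sample path from the notes
zs = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1]
print(interarrival_times(zs))  # [2, 1, 3, 2, 4, 1, 1]
```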
In addition to the arrival rv’s Zj and the interarrival time rv’s Xj , one may also be interested
in tracking the aggregate number of arrivals up to the time t = n. These aggregate numbers of
arrivals will be given by the rv’s
    Sn = Z1 + Z2 + · · · + Zn,    n ≥ 1.
Sn takes integer values between 0 and n, and Sn = k means that in the first n discrete time instances
arrivals occurred in exactly k of them. So the probability of Sn = k will be

    Pr({Sn = k}) = PSn(k) = (n choose k) p^k (1 − p)^(n−k),    k = 0, 1, . . . , n,

where the binomial coefficient (n choose k) gives the number of ways in which k time instances can
be chosen from n, p^k is the probability that in those k instances arrivals occur, while (1 − p)^(n−k)
is the probability that in the remaining n − k instances no arrival occurs. Here we are, of course,
relying on the independence of the arrival rv’s Zj, j = 1, 2, . . . . So the aggregate number of arrivals
has the binomial distribution.
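The counting argument above can be checked directly: summing the probability of every arrival pattern with exactly k ones reproduces the binomial formula. A brute-force sketch (function names are ours):

```python
from itertools import product
from math import comb

def pmf_Sn_exact(n, k, p):
    """P(Sn = k) by the binomial formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def pmf_Sn_bruteforce(n, k, p):
    """P(Sn = k) by summing over all 2^n arrival patterns (Z1,...,Zn)."""
    total = 0.0
    for zs in product([0, 1], repeat=n):
        if sum(zs) == k:
            prob = 1.0
            for z in zs:
                prob *= p if z == 1 else (1 - p)
            total += prob
    return total

p, n = 0.3, 6
for k in range(n + 1):
    assert abs(pmf_Sn_exact(n, k, p) - pmf_Sn_bruteforce(n, k, p)) < 1e-12
```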
The rv’s {Sn, n ≥ 1} having binomial distribution is not enough for the process with these
aggregate arrivals to be a Bernoulli process, since the arrival rv’s {Zj, j ≥ 1} may not be independent.
Indeed, one can for example give a joint PMF for {Z1, Z2, Z3} that leaves them dependent, while
making Sj, j = 1, 2, 3, binomial. Then taking {Zj, j ≥ 4} to be IID will guarantee that the rest of the
aggregate arrival rv’s are binomial as well, but the independence of the Zj’s will no longer hold. For
a particular example, let PZ(1) = PZ(0) = 1/2, and define the joint PMF PZ1Z2Z3 as follows:

    PZ1Z2Z3(0, 0, 0) = PZ1Z2Z3(0, 0, 1) = PZ1Z2Z3(1, 1, 0) = PZ1Z2Z3(1, 1, 1) = 1/8,
    PZ1Z2Z3(0, 1, 0) = PZ1Z2Z3(1, 0, 1) = 1/4,
    PZ1Z2Z3(1, 0, 0) = PZ1Z2Z3(0, 1, 1) = 0.

Then one can compute the PMF’s for Sj, j = 1, 2, 3, to see that they are binomial, but obviously
{Z1, Z2, Z3} will not be independent (for instance, PZ1Z2Z3(0, 1, 1) = 0 ≠ 1/8).
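This construction can be verified with exact rational arithmetic. The assignment below puts mass 1/8 on (0,0,0), (0,0,1), (1,1,0), (1,1,1), mass 1/4 on (0,1,0), (1,0,1), and 0 on (1,0,0), (0,1,1), which is consistent with the marginals claimed in the example (the exact mass assignment is our reading of it):

```python
from fractions import Fraction as F
from math import comb

# joint PMF from the example: triple (z1, z2, z3) -> probability
pmf = {
    (0, 0, 0): F(1, 8), (0, 0, 1): F(1, 8), (1, 1, 0): F(1, 8), (1, 1, 1): F(1, 8),
    (0, 1, 0): F(1, 4), (1, 0, 1): F(1, 4),
    (1, 0, 0): F(0),    (0, 1, 1): F(0),
}
assert sum(pmf.values()) == 1

# S1, S2, S3 all have the Binomial(n, 1/2) PMF ...
for n in (1, 2, 3):
    for k in range(n + 1):
        prob = sum(pr for zs, pr in pmf.items() if sum(zs[:n]) == k)
        assert prob == F(comb(n, k), 2**n)

# ... yet (Z1, Z2, Z3) are not independent: P(0,1,1) = 0, not (1/2)^3 = 1/8
assert pmf[(0, 1, 1)] != F(1, 8)
```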
1.3 Asymptotics of the binomial distribution
As we saw above, the sum of IID binary rv’s has a binomial PMF. We can use this known
PMF of a sum of IID rv’s as an illustrative example of the Chernoff bound that we discussed
earlier. But first, we give the asymptotics of the binomial PMF based on Stirling’s bounds for
the factorial n!,

    √(2πn) (n/e)^n ≤ n! ≤ √(2πn) (n/e)^n e^(1/(12n)).
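Stirling’s bounds are easy to test numerically; a quick check for small n:

```python
from math import factorial, sqrt, pi, exp

# sqrt(2*pi*n) * (n/e)^n  <=  n!  <=  sqrt(2*pi*n) * (n/e)^n * e^(1/(12n))
for n in range(1, 20):
    base = sqrt(2 * pi * n) * (n / exp(1))**n
    assert base <= factorial(n) <= base * exp(1 / (12 * n))
```

The two sides differ only by the factor e^(1/(12n)), under 0.5% already at n = 20, which is what makes the binomial asymptotics below so tight.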
To state the asymptotics for the binomial PMF, we first denote p̃ = k/n, which will be interpreted
as the relative frequency of 1’s in the n-tuple Z1, Z2, . . . , Zn. We also define the Kullback-Leibler
divergence (relative entropy) by

    D(p̃ || p) = p̃ ln(p̃/p) + (1 − p̃) ln((1 − p̃)/(1 − p)).
If we treat D(p̃ || p) as a function of p̃ with a fixed p, then observe that D(p || p) = 0, that the
derivative (d/dp̃) D(p̃ || p) = ln(p̃/p) − ln((1 − p̃)/(1 − p)) vanishes at p̃ = p, while the second
derivative (d²/dp̃²) D(p̃ || p) = 1/(p̃(1 − p̃)) > 0. So D(p̃ || p) is a concave up function that takes
its minimum value at p̃ = p; thus, D(p̃ || p) ≥ 0, and the equality holds only when p̃ = p. The
Kullback-Leibler divergence can be thought of as a measure of how different p and p̃ are. With this
notation, we have the following bounds.
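A small numerical check of the two properties just derived, D(p || p) = 0 and D(p̃ || p) > 0 away from p (helper name ours):

```python
from math import log

def kl(p_tilde, p):
    """Kullback-Leibler divergence D(p_tilde || p) for Bernoulli parameters."""
    q_tilde, q = 1 - p_tilde, 1 - p
    return p_tilde * log(p_tilde / p) + q_tilde * log(q_tilde / q)

p = 0.3
assert kl(p, p) == 0.0                  # D(p || p) = 0
for pt in (0.1, 0.2, 0.4, 0.5, 0.9):
    assert kl(pt, p) > 0                # strictly positive for pt != p
```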
Theorem 1.3. Let PSn(k) be the PMF of the binomial distribution for an underlying binary PMF
PZ(1) = p > 0, PZ(0) = 1 − p > 0. Then for each integer p̃n, 1 ≤ p̃n ≤ n − 1,

    PSn(p̃n) < √(1/(2πn p̃(1 − p̃))) e^(−nD(p̃ || p)),

    PSn(p̃n) > (1 − 1/(12n p̃(1 − p̃))) √(1/(2πn p̃(1 − p̃))) e^(−nD(p̃ || p)).
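The theorem’s bounds can be checked against the exact binomial PMF, here with math.comb for n = 200 and p = 0.3 (parameter choices ours):

```python
from math import comb, log, sqrt, pi, exp

def kl(pt, p):
    """D(pt || p) for Bernoulli parameters."""
    return pt * log(pt / p) + (1 - pt) * log((1 - pt) / (1 - p))

p, n = 0.3, 200
for k in range(1, n):                      # all integers with 1 <= pt*n <= n-1
    pt = k / n
    exact = comb(n, k) * p**k * (1 - p)**(n - k)
    env = sqrt(1 / (2 * pi * n * pt * (1 - pt))) * exp(-n * kl(pt, p))
    assert exact < env                                        # upper bound
    assert exact > (1 - 1 / (12 * n * pt * (1 - pt))) * env   # lower bound
```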
Notice that for fixed p, p̃ the lower and upper bounds in the theorem are asymptotically the
same as n → ∞, so the bounds are asymptotically tight, and as n → ∞, we have

    PSn(p̃n) ~ √(1/(2πn p̃(1 − p̃))) e^(−nD(p̃ || p)),    uniformly for 0 < p̃ < 1.    (2)

The uniformity of the bounds is important here, since one cannot choose an arbitrary p̃ for a given
n, but needs to make sure p̃n is an integer, since Sn is integer-valued. The bound itself says that
if p̃ ≠ p, then the probability of having a relative frequency p̃ of arrivals within n discrete time
instances (or, equivalently, having k = p̃n arrivals) decreases exponentially in n.
Let us now apply the Chernoff bound to the binomial rv as the sum of n IID binary rv’s. Notice
that for the IID binary rv’s Z1, Z2, . . . with the PMF PZ(1) = p, PZ(0) = q = 1 − p defined above,
the mean is E[Z] = p and the MGF is

    gZ(r) = E[e^(Zr)] = p e^(1·r) + q e^(0·r) = q + p e^r,    for −∞ < r < ∞.

The semi-invariant MGF will then be γX(r) = ln(q + p e^r). If we take a = p̃ in the Chernoff bound,
then the optimal exponent for this value will be

    μX(p̃) = inf_{r≥0} [γX(r) − p̃r].
But the minimum of the expression γX(r) − p̃r will be attained when

    (d/dr) γX(r) = p̃.

This will happen if

    p e^r / (q + p e^r) = p̃,    or    e^r = p̃q / (p q̃),

where q̃ = 1 − p̃.
Observe that for p̃ > p, for which we expect the Chernoff bound to hold, one has for the optimal r

    e^r = p̃q / (p q̃) = p̃(1 − p) / (p(1 − p̃)) > 1,    hence r > 0,

as expected. Substituting this value of e^r back into γX(r) − p̃r and performing some algebraic
simplifications, one will get

    μX(p̃) = p̃ ln(p/p̃) + q̃ ln(q/q̃) = −D(p̃ || p).
Substituting this into the Chernoff bound, we get

    Pr({Sn ≥ np̃}) ≤ e^(nμX(p̃)) = e^(−nD(p̃ || p)).

But Pr({Sn ≥ np̃}) ≥ Pr({Sn = np̃}), and comparing the Chernoff bound to the above asymptotic
bounds for the binomial PMF, we see that Pr({Sn ≥ np̃}) will be bounded by the same exponentially
decaying bound from above and below. We can record this fact by

    lim_{n→∞} (1/n) ln Pr({Sn ≥ np̃}) = −D(p̃ || p),    p̃ > p.
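One can watch this exponent converge: computing −(1/n) ln Pr({Sn ≥ np̃}) exactly (in log space, to avoid floating-point underflow for large n) for growing n shows it approaching D(p̃ || p) from above. A sketch with helper names of our own:

```python
from math import lgamma, log, exp

def log_comb(n, k):
    """ln C(n, k) via log-gamma, avoiding huge integers."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def log_tail(n, k0, p):
    """ln Pr(Sn >= k0), computed stably with a log-sum-exp."""
    logs = [log_comb(n, k) + k * log(p) + (n - k) * log(1 - p)
            for k in range(k0, n + 1)]
    m = max(logs)
    return m + log(sum(exp(x - m) for x in logs))

p, pt = 0.3, 0.5
D = pt * log(pt / p) + (1 - pt) * log((1 - pt) / (1 - p))
rates = [-log_tail(n, n // 2, p) / n for n in (100, 400, 1600)]
assert rates[0] > rates[1] > rates[2] > D   # the rate approaches D from above
assert rates[2] - D < 0.01                  # already close at n = 1600
```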
So the optimized Chernoff bound (over r ∈ I(X), r > 0) is exponentially tight for sums of binary
IID rv’s, in the sense that no better (faster decaying) exponential bound can exist. This turns out
to be true for general IID rv’s.
1.4 Central Limit Theorem for binary IID rv’s
Using the asymptotic bounds discussed in the previous section one can prove the Central Limit
Theorem (CLT) in the case of binary IID rv’s. Recall that the CLT says that the renormalized
sample average of IID rv’s converges to the Gaussian rv in distribution. In the case of binary IID
rv’s one can restate the CLT in the following form.
Theorem 1.4. Let {Zj, j ≥ 1} be a sequence of binary IID rv’s with p = PZ(1) > 0 and q =
1 − p = PZ(0) > 0. Let Sn = Z1 + · · · + Zn for any n ≥ 1 and α be a fixed constant satisfying
1/2 < α < 2/3. Then there are constants C, n0 such that for any k satisfying |k − np| ≤ n^α one has

    PSn(k) = ((1 ± C n^(3α−2)) / √(2πnpq)) e^(−(k−np)²/(2npq)),    for n ≥ n0,

where the equality is to be interpreted as an upper/lower bound depending on the sign in the ±.
Note that n^(3α−2) → 0 as n → ∞ for α < 2/3, so the ratio of the upper and lower bounds approaches
1 with rate n^(3α−2) uniformly in k in the range |k − np| ≤ n^α.
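The local approximation in Theorem 1.4 is easy to see numerically; a sketch comparing the exact PMF with the Gaussian density for p = 1/2 within a few standard deviations of the mean (parameter choices ours):

```python
from math import comb, sqrt, pi, exp

p, n = 0.5, 1000
q = 1 - p
mean, var = n * p, n * p * q
# exact binomial PMF vs. the local Gaussian approximation near the mean
for k in range(450, 551):
    exact = comb(n, k) * p**k * q**(n - k)
    approx = exp(-(k - mean)**2 / (2 * var)) / sqrt(2 * pi * var)
    assert abs(exact / approx - 1) < 0.01   # within 1% over this range
```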
The above theorem is for the PMF of Sn and not the PDF, as the CLT requires, but using the
bounds established by the theorem one can show via a Riemann sum approximation to the integral
that

    lim_{n→∞} Pr({ z′ ≤ (Sn − np)/(√(pq) √n) ≤ z }) = ∫_{z′}^{z} (1/√(2π)) e^(−y²/2) dy,

which will finish the proof of the CLT for binary IID rv’s.
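The Riemann-sum step can be illustrated numerically: the exact probability that the normalized sum lies in [z′, z] is already close to the Gaussian integral at moderate n, the latter computed here via the error function (parameters and the helper Phi are ours):

```python
from math import comb, erf, sqrt, ceil, floor

p, n = 0.5, 1000
q = 1 - p
sigma = sqrt(n * p * q)

def Phi(z):
    """Standard Gaussian CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z1, z2 = -1.0, 1.5
lo, hi = ceil(n * p + z1 * sigma), floor(n * p + z2 * sigma)
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(lo, hi + 1))
gauss = Phi(z2) - Phi(z1)
assert abs(exact - gauss) < 0.02   # close already at n = 1000
```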