Lecture 8
1 Poisson Limit Theorem
Lindeberg’s Theorem explains why the Gaussian distribution is ubiquitous: whenever we take the sum of a large number of independent random variables $S_n := X_{n,1} + X_{n,2} + \cdots + X_{n,n}$, centered and normalized so that $S_n$ has mean zero and variance 1, the distribution of $S_n$ will be close to the standard normal, provided that $X_{n,1}, \ldots, X_{n,n}$ are uniformly small in the sense that they satisfy Lindeberg’s condition:
\[
\forall\, \epsilon > 0, \qquad \lim_{n\to\infty} \sum_{i=1}^n E\big[X_{n,i}^2\, 1_{\{|X_{n,i}| > \epsilon\}}\big] = 0.
\]
The typical example is $X_{n,i} = X_i/\sqrt{n}$ for a sequence of i.i.d. random variables $(X_i)_{i\in\mathbb{N}}$ with mean zero and variance 1.
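As a quick numerical illustration (not from the notes; the uniform distribution and the sample sizes below are assumptions chosen for convenience), the following Python sketch simulates the typical example $X_{n,i} = X_i/\sqrt{n}$ and compares empirical quantiles of $S_n$ with standard normal quantiles:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1000, 20000

# X_i: uniform on [-sqrt(3), sqrt(3)], so E[X_i] = 0 and Var(X_i) = 1.
X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(trials, n))

# S_n = (X_1 + ... + X_n) / sqrt(n) has mean 0 and variance 1 for every n.
S = X.sum(axis=1) / np.sqrt(n)

# Compare a few empirical quantiles of S_n with standard normal quantiles.
for q, z in [(0.025, -1.96), (0.5, 0.0), (0.975, 1.96)]:
    print(f"quantile {q}: empirical {np.quantile(S, q):+.3f}, normal {z:+.3f}")
```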
Exercise 1.1 Show that the following Lyapunov’s condition implies Lindeberg’s condition:
\[
\lim_{n\to\infty} \sum_{i=1}^n E\big[|X_{n,i}|^p\big] = 0 \qquad \text{for some } p > 2.
\]
Another distribution, which is as ubiquitous as the Gaussian distribution, is the Poisson distribution. Recall that the Poisson distribution with parameter $\lambda > 0$ is a probability measure $\mu$ on the non-negative integers with $\mu(n) = e^{-\lambda} \frac{\lambda^n}{n!}$ for $n \in \{0\} \cup \mathbb{N}$. It has mean $\lambda$, variance also $\lambda$, and characteristic function $\phi(t) = e^{-\lambda(1 - e^{it})}$. As the next theorem will show, the Poisson distribution typically arises as the limit of the sum of indicators of independent events, each occurring with a small probability.
Theorem 1.2 [Poisson Limit Theorem] Let $(X_{n,i})_{1\le i\le n}$ be a triangular array of independent $\{0,1\}$-valued random variables, with $p_{n,i} := P(X_{n,i} = 1) = 1 - P(X_{n,i} = 0)$. Suppose that for some $\lambda > 0$, $\sum_{i=1}^n p_{n,i} \to \lambda$ and $\max_{1\le i\le n} p_{n,i} \to 0$ as $n \to \infty$. Then $S_n := \sum_{i=1}^n X_{n,i}$ converges in distribution to a Poisson random variable $Z$ with parameter $\lambda$.
Proof. It suffices to show that the characteristic function converges, i.e., $E[e^{itS_n}] \to e^{-\lambda(1-e^{it})}$ as $n \to \infty$. Note that by Taylor expansion,
\[
\log E\big[e^{itS_n}\big] = \log \prod_{j=1}^n E\big[e^{itX_{n,j}}\big] = \sum_{j=1}^n \log\big(1 - p_{n,j} + p_{n,j} e^{it}\big) = \sum_{j=1}^n \big(-p_{n,j}(1 - e^{it}) + O(p_{n,j}^2)\big),
\]
which converges to $-\lambda(1 - e^{it})$ as $n \to \infty$ by our assumptions on $p_{n,j}$.
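To see Theorem 1.2 in action, here is a minimal Monte Carlo sketch (the particular unequal $p_{n,i}$, chosen to sum exactly to $\lambda$, and the trial counts are illustrative assumptions):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, lam, trials = 300, 2.0, 20000

# Unequal success probabilities p_{n,i} with sum exactly lam and max -> 0.
p = lam * np.arange(1, n + 1) / (n * (n + 1) / 2)

# S_n = X_{n,1} + ... + X_{n,n} for independent Bernoulli(p_i) indicators.
S = (rng.random((trials, n)) < p).sum(axis=1)

# Compare empirical frequencies of S_n with the Poisson(lam) pmf.
for k in range(6):
    pmf = math.exp(-lam) * lam**k / math.factorial(k)
    print(f"P(S_n = {k}): empirical {np.mean(S == k):.4f}, Poisson {pmf:.4f}")
```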
Theorem 1.2 can be readily extended to
Corollary 1.3 Let $(X_{n,i})_{1\le i\le n}$ be a triangular array of independent $\mathbb{N}_0 := \{0\} \cup \mathbb{N}$-valued random variables, with $p_{n,i} := P(X_{n,i} = 1)$ and $\epsilon_{n,i} := P(X_{n,i} \ge 2)$. Suppose that

(i) $\sum_{i=1}^n p_{n,i} \to \lambda$ for some $\lambda > 0$,

(ii) $\max_{1\le i\le n} p_{n,i} \to 0$,

(iii) $\sum_{i=1}^n \epsilon_{n,i} \to 0$ as $n \to \infty$,

then $S_n := \sum_{i=1}^n X_{n,i}$ converges in distribution to a Poisson random variable $Z$ with mean $\lambda$.
Proof. Let $\tilde X_{n,i} := X_{n,i} 1_{\{X_{n,i} \le 1\}}$. Then $(\tilde X_{n,i})_{1\le i\le n}$ satisfies the conditions in Theorem 1.2, and hence $\tilde S_n := \sum_{i=1}^n \tilde X_{n,i}$ converges in distribution to a Poisson random variable $Z$ with parameter $\lambda$. On the other hand,
\[
P(S_n \neq \tilde S_n) = P\Big(\sum_{i=1}^n 1_{\{X_{n,i} \ge 2\}} \neq 0\Big) \le \sum_{i=1}^n P(X_{n,i} \ge 2) = \sum_{i=1}^n \epsilon_{n,i} \xrightarrow[n\to\infty]{} 0.
\]
Therefore $S_n - \tilde S_n$ converges in probability to 0, and hence $S_n$ also converges in distribution to $Z$.
Let us consider a few examples which explain why the Poisson distribution is universal.
Example 1.4 If there are 400 students in a class, then it is not unreasonable to model
their birthdays by i.i.d. random variables, each chosen uniformly among the 365 days of the
year. The expected number of students whose birthday falls on the day of the final exam is $\lambda := 400/365 \approx 1.096$, and the distribution of the number of such students is approximately Poisson with mean 1.096. In particular, the probability of having no student with a birthday on the day of the final exam is approximately $e^{-\lambda} = e^{-1.096} \approx 0.334$.
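Since the number of matching birthdays in Example 1.4 is exactly Binomial$(400, 1/365)$, the quality of the Poisson approximation can be checked by direct computation; a short sketch (exact arithmetic, no simulation):

```python
import math

n, p = 400, 1 / 365
lam = n * p  # about 1.096, as in the example

# Exact Binomial(n, p) pmf versus the Poisson(lam) approximation.
for k in range(4):
    binom = math.comb(n, k) * p**k * (1 - p) ** (n - k)
    pois = math.exp(-lam) * lam**k / math.factorial(k)
    print(f"k = {k}: Binomial {binom:.4f}, Poisson {pois:.4f}")
```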
Example 1.5 [Counting Rare Events] Let $(X_i)_{i\in\mathbb{N}}$ be i.i.d. real-valued random variables with distribution $\mu(dx) = f(x)\,dx$ for some density $f$. Let $a_n$ be an increasing sequence chosen such that $P(X_1 > a_n) = \lambda/n$. Then by Theorem 1.2, the number of the $X_i$, $1 \le i \le n$, which exceed $a_n$ converges in distribution to a Poisson random variable with parameter $\lambda$. The
general principle is that, among a large collection of independent events, if a notion of rare
event is defined in such a way that the expected number of rare events is of order 1, then the
number of rare events will follow approximately a Poisson distribution.
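A concrete instance of Example 1.5 (the standard exponential density is an assumption made so that $a_n$ has a closed form): for $X_i \sim \mathrm{Exp}(1)$ we have $P(X_1 > a_n) = e^{-a_n}$, so $a_n = \log(n/\lambda)$ gives $P(X_1 > a_n) = \lambda/n$ exactly. The sketch below counts exceedances and compares moments with Poisson($\lambda$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, trials = 5000, 1.5, 2000

# For Exp(1), P(X > a_n) = exp(-a_n); choose a_n so that this equals lam/n.
a_n = np.log(n / lam)

# In each trial, count how many of the n i.i.d. samples exceed a_n.
counts = (rng.exponential(size=(trials, n)) > a_n).sum(axis=1)

# A Poisson(lam) limit has mean and variance both equal to lam.
print("empirical mean:", counts.mean(), " vs lam =", lam)
print("empirical var :", counts.var(), " vs lam =", lam)
```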
Example 1.6 [Customers Arriving in a Queue] Let N (s, t) be the number of customers
arriving in a queue (at a bank or store) during the time interval (s, t]. Let us make the
following assumptions:
(i) The number of arrivals in disjoint time intervals are independent,
(ii) the law of N (s, t) depends only on t − s,
(iii) $P(N(0,h) = 1) = \lambda h + o(h)$, i.e., $\lim_{h\downarrow 0} \frac{P(N(0,h)=1)}{h} = \lambda$,

(iv) $P(N(0,h) \ge 2) = o(h)$, i.e., $\lim_{h\downarrow 0} \frac{P(N(0,h)\ge 2)}{h} = 0$.
Then for each $t > 0$, $N(0,t) = \sum_{j=1}^n X_{n,j}$ with $X_{n,j} := N\big(\frac{(j-1)t}{n}, \frac{jt}{n}\big) \overset{\text{dist}}{=} N(0, t/n]$, which satisfies the conditions in Corollary 1.3 with $\sum_{j=1}^n P(N(0, t/n] = 1) \to \lambda t$, and hence $N(0,t)$ is distributed as a Poisson random variable with mean $\lambda t$. In fact the family of random variables $(N(0,t))_{t\ge 0}$ defines a so-called Poisson process.
2 Poisson Process
Definition 2.1 [Poisson Process] A family of random variables $(N_t)_{t\ge 0}$ is called a Poisson process with rate $\lambda$ if

(i) $N_0 = 0$,

(ii) for $t_0 = 0 < t_1 < \cdots < t_m$, $(N_{t_k} - N_{t_{k-1}})_{1\le k\le m}$ are independent random variables,

(iii) for any $0 \le s < t$, $N_t - N_s$ is a Poisson random variable with mean $\lambda(t-s)$,

(iv) almost surely, $N_t$ is right continuous in $t \ge 0$.
Conditions (i)–(iii) determine the finite-dimensional distributions of $(N_t)_{t\ge 0}$, and hence the joint law of $N_t$ over all rational $t$; condition (iv) then uniquely determines $N_t$ for irrational $t$. An immediate consequence of the definition of the Poisson process is that it is Markov, namely that for any $t_0 \ge 0$, $(N_{t_0+t} - N_{t_0})_{t\ge 0}$ is again a Poisson process, which is in particular independent of $(N_t)_{0\le t\le t_0}$.
We can interpret Nt as the number of customers that have arrived in a queue by time t.
We show next that almost surely, no two customers arrive at the same time. More precisely,
let $N_{t+} := \lim_{h\downarrow 0} N_{t+h}$ and $N_{t-} := \lim_{h\downarrow 0} N_{t-h}$. Then we claim that almost surely, $N_{t+} - N_{t-} \in \{0, 1\}$ for all $t \ge 0$. Indeed, for any deterministic $s \ge 0$,
\[
P(N_{s+} - N_{s-} \ge 1) \le \lim_{h\downarrow 0} P(N_{s+h} - N_{s-h} \ge 1) = 0,
\]
and hence a.s., $N_{(qt)+} - N_{(qt)-} = 0$ for all rational $q$. Therefore modulo a set of probability 0,
\[
\{N_{s+} - N_{s-} \ge 2 \text{ for some } 0 \le s \le t\} \subset \{N_{jt/n} - N_{(j-1)t/n} \ge 2 \text{ for some } 1 \le j \le n\} \qquad \forall\, n \in \mathbb{N},
\]
and hence
\[
P(N_{s+} - N_{s-} \ge 2 \text{ for some } 0 \le s \le t) \le P(N_{jt/n} - N_{(j-1)t/n} \ge 2 \text{ for some } 1 \le j \le n) \le n P(N_{t/n} - N_0 \ge 2) = n e^{-\lambda t/n} \sum_{k=2}^{\infty} \frac{\lambda^k t^k}{k!\, n^k} \xrightarrow[n\to\infty]{} 0.
\]
This shows that almost surely no two customers arrive at the same time. Therefore an alternative way of characterizing the process $(N_t)_{t\ge 0}$ is to identify the distribution of the set of times $s$ with $N_{s+} - N_{s-} = 1$, i.e., the times at which a customer arrives. If $\xi_1$ denotes the arrival time of the first customer, then we note that $P(\xi_1 > t) = P(N_t = 0) = e^{-\lambda t}$, so $\xi_1$ has an exponential distribution with mean $1/\lambda$. It is then natural to conjecture that the differences between the consecutive arrival times of customers are i.i.d. exponentially distributed with mean $1/\lambda$. The next result shows this to be indeed the case.
Theorem 2.2 [Construction of a Poisson Process] Let $(\xi_i)_{i\in\mathbb{N}}$ be i.i.d. exponential random variables with mean $1/\lambda$, i.e., $P(\xi_1 > t) = e^{-\lambda t}$ for $t \ge 0$. Let $T_0 := 0$ and $T_n := \sum_{i=1}^n \xi_i$ for $n \in \mathbb{N}$, and let $N_t := \max\{n \ge 0 : T_n \le t\}$. Then $(N_t)_{t\ge 0}$ is a rate $\lambda$ Poisson process as defined in Definition 2.1.
Proof. From the definition of $N_t$, it is clear that $N_0 = 0$ and $N_t$ is a.s. right continuous. Instead of verifying Definition 2.1 (ii) and (iii) through a direct calculation, we will approximate $N_t$ by a discrete-time counting process, for which the equivalence of the two ways of characterizing the process is self-evident: either via the finite-dimensional distributions as in Definition 2.1, or via the inter-arrival times of customers as in Theorem 2.2.
Given $n \in \mathbb{N}$, let $(X_i^{(n)})_{i\in\mathbb{N}}$ be i.i.d. Bernoulli random variables with $P(X_1^{(n)} = 1) = 1 - P(X_1^{(n)} = 0) = \lambda/n$, where $X_i^{(n)}$ serves to approximate $N_{i/n} - N_{(i-1)/n}$. Let $S_j^{(n)} = \sum_{i=1}^j X_i^{(n)}$, and for $t \ge 0$, define $S_{nt}^{(n)} := S_{\lfloor nt\rfloor}^{(n)}$, which we will show to converge to $N_t$ as $n \to \infty$.

Let $\tau_0^{(n)} := 0$, and for $k \in \mathbb{N}$, let $\tau_k^{(n)} := \min\{i > \tau_{k-1}^{(n)} : X_i^{(n)} = 1\}$. It is clear that $(\tau_i^{(n)} - \tau_{i-1}^{(n)})_{i\in\mathbb{N}}$ are i.i.d. geometrically distributed random variables with mean $\lambda^{-1} n$, and
\[
P(\tau_1^{(n)} = j) = \Big(1 - \frac{\lambda}{n}\Big)^{j-1} \frac{\lambda}{n}, \qquad j \in \mathbb{N}.
\]
In particular, for any $t > 0$,
\[
P(\tau_1^{(n)} \ge nt) = \sum_{j=\lceil nt\rceil}^{\infty} \Big(1 - \frac{\lambda}{n}\Big)^{j-1} \frac{\lambda}{n} = \Big(1 - \frac{\lambda}{n}\Big)^{\lceil nt\rceil - 1} \xrightarrow[n\to\infty]{} e^{-\lambda t} = P(\xi_1 \ge t).
\]
In other words, $\big(\frac{\tau_i^{(n)} - \tau_{i-1}^{(n)}}{n}\big)_{i\in\mathbb{N}}$ converge jointly in distribution to $(\xi_i)_{i\in\mathbb{N}}$. Using Skorohod’s representation theorem for weak convergence, we can couple $\big(\frac{\tau_i^{(n)} - \tau_{i-1}^{(n)}}{n}\big)_{i\in\mathbb{N}}$ and $(\xi_i)_{i\in\mathbb{N}}$ on the same probability space such that a.s. $\frac{\tau_i^{(n)} - \tau_{i-1}^{(n)}}{n} \to \xi_i$ for each $i \in \mathbb{N}$. For any fixed $0 = t_0 < t_1 < \cdots < t_m$, by our construction of $N_t$ and $S_{nt}^{(n)}$, it then follows that $S_{nt_k}^{(n)} \to N_{t_k}$ a.s. for each $1 \le k \le m$, and hence $(S_{nt_k}^{(n)} - S_{nt_{k-1}}^{(n)})_{1\le k\le m}$ converges in joint distribution to $(N_{t_k} - N_{t_{k-1}})_{1\le k\le m}$.
On the other hand, given $0 = t_0 < t_1 < \cdots < t_m$, when $n$ is sufficiently large, we have $\lfloor nt_{k-1}\rfloor < \lfloor nt_k\rfloor$ for all $1 \le k \le m$, which makes $(S_{nt_k}^{(n)} - S_{nt_{k-1}}^{(n)})_{1\le k\le m}$ independent random variables, and hence their joint weak limit $(N_{t_k} - N_{t_{k-1}})_{1\le k\le m}$ are also independent random variables. Furthermore, for each $s < t$, $S_{nt}^{(n)} - S_{ns}^{(n)} = \sum_{i=\lfloor ns\rfloor+1}^{\lfloor nt\rfloor} X_i^{(n)}$ converges in distribution to a Poisson random variable with mean $\lambda(t-s)$ by Theorem 1.2, which must be the distribution of $N_t - N_s$. Therefore $(N_t)_{t\ge 0}$ satisfies the conditions in Definition 2.1 and is a Poisson process.
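The construction in Theorem 2.2 translates directly into a simulation. A minimal sketch (the rate, horizon, and trial count are illustrative assumptions), building $N_t$ from i.i.d. exponential inter-arrival times and checking that its mean and variance match $\lambda t$:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, t, trials = 2.0, 3.0, 20000

def N_t(rng, lam, t):
    """Count arrivals by time t: T_n = xi_1 + ... + xi_n, N_t = max{n : T_n <= t}."""
    total, count = 0.0, 0
    while True:
        total += rng.exponential(1 / lam)  # xi ~ exponential with mean 1/lam
        if total > t:
            return count
        count += 1

samples = np.array([N_t(rng, lam, t) for _ in range(trials)])
print("mean:", samples.mean(), " expected lam*t =", lam * t)
print("var :", samples.var(), " expected lam*t =", lam * t)
```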
Exercise 2.3 The fact that for any $t_0 > 0$, $(N_{t_0+t} - N_{t_0})_{t\ge 0}$ is a Poisson process independent of $(N_t)_{0\le t\le t_0}$, together with the construction of the Poisson process $(N_t)_{t\ge 0}$ from i.i.d. exponential random variables in Theorem 2.2, implicitly shows that the exponential distribution has a memoryless property. Namely, if $\xi$ is an exponential random variable with mean $1/\lambda$, then $P(\xi > t + s \mid \xi > t) = P(\xi > s)$ for all $s, t > 0$. Prove this fact directly.
Exercise 2.4 Let $\xi_1$ and $\xi_2$ be two independent exponential random variables with means $1/\lambda_1$ and $1/\lambda_2$ respectively. Prove that $\xi := \min\{\xi_1, \xi_2\}$ is also an exponential random variable, with mean $1/(\lambda_1 + \lambda_2)$.
The memoryless property of the exponential distribution makes it an essential tool in the
construction of continuous time Markov processes with a discrete state space. The Markov
property states that, given the present state of the process, the future and the past are independent. It is then necessary that the time the process has to wait before jumping away from its present location is exponentially distributed.
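As an illustration of this last point (the toy chain below is an assumption, not an example from the notes), a continuous-time Markov chain on a finite state space can be simulated by alternating exponential holding times with jumps drawn from a transition kernel:

```python
import numpy as np

rng = np.random.default_rng(4)

# A toy 3-state chain: holding rate q[s] at state s, jump kernel P[s].
q = np.array([1.0, 2.0, 0.5])
P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.9, 0.1, 0.0]])

def simulate(t_max, state=0):
    """Simulate until time t_max; return the list of (jump_time, new_state)."""
    t, path = 0.0, []
    while True:
        t += rng.exponential(1 / q[state])  # memoryless holding time at `state`
        if t > t_max:
            return path
        state = rng.choice(3, p=P[state])   # jump according to the kernel
        path.append((t, state))

print(simulate(5.0))
```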
3 Poisson Point Process
An alternative way to think about a Poisson process $(N_t)_{t\ge 0}$ is to identify it with a locally finite measure $\Xi$ on $[0, \infty)$, with
\[
\Xi(dx) := \sum_{i\in\mathbb{N}} \delta_{\tau_i}(dx),
\]
where $\tau_1 < \tau_2 < \cdots$ are the times at which $N_t$ makes a jump, and $\delta_z(dx)$ denotes a delta measure at position $z$. Such an interpretation allows a natural extension of a Poisson process, which can be regarded as a random measure on $[0, \infty)$, to random measures on more general spaces (in particular, Polish spaces), called Poisson point processes.
We state below a theorem on the characterization of a general Poisson point process on a locally compact Polish space, where local compactness refers to the property that any $x \in S$ is contained in some open set $U$ whose closure is compact. The notion we will need is $M(S)$, the space of Radon measures on $(S, \mathcal{S})$, i.e., if $\mu \in M(S)$, then $\mu(K) < \infty$ for all compact sets $K$. The topology on $M(S)$ is the so-called vague topology: $\mu_n \overset{v}{\Rightarrow} \mu$ in $M(S)$ if and only if $\int f\, d\mu_n \to \int f\, d\mu$ for all continuous $f : S \to \mathbb{R}$ with compact support. We remark that the difference between weak convergence and vague convergence is that $M(S)$ may admit infinite measures, and even if $\mu_n, \mu \in M(S)$ are finite measures with $\mu_n \overset{v}{\Rightarrow} \mu$, mass of $\mu_n$ may escape to infinity so that $\mu(S) < \liminf_{n\to\infty} \mu_n(S)$, which is not possible under weak convergence.
Theorem 3.1 [Poisson Point Processes (PPP)] Let $S$ be a locally compact Polish space with Borel $\sigma$-algebra $\mathcal{S}$. For any $\mu \in M(S)$ (called the mean measure or intensity measure), there is a unique (in law) $M(S)$-valued random variable $\Xi$, called a Poisson point process with mean measure $\mu$, which satisfies the following properties:

(i) For any disjoint relatively compact sets $A_1, \ldots, A_n \in \mathcal{S}$, $(\Xi(A_i))_{1\le i\le n}$ are independent.

(ii) For each relatively compact $A \in \mathcal{S}$, $\Xi(A)$ is a Poisson random variable with mean $\mu(A)$.
Note that when S = Rd , all bounded sets are relatively compact, and hence it suffices to
consider bounded Borel measurable sets in Theorem 3.1 (i) and (ii). We call Ξ a point process
because almost surely, Ξ is a collection of delta measures with positive integer mass, i.e., for
any A ∈ S, Ξ(A) ∈ N ∪ {0, ∞}. For a proof of Theorem 3.1, see e.g. [3, Chapter 24].
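When the mean measure is finite, a standard way to realize a point process satisfying (i) and (ii) (this construction is not spelled out in the notes, but is consistent with Theorem 3.1) is to sample $K \sim \mathrm{Poisson}(\mu(S))$ and then place $K$ i.i.d. points with law $\mu/\mu(S)$. A sketch for $\mu = \lambda \times \mathrm{Lebesgue}$ on the unit square, with $\lambda$ an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(5)
lam = 50.0  # intensity per unit area on the unit square S = [0,1]^2

# Total mass mu(S) = lam; the number of points is Poisson(mu(S)).
K = rng.poisson(lam)

# Given K, the points are i.i.d. with law mu / mu(S) = uniform on S.
points = rng.random((K, 2))

# Check property (ii) on a sub-square A: Xi(A) should be ~ Poisson(lam * area(A)).
A_count = np.sum((points[:, 0] < 0.5) & (points[:, 1] < 0.5))
print(f"{K} points total; {A_count} in [0,0.5)^2 (mean should be {lam * 0.25})")
```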
Remark 3.2 [Non-locally Compact Polish Spaces] If $S$ is only assumed to be a Polish space, not necessarily locally compact, then we need to change $M(S)$ from the space of Radon measures to the space $M^\#(S)$ of locally finite measures, i.e., $\mu(B) < \infty$ for any bounded open ball $B$ with a finite radius. The vague topology on $M(S)$ should be replaced by the so-called $w^\#$ (weak-hash) topology on $M^\#(S)$, where $\mu_n \overset{w^\#}{\Rightarrow} \mu$ if $\int f\, d\mu_n \to \int f\, d\mu$ for all bounded continuous $f : S \to \mathbb{R}$ with bounded support (see [1, Chapter A2.6]). Standard references on random measures consider random Radon measures on a locally compact Polish space. For a general reference on $M^\#(S)$-valued random variables, see [1, 2].
Poisson point processes provide the fundamental building block in the construction of many
stochastic objects, including infinitely divisible distributions, Lévy processes, excursions of a
Markov process from a point, extreme order statistics, etc. We consider here the simplest
example.
Example 3.3 [Compound Poisson Distribution] Let $\Xi$ be a PPP on $[0, \infty) \times \mathbb{R}$ with mean measure $\lambda\, dt \times \mu(dx)$ for a probability measure $\mu$ on $\mathbb{R}$. Note that $N_t := \Xi([0, t] \times \mathbb{R})$ is in fact a Poisson process with rate $\lambda$. Let $\tau_1 < \tau_2 < \cdots$ denote the successive times at which $N_t$ makes a jump. Then $\Xi = \{(\tau_i, \xi_i) : i \in \mathbb{N}\}$ for some $\xi_i \in \mathbb{R}$. Conditioned on the realization of $\tau_1 < \tau_2 < \cdots$, it can be seen that $(\xi_i)_{i\in\mathbb{N}}$ are i.i.d. random variables with distribution $\mu$. Thus we obtain a representation for the compound Poisson random variable
\[
X := \sum_{i : \tau_i \le 1} \xi_i,
\]
in terms of the underlying PPP $\Xi$. When $\mu$ is the delta measure at 1, $X$ is just a Poisson random variable with mean $\lambda$. Analogous to the definition of a Poisson process, if we define
\[
X_t := \sum_{i : \tau_i \le t} \xi_i, \tag{3.1}
\]
then $(X_t)_{t\ge 0}$ is a compound Poisson process.
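A direct simulation of (3.1) under an illustrative choice of $\mu$ (standard normal jump sizes, an assumption made for concreteness): sample the jump times of a rate-$\lambda$ Poisson process on $[0, T]$ and attach i.i.d. marks.

```python
import numpy as np

rng = np.random.default_rng(6)
lam, T = 3.0, 10.0

# Jump times tau_1 < tau_2 < ... of a rate-lam Poisson process on [0, T].
# 100 exponential draws are ample here: their sum has mean 100/lam >> T.
taus = np.cumsum(rng.exponential(1 / lam, size=100))
taus = taus[taus <= T]

# Marks xi_i are i.i.d. with distribution mu (here: standard normal).
xis = rng.normal(size=len(taus))

def X(t):
    """Compound Poisson process X_t = sum of xi_i over tau_i <= t, as in (3.1)."""
    return xis[taus <= t].sum()

# Here E[X_T] = lam*T*E[xi] = 0 and Var(X_T) = lam*T*E[xi^2] = lam*T.
print("X_T =", X(T))
```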
When $\mu$ is a finite measure with total mass $A > 0$, we can rewrite the mean measure $\lambda\, dt \times \mu(dx)$ as $A\lambda\, dt \times \tilde\mu(dx)$, with $\tilde\mu(dx) = A^{-1}\mu(dx)$ being the normalized probability measure. We are thus still in the compound Poisson setting. Interestingly, for a large family of infinite measures $\mu$, it is still possible to define $X_t$ in (3.1), even though the sum is over infinitely many terms for each $t > 0$. Together with Brownian motion, such Poisson point process constructions are the building blocks of Lévy processes, which are defined to be $\mathbb{R}^d$-valued stochastic processes $(X_t)_{t\ge 0}$ with independent and stationary increments, i.e., for $0 = t_0 < t_1 < \cdots < t_n$, $(X_{t_i} - X_{t_{i-1}})_{1\le i\le n}$ are independent, and the distribution of $X_{t_i} - X_{t_{i-1}}$ depends only on $t_i - t_{i-1}$.
References

[1] D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, Volume I: Elementary Theory and Methods, 2nd edition. Springer, 2003.

[2] D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure, 2nd edition. Springer, 2008.

[3] A. Klenke. Probability Theory: A Comprehensive Course. Springer-Verlag.