The Outline
Part I aims at providing the fundamental knowledge in probability theory, Poisson process and Markov chain theory. Some applications such as inventory systems and PageRank algorithm are
discussed.
(1) Probability Theory, Poisson Process and Inventory System
(2) Discrete Time Markov Chains : Theory and Examples
Part II discusses a continuous time stochastic process, the Birth
and Death process and its relation to Markovian queueing systems.
Applications of Markovian queueing systems will also be discussed.
(3) Continuous Time Markov Chains : Theory and Examples
(4) Introduction to Queueing Systems
http://hkumath.hku.hk/~wkc/course/part1.pdf
In Part III, we introduce iterative methods (computational methods) for solving a system of linear equations. It is important for
solving the steady-state distribution of a queueing network.
(5) Computation with Markov Chains: Iterative Methods
(6) Markovian Queueing Networks, Manufacturing and Re-manufacturing
Systems
Finally in Part IV, we introduce four research topics related to
Markov chain models.
(7) Hidden Markov Models (HMMs) and Their Applications
(8) Multivariate and High-order Markov Chain Models with Applications
(9) Introduction to Credit Risk Models
(10) Probabilistic Boolean Networks: Construction & Application
What is Engineering?
• Engineering combines the fields of science (physics)
and mathematics to solve real-world problems that improve the world around us.
• What really distinguishes an engineer is the ability to implement
ideas in a cost-effective and practical way.
• The ability to take a thought, or abstract idea, and translate it
into reality is what separates an engineer from other fields of science
and mathematics.
[Taken from http://whatisengineering.com/]
Mathematics is about proof.

Figure 1: What is Mathematics? (The figure decomposes the Chinese characters for "proof" into components glossed "words", "go up to" and "sun & moon".)
What is Mathematics?
• Mathematics is a language. Language is the dress of thought
(Samuel Johnson). Moreover, the limits of my language are the limits
of my world (Ludwig Wittgenstein).
Transferable Skills
• Skills that can be used in a variety of jobs or occupations.
• Transferable skills are the inventory of assets that help you transition into and excel in a new role. They ensure the robustness
and longevity of your career. Moreover, they allow you to more
easily and readily explore lateral dimensions in your career and
acquire added skills and expertise.
• While highly specialized skills may be essential to building
your own personal competitive advantage and ensuring success in a
particular role or organization, it is the transferable skills
that ensure you DO NOT become professionally redundant
or obsolete over the long term.
[Taken from http://www.bayt.com/en/career-article-2721/]
Transferable Mathematical Knowledge
• Stochastic Processes: Markov Chain Process, Poisson
Process, Birth-and-Death Process, Diffusion Process, Lévy
Process, etc.
• Mathematical Models: Deterministic Models, Stochastic Models, Statistical Models, etc.
• Computational Methods: Iterative Methods, Numerical
Algorithms, Computer Simulations, etc.
• Data Mining Techniques: Data Modeling, Classifications, Clustering Methods, etc.
PART I
(1) Probability Theory, Poisson Process and Inventory Systems
-Exponential Distribution and Poisson Distribution
-Poisson Process
-Introduction to Inventory Models
(2) Discrete Time Markov Chain Models: Theory and Examples
-Introduction to Markov Chain
-Irreducible Markov Chain and Classifications of States
-Simulation and Construction of Markov Chains
-Stationary Distribution of a Finite Markov Chain
-More Markov Chain Models
We are to admit no more causes of natural things than such as are
both true and sufficient to explain their appearances.
Isaac Newton (Wikipedia)
Figure 2: St Augustine and Monica by Ary Scheffer (1846). Taken from Wikipedia.
1 Probability Theory, Poisson Process and Inventory Systems

1.1 The Science of Probability: A Historical Note
• With the advent of Christianity, the concept of random events developed by
earlier philosophers was rejected.
• According to St. Augustine (354-430), nothing occurred by chance: everything
was minutely controlled by the will of God. If events appear to occur at random,
it is because of our ignorance and not in the nature of the events. One should
seek the will of God instead of looking at patterns of behavior in aggregates
of events.¹

¹ Poker faces: the life and work of professional card players by David M. Hayano, UCP Press, 1982.
• The amazing content and applications of probability theory owe their origin to a
question about gambling (a game).
• The question was raised by the Chevalier de Méré (1607-1684) concerning his problem
of throwing a die. He held the title Chevalier (Knight) and was educated at Méré. The
problem was solved by Pascal.
The Problem:
• De Méré made considerable money over the years by betting double odds on
rolling at least one “6” in 4 throws of a fair die (six faces).
• He then thought that the same should hold for betting on at least one double-six
in 24 throws of two fair dice (an old gamblers' belief). It turned out that this
did not work well.
• Why? In 1654, he challenged his friends Pascal and Fermat for the reason.
Figure 3: Pascal (1623-1662) (Left).
Fermat (1601-1665) (Right). Taken from Wikipedia.
• The probability of getting no “6” in four independent throws of a fair die is
(5/6) × (5/6) × (5/6) × (5/6) = 625/1296.
• Therefore the probability of having at least one “6” in 4 throws equals
1 − 625/1296 = 671/1296 = 0.5177 > 0.5000.
• This explains why de Méré made a good amount of money on double odds on this bet.
• This is not a fair game: the player has an advantage over the house.
• The probability of getting no double “6” in one throw of two fair dice is
1 − (1/6 × 1/6) = 35/36.
• The probability of getting no double “6” in 24 independent throws is
(35/36)^24.
• Therefore the probability of having at least one double “6” in 24 throws equals
1 − (35/36)^24 = 0.4914 < 0.5.
• This explains why de Méré did not make a good amount of money on double odds on this bet.
• Again this is not a fair game: here the house has an advantage over the player.
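The two dice calculations above are easy to verify numerically. Below is a minimal Python sketch (not part of the original notes; all numbers come from the text):

from fractions import Fraction

# P(at least one "6" in 4 throws of one fair die) = 1 - (5/6)^4
p_one_die = 1 - Fraction(5, 6) ** 4
print(p_one_die, float(p_one_die))   # 671/1296 ≈ 0.5177 > 0.5

# P(at least one double "6" in 24 throws of two fair dice) = 1 - (35/36)^24
p_two_dice = 1 - Fraction(35, 36) ** 24
print(float(p_two_dice))             # ≈ 0.4914 < 0.5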
1.2 Poisson Process, Reliability Theory and Inventory Systems

• Every day we encounter many random events: the waiting time for a
bus, the service time at a counter, the lifetime of a machine, the processing
time for a product, etc.
The waiting time for the occurrence of an event can be modeled by a
continuous probability distribution.
• On the other hand, there are counting events such as: the number of buses passing
through a junction in one hour, the number of customers served per day, the
number of broken machines handled in a year and the number of products produced per month.
The number of events occurring in a fixed period can be modeled by a discrete
probability distribution.
• Here we propose to employ the exponential distribution (continuous case)
f(t) = λ e^{−λt}, t ≥ 0, λ > 0,
and the Poisson distribution (discrete case)
f(n, t) = (λt)^n e^{−λt} / n!, λ > 0, t ≥ 0, n = 0, 1, . . . ,
to model the above random events.
• These two distributions are linked up by the Poisson process.
We will also discuss this relationship in detail.
1.3 Exponential Distribution and Poisson Distribution

1.3.1 Exponential Distribution

Definition 1. A continuous random variable X is said to follow
an exponential distribution with parameter λ if its probability
density function is given by
f(x) = λ e^{−λx} if x ≥ 0, and f(x) = 0 if x < 0.

Proposition 1. If a random variable X follows the exponential
distribution with parameter λ, then
E(X) = λ^{−1} and Var(X) = λ^{−2},
and its moment generating function is given by M(t) = λ(λ − t)^{−1}.
Proof. First, by definition, we have
E(X) = ∫_0^∞ x f(x) dx = 1/λ
and
E(X^2) = ∫_0^∞ x^2 f(x) dx = 2/λ^2.
Therefore we have
Var(X) = E(X^2) − E(X)^2 = 2/λ^2 − 1/λ^2 = 1/λ^2.
Finally, the moment generating function is given by
M(t) = E(e^{tX}) = ∫_0^∞ e^{tx} λ e^{−λx} dx = λ/(λ − t).
We note that M′(0) = λ^{−1} = E(X) and M″(0) = 2λ^{−2} = E(X^2).

Remark 1. We note that
E(g(X)) = ∫_{−∞}^{∞} g(x) · f(x) dx.
There is a one-to-one relation between the Probability Density Function
(PDF) and its Moment Generating Function (MGF).
Definition 2. A probability distribution (say of a non-negative random variable
X) is said to have the Markov property if for any two non-negative values t
and x we have
P{X > x + t | X > t} = P{X > x}.

Proposition 2. The exponential distribution has the Markov property.

Proof. This follows from
P{X > t + x | X > t} = e^{−µ(t+x)}/e^{−µt} = e^{−µx} = P{X > x}.   (1.1)

• In a number of applications, observation has shown that the exponential distribution can be a good description of a service time distribution (which is therefore
called an exponential service time).
• Exponentially distributed time has the nice feature that, by the Markov property
(1.1), the distribution of the remaining holding time after a customer has been
served for any length of time t > 0 is the same as that initially at t = 0.
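The Markov (memoryless) property (1.1) can also be checked by simulation. The following is an illustrative Python sketch only; the parameter values µ = 1.5, t = 0.8 and x = 0.5 are arbitrary choices, not from the notes:

import random

mu, t, x, trials = 1.5, 0.8, 0.5, 200_000
samples = [random.expovariate(mu) for _ in range(trials)]

# Conditional survival P{X > t + x | X > t}
survived_t = [s for s in samples if s > t]
cond = sum(s > t + x for s in survived_t) / len(survived_t)

# Unconditional survival P{X > x}
uncond = sum(s > x for s in samples) / trials

print(cond, uncond)  # both ≈ exp(-mu * x) ≈ 0.472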
Here are some more properties of the exponential distribution.

Proposition 3. Suppose X_1 (the waiting time for Bus 1) and X_2 (the waiting time for Bus 2)
are independent non-negative random variables with probability density
functions f_1(t) and f_2(t). If they are both exponentially distributed random
variables with means µ_i^{−1}, then the probability that X_2 exceeds X_1 is
µ_1/(µ_1 + µ_2).

Proof. We have
P{X_2 > X_1} = ∫_0^∞ ∫_s^∞ f_1(s) f_2(t) dt ds = ∫_0^∞ ∫_0^t f_1(s) f_2(t) ds dt,
i.e., we integrate the joint density function over the region
Ω = {(s, t) ∈ R^2 | t > s}.
If X_1 and X_2 are exponential with means µ_1^{−1} and µ_2^{−1}, then the above integral
becomes
∫_0^∞ ∫_s^∞ µ_1 e^{−µ_1 s} µ_2 e^{−µ_2 t} dt ds = ∫_0^∞ µ_1 e^{−µ_1 s} e^{−µ_2 s} ds = µ_1/(µ_1 + µ_2).   (1.2)
Proposition 4. Suppose that X_1, X_2, . . . , X_n are independent,
identically distributed exponential random variables with mean µ^{−1}, and
consider the corresponding order statistics
X_{(1)} ≤ X_{(2)} ≤ · · · ≤ X_{(n)}.
Then X_{(1)} is again exponentially distributed, with mean 1/n times
the mean of the original random variables.

Proof. We observe that
X_{(1)} = min(X_1, X_2, . . . , X_n).
• We note X_{(1)} > x if and only if X_i > x for all i = 1, 2, . . . , n.
Hence we have
P{X_{(1)} > x} = P{X_1 > x} P{X_2 > x} · · · P{X_n > x} = (e^{−µx})^n = e^{−nµx}.
Proposition 5. A random variable X is exponentially distributed
if and only if
P{X < t + h | X > t} = µh + o(h) as h → 0.
Here o(h) is a function of h such that
lim_{h→0} o(h)/h = 0.

Proof. Suppose X has an exponential distribution. Then
P{X < t + h | X > t} = 1 − e^{−µh}   (Markov property)
= 1 − (1 − µh + o(h))   (use Taylor's series; see Remark 2)
= µh + o(h) as h → 0.   (1.3)
• Conversely, suppose that
P{X < t + h | X > t} = µh + o(h) as h → 0;
then we have
P{X > t + h | X > t} = 1 − µh + o(h).

Using
P{X > t + h | X > t} = P{X > t + h}/P{X > t},
re-arranging the terms and letting h → 0, one obtains the differential
equation (see Remark 3)
(d/dt) P{X > t} = −µ P{X > t},
which has the unique solution
P{X > t} = e^{−µt}
satisfying the initial condition
P{X > 0} = 1,
i.e., X follows the exponential distribution.
Remark 2. For 0 < h < 1 we have
1 − e^{−µh} = 1 − (1 − µh/1! + (µh)^2/2! − (µh)^3/3! + . . .)
= µh/1! − (µh)^2/2! + (µh)^3/3! − . . .
≡ µh + g(h).
Then we see that
|g(h)/h| = |−µ^2 h/2! + µ^3 h^2/3! − . . .|
≤ h (µ^2/2! + µ^3/3! + . . .)
≤ h e^µ.
Therefore
lim_{h→0} g(h)/h = 0, and hence g(h) = o(h).
Remark 3. Given
P{X < t + h | X > t} = µh + o(h) as h → 0,
we have
P{X > t + h | X > t} = 1 − µh + o(h).
Using
P{X > t + h | X > t} = P{X > t + h}/P{X > t},
we have
P(X > t + h) = P(X > t) − µh P(X > t) + o(h) P(X > t).

Let F(t) = P(X > t); then we have
F(t + h) = F(t) − µh F(t) + o(h) F(t) = F(t) − µh F(t) + o(h).
• Thus
[F(t + h) − F(t)]/h − (−µ F(t)) = o(h)/h.
Hence
lim_{h→0} {[F(t + h) − F(t)]/h − (−µ F(t))} = lim_{h→0} o(h)/h = 0.
• This means
F′(t) = −µ F(t),
and F(t) can be solved easily with the initial condition F(0) = 1.
• Solving the differential equation, we have
F(t) = e^{−µt}.
1.3.2 Poisson Distribution

The Poisson distribution is a discrete distribution given by
P_j(t) = (λt)^j e^{−λt} / j!, j = 0, 1, 2, . . . .

Proposition 6. If a random variable X follows the Poisson distribution, then
E(X) = λt and Var(X) = λt,
and its moment generating function is
e^{−λt(1−e^z)}.
Proof. We have
E(X) = Σ_{j=0}^∞ j (λt)^j e^{−λt} / j! = Σ_{j=1}^∞ (λt) (λt)^{j−1} e^{−λt} / (j − 1)! = λt.
• Since
E(X(X − 1)) = Σ_{j=1}^∞ j(j − 1) (λt)^j e^{−λt} / j! = Σ_{j=2}^∞ (λt)^2 (λt)^{j−2} e^{−λt} / (j − 2)! = (λt)^2,
we have
E(X^2) − E(X) = (λt)^2.
• Hence we have
Var(X) = E(X^2) − E(X)^2 = (λt)^2 + E(X) − E(X)^2 = λt.
• Finally, the moment generating function is given by
E(e^{Xz}) = Σ_{j=0}^∞ e^{jz} (λt)^j e^{−λt} / j! = e^{−λt} Σ_{j=0}^∞ (λt e^z)^j / j! = e^{−λt + λt e^z} = e^{−λt(1−e^z)}.
Proposition 7. The sum of two independent Poisson random variables is
again a Poisson random variable.

Proof. Let
X = X_1 + X_2,
where X_i is a Poisson random variable with mean λ_i t. Then X has
the moment generating function
g(z) = e^{−λ_1 t(1−e^z)} e^{−λ_2 t(1−e^z)} = e^{−(λ_1+λ_2) t(1−e^z)}.
This shows that the sum of two independent Poisson random variables
with means λ_1 t and λ_2 t is itself a Poisson random variable with mean
(λ_1 + λ_2)t.

Remark 4. Let g_i(z) be the moment generating function
of X_i. If X_1 and X_2 are independent, then the moment generating
function of X_1 + X_2 is given by
E(e^{z(X_1+X_2)}) = E(e^{zX_1} · e^{zX_2}) = E(e^{zX_1}) · E(e^{zX_2}) = g_1(z) · g_2(z).
1.4 Poisson Process

Definition 3. A process is said to be a Poisson process if, at any epoch t,
P{one occurrence during (t, t + h)} = λh + o(h) as h → 0
and
P{two or more occurrences during (t, t + h)} = o(h) as h → 0.

Remark 5. The following are some remarks on the notation o(h).
(a) We recall that o(h) is a function of h such that
lim_{h→0} o(h)/h = 0.
(b) Possible examples of o(h) are o(h) = h^2 and o(h) = h sin(h).
(c) However, o(h) cannot take the form √h or h log(h).
The Poisson process, the Poisson distribution and the exponential distribution are related as follows:

Proposition 8. Suppose in a certain process we let T_i (i = 1, 2, 3, · · ·) be the epoch of the ith occurrence.
Let A_i = T_i − T_{i−1} (i = 1, 2, 3, · · ·), where T_0 is the epoch at which we start to
count the number of occurrences.
Let X(t) be the number of occurrences in a time interval of length t.
Then the following statements are equivalent.
(a) The process is Poisson (with mean rate λ).
(b) X(t) is a Poisson random variable with parameter λt, i.e.,
P{X(t) = j} = (λt)^j e^{−λt} / j!, j = 0, 1, 2, · · · .
(c) The A_i's are mutually independent, identically distributed exponential random variables with mean λ^{−1}, i.e.,
P{A_i ≤ t} = 1 − e^{−λt}, i = 1, 2, · · · .
(a) implies (b)
• Given a Poisson process with mean rate λ, we want to find the
number of occurrences in the time interval [0, t] and the probability
density of the time between two successive occurrences.
• Divide the time interval into n equal parts, each of length t/n.
In each sub-interval, the probability that it contains an occurrence is given roughly by λh = λt/n. Therefore the probability of getting k occurrences is given by the binomial distribution
C(n, k) (λt/n)^k (1 − λt/n)^{n−k} = [n(n − 1) · · · (n − k + 1)/n^k] · [(λt)^k/k!] · (1 − λt/n)^{n−k},
where the first factor tends to 1 and the last factor tends to e^{−λt} as n → ∞.
• By letting n go to infinity, from this well-known result we obtain
the Poisson distribution:
e^{−λt}(λt)^k / k!.
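The limit can be observed numerically: as n grows, the binomial probability above approaches the Poisson probability. A small check (the values λt = 2 and k = 3 are illustrative, not from the text):

from math import comb, exp, factorial

lam_t, k = 2.0, 3
poisson = exp(-lam_t) * lam_t ** k / factorial(k)
for n in (10, 100, 1000, 10000):
    binom = comb(n, k) * (lam_t / n) ** k * (1 - lam_t / n) ** (n - k)
    print(n, binom)
print("Poisson limit:", poisson)  # ≈ 0.1804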
(b) implies (a)
We note that
P(k = 0) = e^{−λt} = 1 − λt + o(t)   (by Remark 2)
and
P(k = 1) = e^{−λt} λt = λt (1 − λt + (λt)^2/2! − · · ·) = λt + (−(λt)^2 + (λt)^3/2! − · · ·) = λt + o(t).
Therefore we have
P(k > 1) = 1 − P(k = 0) − P(k = 1) = o(t).
(a) and (b) imply (c)
Let f(t) be the probability density of the inter-occurrence time T.
Then by definition we have
P(t ≤ T ≤ t + δt) = f(t) δt = P(no occurrence in (0, t)) × P(one occurrence in (t, t + δt)).
Now we have
P(no occurrence in (0, t)) = e^{−λt}
and
P(one occurrence in (t, t + δt)) = λ δt.
Hence we have
f(t) δt = λ e^{−λt} δt and f(t) = λ e^{−λt}.
(c) implies (b)
We will prove by mathematical induction that the PDF of the random
variable A(n) = A_1 + A_2 + . . . + A_n is given by the Erlangian distribution:
g_n(t) = λ(λt)^{n−1} e^{−λt} / (n − 1)!.
For n = 1 the statement clearly holds, and we assume it holds for
some positive integer n. We then look for the PDF of
Z = A(n) + A,
where A is one further independent exponential inter-occurrence time. We have
P(Z ≤ t) = ∫_0^t ∫_0^{t−x} g_n(x) λ e^{−λs} ds dx
= ∫_0^t g_n(x) (1 − e^{−λ(t−x)}) dx
= ∫_0^t g_n(x) dx − e^{−λt} ∫_0^t λ(λx)^{n−1}/(n − 1)! dx
= ∫_0^t g_n(x) dx − e^{−λt} (λt)^n / n!.
• Taking the derivative of the R.H.S. with respect to t, we obtain the PDF of Z as
follows:
g_{n+1}(t) = λ(λt)^n e^{−λt} / n!.
We then have, for n = 1, 2, . . . ,
P(A(n) ≤ t) = ∫_0^t λ(λx)^{n−1} e^{−λx} / (n − 1)! dx
= 1 − Σ_{k=0}^{n−1} e^{−λt} (λt)^k / k!   (apply integration by parts)
= 1 − Σ_{k=0}^{n−1} P(X(t) = k).
• Here P(A(n) ≤ t) is the probability that the waiting time for the nth arrival is
less than or equal to t.
• This is also the probability that there are n or more arrivals observed in the
interval [0, t]. Thus X(t) follows the Poisson distribution with mean λt.
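Proposition 8 can be illustrated by simulation: generate exponential inter-arrival times with mean λ^{−1} and count the arrivals falling in [0, t]; the counts should follow the Poisson distribution with mean λt. A minimal sketch (λ = 3 and t = 2 are arbitrary illustrative values):

import random
from math import exp, factorial

lam, t, trials = 3.0, 2.0, 100_000

def count_arrivals(lam, t):
    # Sum exponential inter-arrival times until the clock exceeds t.
    clock, n = 0.0, 0
    while True:
        clock += random.expovariate(lam)
        if clock > t:
            return n
        n += 1

counts = [count_arrivals(lam, t) for _ in range(trials)]
for j in range(4):
    empirical = counts.count(j) / trials
    theory = exp(-lam * t) * (lam * t) ** j / factorial(j)
    print(j, round(empirical, 4), round(theory, 4))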
Remark 6. The Poisson process provides a rather good approximation for modeling
many random processes, such as the arrival of customers and calls. From the proposition above, a process is Poisson (with rate λ) if and only if the inter-arrival
times (the lengths of time between successive customer arrivals) are mutually independent exponentially distributed random variables with mean λ^{−1}.
• We note that if the mean arrival rate is λ, then the mean inter-arrival time is
λ^{−1}. By the Markov property of exponential random variables, the distribution
of the length of time from an arbitrarily chosen epoch to the next arrival (the
next-arrival time) is the same as the distribution of inter-arrival times.

Figure 1.1. Arrival of Customers: at an observer's sampling point between two
arrivals, the remaining time B until the next arrival has the same distribution as
a full inter-arrival time A.

• This nice property much simplifies the mathematical analysis of many situations.
• For a Poisson process,
P{exactly one occurrence in (t, t + h)} = λh e^{−λh}.
Now for a fixed t and any x in (0, t),
P{the epoch of occurrence is in (0, x) | exactly one occurrence in (0, t)}
= P{exactly one occurrence in (0, x), and no occurrence in (x, t)} / P{exactly one occurrence in (0, t)}
= [λx e^{−λx} × e^{−λ(t−x)}] / [λt e^{−λt}]
= x/t,
which is the cumulative distribution function of a uniform distribution on (0, t).
• This means that if we know there is exactly one occurrence in (0, t),
then the epoch of that occurrence is equally likely throughout (0, t).
In this sense we say that a Poisson process is random.
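This "randomness" is also easy to see by simulation: condition on runs with exactly one arrival in (0, t) and record where that arrival fell; the histogram should be flat. An illustrative sketch (λ = 1 and t = 5 are arbitrary choices):

import random

lam, t, trials = 1.0, 5.0, 200_000
epochs = []
for _ in range(trials):
    clock, arrivals = 0.0, []
    while True:
        clock += random.expovariate(lam)
        if clock > t:
            break
        arrivals.append(clock)
    if len(arrivals) == 1:        # condition on exactly one occurrence
        epochs.append(arrivals[0])

# Split (0, t) into 5 equal bins; each should hold about 20% of the epochs.
bins = [0] * 5
for e in epochs:
    bins[min(int(e / t * 5), 4)] += 1
print([round(b / len(epochs), 3) for b in bins])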
1.5 Law of Large Numbers and Central Limit Theorem

In this section we introduce the famous laws of large numbers, which state that the long-run average of a sequence of independent and
identically distributed random variables converges to its mean. We shall state
and give a proof of the weak law of large numbers, state the strong law
of large numbers without proof, and finally state and prove another important theorem,
namely the central limit theorem.
• Along the way we also introduce two important inequalities for random variables: the Markov inequality and the Chebyshev inequality.
• By making use of the Chebyshev inequality, one can prove the weak law of
large numbers.
• Let us begin with the Markov inequality.
Proposition 9. If a random variable X takes only non-negative values,
then for any a > 0 we have
P(X ≥ a) ≤ E(X)/a.
This is known as the Markov inequality.

Proof. We give a proof for the case when X is a continuous random variable; the
case of a discrete random variable is similar and therefore omitted. Let f(x) be the
probability density function. Then
E(X) = ∫_0^∞ x f(x) dx
= ∫_0^a x f(x) dx + ∫_a^∞ x f(x) dx
≥ ∫_a^∞ x f(x) dx
≥ ∫_a^∞ a f(x) dx   (because x f(x) ≥ a f(x) for x ≥ a)
= a ∫_a^∞ f(x) dx = a P(X ≥ a).
• Using the Markov inequality, one can also prove Chebyshev's inequality.

Proposition 10. If a random variable X has mean µ and variance σ^2, then
for any k > 0 we have
P(|X − µ| ≥ kσ) ≤ 1/k^2.
This is known as Chebyshev's inequality.

Proof. Consider (X − µ)^2/σ^2, a non-negative random variable whose mean is
E[(X − µ)^2/σ^2] = E[(X − µ)^2]/σ^2 = σ^2/σ^2 = 1.
Letting a = k^2 and applying the Markov inequality, we have
P((X − µ)^2/σ^2 ≥ k^2) ≤ 1/k^2.
Hence we get
P(|X − µ| ≥ kσ) ≤ 1/k^2.
Proposition 11. Let X_1, X_2, . . . , X_n be a sequence of independent and identically distributed random variables having mean µ and finite variance σ^2. Then
for any ε > 0 we have
lim_{n→∞} P(|(X_1 + X_2 + . . . + X_n)/n − µ| ≥ ε) = 0.
This is the famous Weak Law of Large Numbers.

Proof. Let
X̄ = (X_1 + X_2 + . . . + X_n)/n.
We have
E(X̄) = [E(X_1) + E(X_2) + . . . + E(X_n)]/n = (µ + µ + . . . + µ)/n = µ
and
Var(X̄) = [Var(X_1) + Var(X_2) + . . . + Var(X_n)]/n^2 = (σ^2 + σ^2 + . . . + σ^2)/n^2 = σ^2/n.
By Chebyshev's inequality, for any positive k we have
P(|X̄ − µ| ≥ kσ/√n) ≤ 1/k^2.
In particular, letting k = ε√n/σ, we get
P(|X̄ − µ| ≥ ε) ≤ σ^2/(nε^2).
Therefore for any positive ε we have
lim_{n→∞} P(|X̄ − µ| ≥ ε) = 0.
Here let us also state, without proof, the strong law of large numbers, which strengthens the weak law: with probability one,
lim_{n→∞} X̄ = µ.
This means that the long-run average of a sequence of independent and identically
distributed random variables will converge to its mean.

Proposition 12. Let X_1, X_2, . . . , X_n be a sequence of independent and identically distributed random variables having mean µ and finite variance σ^2. Then
we have
P(lim_{n→∞} (X_1 + X_2 + . . . + X_n)/n = µ) = 1.
This is the famous Strong Law of Large Numbers.
Proposition 13 (Central Limit Theorem). Let X_1, X_2, . . . , X_n be a
sequence of independent, identically distributed random variables
with mean µ and variance σ^2. Then the following random variable tends to the normal distribution with mean 0 and variance 1
as n → ∞:
Z_n = (X_1 + X_2 + . . . + X_n − nµ)/(σ√n) = (X̄_n − µ)/(σ/√n).

Proof. Here we give a heuristic proof, showing that the moment generating function of Z_n tends to the moment generating function of the
standard normal N(0, 1) random variable, i.e.,
lim_{n→∞} g_{Z_n}(t) = e^{t^2/2}.
• Now it is easy to check that E(Z_n) = 0 and E(Z_n^2) = 1. We have
g_{Z_n}(t) = E(e^{tZ_n}) = E(e^{t(X_1+X_2+...+X_n−nµ)/(σ√n)}) = E(∏_{i=1}^n e^{t(X_i−µ)/(σ√n)}) = ∏_{i=1}^n E(e^{t(X_i−µ)/(σ√n)}) = [g_{(X−µ)/σ}(t/√n)]^n,
where g_{(X−µ)/σ}(·) is the moment generating function of the random
variable (X − µ)/σ, the normalized form of X. The mean and variance
of (X − µ)/σ are 0 and 1, respectively.
• We write
g_{(X−µ)/σ}(t) = 1 + 0 · t + t^2/2 + m_3 t^3 + m_4 t^4 + · · ·
and therefore
log(g_{Z_n}(t)) = n log(g_{(X−µ)/σ}(t/√n)) = n log(1 + t^2/(2n) + m_3 t^3/n^{3/2} + · · ·).
• Thus we have
log(g_{Z_n}(t)) ≡ n log(1 + a(t)),
where a(t) = t^2/(2n) + m_3 t^3/n^{3/2} + · · · . Since we have the Taylor series
log(1 + b) = b − b^2/2 + b^3/3 − · · · , |b| < 1,
by choosing t close to zero we have |a(t)| < 1 and
log(g_{Z_n}(t)) = n log(1 + a(t)) = n (a(t) − a^2(t)/2 + a^3(t)/3 − · · ·).
• By inspection, we have
lim_{n→∞} n a(t) = t^2/2 and lim_{n→∞} n a^k(t) = 0 for k = 2, 3, . . . .
Thus
lim_{n→∞} log(g_{Z_n}(t)) = t^2/2, or lim_{n→∞} g_{Z_n}(t) = e^{t^2/2}.
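The limit theorem can be complemented by a quick simulation: standardized sums of i.i.d. random variables should look standard normal for large n. A minimal sketch (exponential(1) summands, with µ = σ = 1, are an arbitrary choice):

import random
import statistics
from math import sqrt

mu, sigma, n, trials = 1.0, 1.0, 500, 20_000
z_values = []
for _ in range(trials):
    s = sum(random.expovariate(1.0) for _ in range(n))
    z_values.append((s - n * mu) / (sigma * sqrt(n)))

print(statistics.mean(z_values))   # ≈ 0
print(statistics.stdev(z_values))  # ≈ 1
# Compare P(Z_n <= 1) with the standard normal value ≈ 0.8413.
print(sum(z <= 1 for z in z_values) / trials)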
1.6 Introduction to Inventory Models

• To meet demand on time and compete in the market, a company needs to keep
a suitable amount of stock on hand and order replenishment at the right time. Too
much stock incurs extra inventory cost, but a shortage of stock results in unsatisfied demand.
• The purpose of inventory theory is to determine rules or policies that minimize
the system running cost (or maximize its profit) while meeting customer demand.
• To obtain a "good" operational policy, one has to understand the demand process, the flow of material, the cost of the facility, etc. Mathematical
models, especially probabilistic models, are good tools for the analysis of inventory
processes.
• The main objective here is to obtain optimal ordering policies for inventory systems such that the overall system running cost is minimized. We will begin with
simple deterministic models and then move to more complex probabilistic models.
The following are possible costs associated with an inventory system.
1. Ordering and setup cost: this includes the cost of paperwork and billing
associated with an order. If the product is produced internally, it may
also include the cost of setting up a machine in a production system and
the cost of labor.
2. Purchasing cost: this includes the cost of raw material and the transportation
cost.
3. Holding cost: this is the cost of holding one unit of inventory for one period.
If the period is one year, then it is the annual holding cost. This cost may also
include the insurance cost due to the possibility of spoilage.
4. Shortage cost: when a demand cannot be met in time, a shortage is said to
occur. There are two possible cases: either
(i) the customers accept delivery on a later date, which is called a backlogged
demand, or
(ii) the customers refuse delivery on a later date, which is called
a lost sale.
1.6.1 Deterministic (EOQ) Inventory Models

The following are the basic assumptions of the deterministic inventory models.
1. Repetitive ordering:
the ordering decision is repetitive in the sense that it is repeated
in a regular manner.
2. Constant demand:
demand is assumed to occur at a constant rate, say D per year. Then the total
demand over t years is Dt.
3. Constant lead time:
the lead time for each order is a known constant L.

Definition 4. The lead time is the length of time between the instant when
an order is placed and the instant at which the order arrives.
1.6.2 The Basic EOQ Model

The basic assumptions of an Economic Order Quantity (EOQ) model are as
follows.
1. The demand D per year (we assume the unit of time is one year) is a known
constant.
2. The ordering cost is K each time an order is placed.
3. The size of an order q is a constant to be determined.
4. The lead time L is assumed to be 0.
5. No shortage of demand is allowed.
6. The holding cost for one unit of inventory is h per year.
7. The unit purchasing cost is p.

Remark 7. Since orders arrive instantaneously, in an optimal policy
one should never place an order when the inventory level is greater than zero, as
otherwise an unnecessary holding cost would be incurred. When the inventory level is
zero, one must place an order to prevent a shortage from occurring. The inventory
level of the system is given in Figure 1.5.
Figure 1.5. The Basic EOQ Model: the inventory level starts at q, decreases
linearly to 0 at rate D over each cycle of length q/D, and jumps back to q when
the next order arrives.
Definition 5. Any interval of time that begins with the arrival of an order and
ends with the instant before the next order is received is called a cycle.
Remark 8. We note that the cycle length of this model is q/D.
• We are going to determine the optimal value of q such that the total annual
running cost of the system is minimized.
• The annual running cost c(q) can be written as follows:
cost of placing order + purchasing cost + holding cost.
(a) Since the size of each order is q units and the annual demand is D, one needs
D/q orders per year. Thus the annual cost of placing orders is
K (D/q).
(b) The annual purchasing cost is of course given by
pD.
(c) In each cycle, the inventory level decreases from q down to 0 at a linear rate of D.
Thus the average inventory level is
(½ · (q/D) · q)/(q/D) = q/2,
and the annual holding cost is
h (q/2).
• From (a)-(c), the annual cost is given by
c(q) = KD/q + pD + hq/2.   (1.4)
• We note that
c′(q) = −KD/q^2 + h/2.
The only critical point is obtained by solving c′(q) = 0, and we have
q* = √(2KD/h).
Since
c″(q) = 2KD/q^3 > 0 for q > 0,
the annual cost c(q) is minimized when q = q*. The optimal size of an order,
q* = √(2KD/h),
is called the Economic Order Quantity (EOQ).
• In this case the optimal annual cost is given by
c(K, D, h) = KD √(h/(2KD)) + pD + (h/2) √(2KD/h) = √(2hKD) + pD.   (1.5)
Remark 9. We give some remarks on the EOQ model.
1. The EOQ does NOT depend on the unit purchasing price, because no matter
what the size of each order is, one has to purchase D units per year.
2. From the formula, we note that the EOQ increases when the annual demand D or
the ordering cost K increases.
3. If the unit holding cost h increases, then the EOQ decreases and the system holds
less inventory.
4. We also note that when the EOQ is applied, the annual holding cost is
h q*/2 = (h/2) √(2KD/h) = √(hKD/2),
and the annual ordering cost is
KD/q* = KD √(h/(2KD)) = √(hKD/2).
Therefore the annual holding cost is equal to the annual ordering cost.
Example 1. A company uses 500 units of a drug per year. The cost of ordering
the drug is 20, each unit of the drug costs 100, and the holding cost for one unit of the drug
is 2 per year.
(a) What are the EOQ and the optimal running cost?
(b) How many orders will be placed each year?
(c) What is the cycle length?
We note that in this case K = 20, h = 2 and D = 500.
(a) The EOQ is
√(2KD/h) = √(2(20)(500)/2) = 100.
The optimal running cost is
√(2hKD) + pD = √(2(2)(20)(500)) + 100(500) = 50200.
(b) The number of orders per year is 500/100 = 5.
(c) The cycle length is 1/5 year.
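The EOQ formulas translate directly into code. Below is a minimal Python sketch reproducing Example 1 (all inputs are taken from the example):

from math import sqrt

def eoq(K, D, h):
    # Economic Order Quantity q* = sqrt(2KD/h).
    return sqrt(2 * K * D / h)

def annual_cost(q, K, D, h, p):
    # c(q) = KD/q + pD + hq/2, equation (1.4).
    return K * D / q + p * D + h * q / 2

K, D, h, p = 20, 500, 2, 100
q_star = eoq(K, D, h)
print(q_star)                           # 100.0
print(annual_cost(q_star, K, D, h, p))  # 50200.0
print(D / q_star, q_star / D)           # 5 orders per year, cycle length 0.2 year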
Example 2. In the previous example, suppose the drug spoils and cannot be used if it
spends more than one month in the inventory. What are the order quantity, the running
cost and the cycle length in this case?
• In the previous example the cycle length is 2.4 months, which is longer than one month.
Thus the new cycle length is one month (1/12 year).
• The new order quantity is
q = D/12 = 500/12,
and the running cost is
KD/q + pD + hq/2 = 20(500)/(500/12) + 100(500) + 2(500/12)/2 = 50281.7.
1.6.3 When EOQ is Applicable

• One main assumption of the deterministic EOQ model is constant demand.
• To determine whether the assumption of constant demand is reasonable, suppose that
the demands during n periods of time, D_1, D_2, · · · , D_n, are observed. Let the
mean of the observed demands be
D̄ = (1/n) Σ_{i=1}^n D_i
and the variance of the observed demands be
Var(D) = (1/n) Σ_{i=1}^n D_i^2 − D̄^2.
• We define the coefficient of variation as
CV = √(Var(D)) / D̄.
Clearly if CV = 0, then the demand is constant. In general, if
CV < 0.2,
then the assumption of constant demand is reasonable.
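As a small illustration, the coefficient-of-variation test is a few lines of code once demands are observed. The demand figures below are made up purely for illustration:

from math import sqrt

demands = [48, 52, 50, 49, 51, 50]   # hypothetical observed demands
n = len(demands)
mean_d = sum(demands) / n
var_d = sum(d * d for d in demands) / n - mean_d ** 2
cv = sqrt(var_d) / mean_d
print(cv, cv < 0.2)   # a small CV: the constant-demand assumption is reasonable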
1.6.4 Newsboy Model with Discrete Stochastic Demand

In the previous section we studied inventory models where the demand is a
known constant. In this section we discuss inventory models with stochastic
(uncertain) demand. In our discussion, stochastic (uncertain) means that the demand is no longer constant but follows a certain known probability distribution. We
begin our discussion with the newsboy model.
• A newsboy sells newspapers (perishable) every morning. The cost of each newspaper remaining at the end of the day is C_o (overage cost), and the cost of each unit of
unsatisfied demand is C_u (underage cost or shortage cost).
• Suppose that the probability distribution of the demand D is given by
Prob(D = i) = p_i ≥ 0, i = 0, 1, 2, · · · .
• Our aim is to determine the best amount Q of newspapers to order such that
the expected cost is minimized.
• Consider the following newsboy problem, in which the daily demand distribution is:

x       0     1    2    3    4
P(x)    0.05  0.2  0.3  0.4  0.05

Table 1.2.

• Suppose the unit underage cost and the overage cost are the same; what is the
optimal ordering quantity Q* ?
• It is clear that the optimal order quantity Q* should be in {0, 1, 2, 3, 4}.
Most likely Q* ≠ 0 and Q* ≠ 4.
It is possible that Q* = 1, 2, 3.
Most likely Q* should be 3, as P(x = 3) = 0.4 has the highest probability.
Do you agree? You will find out the answer yourself later.
Proposition 14. Define the cumulative probability function of the demand D
to be
F(Q) = Σ_{i=0}^{Q} p_i = Prob(D ≤ Q).
Then the optimal value of Q satisfies
F(Q − 1) < C_u/(C_u + C_o) ≤ F(Q).

Step 1: We first construct the expected cost for a given order size Q. Clearly we
have the following TWO cases:
(i) if the demand D < Q, then the cost is (Q − D) C_o, and
(ii) if the demand D > Q, then the cost is (D − Q) C_u.
Therefore the expected cost is given by
E(Q) = C_o Σ_{i=0}^{Q} (Q − i) p_i + C_u Σ_{i=Q+1}^{∞} (i − Q) p_i,
where the first sum is the expected overage cost and the second is the expected shortage cost.
We note that F(Q) is an increasing function of Q, i.e.,
F(Q + 1) ≥ F(Q) for Q = 0, 1, 2, · · ·,
and F(Q) → 1 as Q → ∞.
We are going to show that the optimal Q is the one which satisfies
F(Q − 1) < C_u/(C_u + C_o) ≤ F(Q).

Step 2: From the definition of the expected cost E(Q), we are going to show
E(Q) − E(Q + 1) = C_u − (C_o + C_u) F(Q)   (1.6)
and
E(Q) − E(Q − 1) = −C_u + (C_o + C_u) F(Q − 1).   (1.7)
To prove (1.6), we recall that the expected cost is
E(Q) = C_o Σ_{i=0}^{Q} (Q − i) p_i + C_u Σ_{i=Q+1}^{∞} (i − Q) p_i.
Now we have
E(Q + 1) = C_o Σ_{i=0}^{Q+1} (Q + 1 − i) p_i + C_u Σ_{i=Q+2}^{∞} (i − Q − 1) p_i
= [C_o Σ_{i=0}^{Q+1} (Q − i) p_i + C_o Σ_{i=0}^{Q+1} p_i] + [C_u Σ_{i=Q+2}^{∞} (i − Q) p_i − C_u Σ_{i=Q+2}^{∞} p_i]
= [C_o Σ_{i=0}^{Q} (Q − i) p_i − C_o p_{Q+1} + C_o Σ_{i=0}^{Q+1} p_i] + [C_u Σ_{i=Q+1}^{∞} (i − Q) p_i − C_u p_{Q+1} − C_u Σ_{i=Q+2}^{∞} p_i]
= C_o Σ_{i=0}^{Q} (Q − i) p_i + C_u Σ_{i=Q+1}^{∞} (i − Q) p_i + C_o Σ_{i=0}^{Q} p_i − C_u Σ_{i=Q+1}^{∞} p_i
= E(Q) + C_o F(Q) − C_u (1 − F(Q))
= E(Q) − C_u + (C_o + C_u) F(Q).
Therefore we have
E(Q) − E(Q + 1) = C_u − (C_o + C_u) F(Q),
i.e.,
E(Q + 1) − E(Q) = −C_u + (C_o + C_u) F(Q).
Replacing Q by Q − 1 gives (1.7):
E(Q) − E(Q − 1) = −C_u + (C_o + C_u) F(Q − 1).
We note that
E(Q) ≤ E(Q + 1) ⇔ E(Q) − E(Q + 1) ≤ 0 ⇔ C_u − (C_o + C_u) F(Q) ≤ 0 ⇔ C_u/(C_o + C_u) ≤ F(Q)   (1.8)
and
E(Q) < E(Q − 1) ⇔ E(Q) − E(Q − 1) < 0 ⇔ −C_u + (C_o + C_u) F(Q − 1) < 0 ⇔ C_u/(C_o + C_u) > F(Q − 1).   (1.9)
Step 3: We are going to analyze the expected cost.
Since F(Q) is increasing in Q, there exists Q* such that
C_u/(C_o + C_u) ≤ F(Q) for Q ≥ Q*.   (1.10)
We take Q* to be the least integer such that (1.10) is satisfied. This means that
C_u/(C_o + C_u) > F(Q* − 1).
Now we have
C_u/(C_o + C_u) ≤ F(Q*) ≤ F(Q* + 1) ≤ F(Q* + 2) ≤ · · · .
From (1.8) we have
E(Q*) ≤ E(Q* + 1) ≤ E(Q* + 2) ≤ · · · .
This implies that the optimal Q should be less than or equal to Q*.
We also note that
C_u/(C_o + C_u) > F(Q* − 1) ≥ F(Q* − 2) ≥ · · · ≥ F(0).
From (1.9) we have
E(Q*) < E(Q* − 1) < E(Q* − 2) < · · · < E(0).
This means that the optimal Q should be greater than or equal to Q*.
We conclude that the optimal value of Q satisfies
F(Q − 1) < C_u/(C_u + C_o) ≤ F(Q).

Remark 10. In particular, if
C_u/(C_o + C_u) ≤ F(0),
then the optimal ordering quantity Q is 0.
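Proposition 14 gives a direct algorithm: scan the cumulative distribution until it first reaches the critical ratio C_u/(C_u + C_o). The sketch below applies it to the demand of Table 1.2 with C_u = C_o (critical ratio 1/2), so you can check the earlier guess:

def newsboy_discrete(p, Cu, Co):
    # Return the smallest Q with F(Q) >= Cu / (Cu + Co); p[i] = Prob(D = i).
    ratio = Cu / (Cu + Co)
    F = 0.0
    for Q, pi in enumerate(p):
        F += pi
        if F >= ratio:
            return Q
    return len(p) - 1

p = [0.05, 0.2, 0.3, 0.4, 0.05]              # Table 1.2
print(newsboy_discrete(p, Cu=1.0, Co=1.0))   # F(1) = 0.25 < 0.5 <= F(2) = 0.55, so Q* = 2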
Example 3. Suppose the daily demand follows a geometric distribution,
p_i = (1 − p) p^i, i = 0, 1, 2, · · · , where 0 < p < 1,
and the overage cost and the underage cost are equal. What is the optimal ordering
quantity?
We have
F(Q) = Σ_{i=0}^{Q} p_i = Σ_{i=0}^{Q} (1 − p) p^i = 1 − p^{Q+1}.
Thus the optimal Q should satisfy
F(Q − 1) = 1 − p^Q < 1/2 ≤ 1 − p^{Q+1} = F(Q),
or equivalently
p^{Q+1} ≤ 1/2 < p^Q.
Therefore
Q* = integral part of log(0.5)/log(p).
1.6.5 Newsboy Model with Continuous Stochastic Demand

What happens when the demand follows a continuous PDF? We expect a result
similar to that of the previous section.
We replace the discrete probability function p_i by a continuous probability density
function f(x). Here, for simplicity, we assume
f(x) > 0 for x ≥ 0 and f(x) = 0 for x < 0.

Proposition 15. Define the cumulative probability function of the demand D
to be
F(Q) = ∫_0^Q f(t) dt,
which is strictly increasing in Q. Then the optimal value of Q satisfies
F(Q) = C_u/(C_u + C_o).
Step 1: The expected cost when the ordering quantity is Q is
E(Q) = C_o ∫_0^Q (Q − x) f(x) dx + C_u ∫_Q^∞ (x − Q) f(x) dx,
where the first integral is the expected overage cost and the second is the expected shortage cost.
We are going to show that E(Q) is minimized when Q is chosen to be the root of
the equation
F(Q) = C_u/(C_o + C_u).

Step 2: To find the critical point of the expected cost, rewrite
E(Q) = C_o Q ∫_0^Q f(x) dx − C_o ∫_0^Q x f(x) dx + C_u ∫_Q^∞ x f(x) dx − C_u Q ∫_Q^∞ f(x) dx.
We note that
d/dQ {Q ∫_0^Q f(x) dx} = ∫_0^Q f(x) dx + Q f(Q),
d/dQ {∫_0^Q x f(x) dx} = Q f(Q),
d/dQ {∫_Q^∞ x f(x) dx} = −Q f(Q),
d/dQ {Q ∫_Q^∞ f(x) dx} = ∫_Q^∞ f(x) dx − Q f(Q),
and we have
E′(Q) = C_o ∫_0^Q f(x) dx + C_o Q f(Q) − C_o Q f(Q) − C_u Q f(Q) − C_u ∫_Q^∞ f(x) dx + C_u Q f(Q).
Finally we get
E′(Q) = C_o ∫_0^Q f(x) dx − C_u ∫_Q^∞ f(x) dx = (C_o + C_u) F(Q) − C_u.
The critical point satisfies the equation
F(Q) = C_u/(C_o + C_u).

Remark 11. Since F(Q) is strictly increasing, with F(0) = 0 and F(Q) → 1 as
Q → ∞, the equation
F(Q) = C_u/(C_o + C_u)
has a unique root.

Step 3: Analyze the critical point. Since
E″(Q) = (C_o + C_u) f(Q) > 0 for Q > 0,
the optimal value of Q satisfies the equation
(C_o + C_u) F(Q) − C_u = 0, or F(Q) = C_u/(C_o + C_u).
Example 4. If the demand follows the exponential distribution
f(x) = λ e^{−λx},
then the optimal Q will satisfy
F(Q) = ∫_0^Q λ e^{−λx} dx = 1 − e^{−λQ} = C_u/(C_o + C_u).
Thus the optimal Q is
Q = (−1/λ) log(C_o/(C_o + C_u)).
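As a quick check of Example 4, one can solve F(Q) = C_u/(C_o + C_u) numerically by bisection and compare with the closed form. The values λ = 0.5, C_u = 3 and C_o = 1 below are arbitrary illustrative choices:

from math import exp, log

lam, Cu, Co = 0.5, 3.0, 1.0
ratio = Cu / (Co + Cu)

def F(Q):
    return 1 - exp(-lam * Q)   # exponential demand CDF

lo, hi = 0.0, 100.0            # bisection for F(Q) = ratio
for _ in range(60):
    mid = (lo + hi) / 2
    if F(mid) < ratio:
        lo = mid
    else:
        hi = mid

closed_form = -log(Co / (Co + Cu)) / lam
print(lo, closed_form)         # both ≈ 2.7726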
Remark 12. We give some remarks on the newsboy model.
1. It is clear that the optimal ordering quantity increases (decreases)
as the underage cost C_u increases (the overage cost C_o increases).
2. In the discrete case, it is difficult to determine the optimal Q when
the demand follows a Poisson distribution with probability function
λ^x e^{−λ}/x!, x = 0, 1, 2, · · · .
• But when λ is large, the Poisson distribution can be approximated by the normal distribution with mean and variance both λ.
• So, by using the result in the continuous case, one can get an approximate optimal ordering quantity very quickly when λ is large.
1.7 A Summary of Learning Outcomes

• Able to give the definitions of a Poisson process and the Markov property.
• Able to state and prove the relationships among a Poisson process, the exponential distribution and the Poisson distribution.
• Able to state and prove the relationships among the Markov property, the exponential distribution and the Erlangian distribution.
• Able to formulate the newsboy problem and solve for its optimal policy.
1.8 Exercises

1. Consider the geometric distribution
P(x = i) = p_0 (1 − p_0)^i, 0 < p_0 < 1, i = 0, 1, . . . .
Prove that for any two non-negative integers t and h we have
P(x ≥ t + h | x ≥ t) = P(x ≥ h).

2. Consider a non-negative discrete random variable x taking values in {0, 1, 2, . . .}
such that P(x = i) = p_i, i = 0, 1, 2, . . ., and 0 < p_0 < 1. Suppose that for any
two non-negative integers t and h we have
P(x ≥ t + h | x ≥ t) = P(x ≥ h);
prove that
p_i = p_0 (1 − p_0)^i, i = 0, 1, . . . .

3. Let p(h) = e^{−λh} − 1 + λh. Prove that
lim_{h→0} p(h)/h = 0,
and hence deduce that e^{−λh} = 1 − λh + o(h).
4. Events occur according to a Poisson process with rate λ = 2 per hour.
(a) What is the probability that no event occurs between 8 p.m. and 9 p.m.?
(b) Starting at noon, what is the expected time at which the fourth event occurs?
(c) Find the probability that two or more events occur between 6 p.m. and
8 p.m.
5. Customers arrive at a bank at a Poisson rate λ. Suppose two customers arrived
during the first hour. What is the probability that
(a) both arrived during the first 20 minutes?
(b) at least one arrived during the first 20 minutes?
1.9 Suggested Solutions

1. We note that
P(x ≥ t + h) = Σ_{i=t+h}^∞ p_0 (1 − p_0)^i = (1 − p_0)^{t+h},
P(x ≥ t) = Σ_{i=t}^∞ p_0 (1 − p_0)^i = (1 − p_0)^t,
and
P(x ≥ h) = Σ_{i=h}^∞ p_0 (1 − p_0)^i = (1 − p_0)^h.
Hence
P(x ≥ t + h)/P(x ≥ t) = (1 − p_0)^{t+h}/(1 − p_0)^t = (1 − p_0)^h = P(x ≥ h).
The result follows.
2. From the given condition we have
Σ_{i=t+h}^∞ p_i = (Σ_{i=t}^∞ p_i)(Σ_{i=h}^∞ p_i).
Putting t = 1 we have
Σ_{i=h+1}^∞ p_i = (1 − p_0) Σ_{i=h}^∞ p_i,
and therefore
p_h = p_0 Σ_{i=h}^∞ p_i.
Now putting h = 1 we have
p_1 = p_0 (1 − p_0).
Putting h = 2 we have
p_2 = p_0 (1 − p_0 − p_1) = p_0 (1 − p_0 − p_0(1 − p_0)) = p_0 (1 − p_0)^2.
Inductively (or by using mathematical induction), the result follows.
3. By using L'Hôpital's rule we have
lim_{h→0} (e^{−λh} − 1 + λh)/h = lim_{h→0} (−λe^{−λh} + λ)/1 = 0.
Therefore by definition we have p(h) = o(h). Hence
e^{−λh} − 1 + λh = o(h),
and the result follows.

4. Let X(t) be the number of occurrences in t hours.
(a) P(X(1) = 0) = e^{−2}.
(b) Let T_n denote the elapsed time between the (n − 1)th and the nth event. Then
E(T_1 + T_2 + T_3 + T_4) = 4(1/λ) = 2.
Therefore the expected time at which the fourth event occurs is 2 p.m.
(c) P(two or more events occur between 6 p.m. and 8 p.m.)
= 1 − P(X(2) = 0) − P(X(2) = 1)
= 1 − e^{−4} − 4e^{−4}
= 1 − 5e^{−4}.
2 Discrete Time Markov Chains: Theory and Examples

Andrei Markov² (1856-1922) enrolled at St. Petersburg
University in 1874. He earned a master's degree in 1880 and a doctorate
four years later. He became a professor at St. Petersburg in 1886, and
a member of the Russian Academy of Sciences in 1896.
• He retired in 1905, but continued to teach probability courses at
the university almost until his death. Apart from mathematics, Markov
was also a very good chess player in St. Petersburg.
• His early work was devoted to
number theory, continued fractions,
limits of integrals, approximation theory
and convergence of series.

² A Historical Note
• After 1900 he turned his attention to probability theory, and this
part of his work has had the greatest effect on the development of science. He applied the method of continued fractions, pioneered by his
teacher Prof. P. Chebyshev, to probability theory.
• In 1887, Chebyshev outlined a proof of a generalized central
limit theorem. Markov then studied sequences of mutually dependent variables, hoping to establish the limiting laws
of probability under fairly general assumptions. Eight years later,
Markov succeeded in proving the general result rigorously.
• While working on this problem, he also extended both the law of
large numbers and the central limit theorem to certain sequences of dependent random variables forming special classes
of what are now called Markov chains. Markov chains appeared for
the first time in the papers
(i) "The Extension of the Law of Large Numbers on Mutually Dependent Variables" in 1906, and
(ii) "Investigation of a Remarkable Case of Dependent Trials" in 1907.
• In other articles, published in 1911-1912, he studied various generalizations of Markov chains. The foundation of the general theory
of Markov processes was laid down in the 1930s by A. Kolmogorov.
• Markov chains also have an extensive prehistory, including problems of random walks. But Markov himself never wrote about the
applications of his theory to the sciences.
• He arrived at his chains starting from the internal needs of probability theory. For him the only real examples of the chains were card
shuffling and linguistic problems.
• Markov chains soon found many applications in modern physics.
One of the earliest applications was to describe Brownian motion;
later, cosmic radiation and radioactivity were also studied.
• Another frequent application is the study of fluctuations in stock
prices. The phenomenon generally referred to as a random walk has been
developed and widely applied in the
biological, physical and social sciences, engineering, and so on.
• By his work Markov made important contributions to the development of probability theory, and launched the theory of stochastic
processes. Markov is also remembered as a mathematician who enjoyed
doing numerical computations.
• He expressed his attitude indirectly like this:
. . . many mathematicians apparently believe that going beyond
the field of abstract reasoning into the sphere of
effective calculations would be
humiliating.
A Marketing Problem:³
In a town there are only two supermarkets: Wellcome and Park'n.
Marketing research indicates that a consumer of Wellcome will
switch to Park'n on his/her next shopping trip with probability α (> 0),
while a consumer of Park'n will switch to Wellcome on his/her next
shopping trip with probability β (> 0).
Question 1: What is the probability that a Wellcome consumer
will still be a Wellcome consumer on his/her nth shopping trip?
Question 2: What will be the market shares of the two supermarkets
in the town in the long run?

³ Some Examples for Motivation
A Genetic Problem:
Consider a large population of individuals, each of whom possesses a particular
pair of genes, each gene being classified as type 'A' or type 'a'.
Assume that the proportions of individuals whose gene pairs are 'AA',
'aa' or 'Aa' ('Aa' = 'aA') are respectively p_0, q_0 and r_0,
where p_0 + q_0 + r_0 = 1.
When two individuals mate, each contributes one of his/her genes,
chosen at random, to the resultant offspring. Assume that mating also
occurs at random, in that each individual is equally likely to
mate with any other individual.
Question: What are the proportions of individuals in the next generation whose gene pairs are AA, aa and Aa?
A Problem on the Web:
In surfing the Internet, surfers usually use search engines to find
webpages satisfying their queries.
Unfortunately, very often there can be thousands of webpages
relevant to a query.
Question: How do we obtain a proper list of the webpages in a certain
order of importance?
Google developed a method called the PageRank algorithm for
solving this problem.
2.1 Introduction to Markov Chain

We consider a stochastic process
{X^{(n)}, n = 0, 1, 2, . . .}
that takes values in a finite or countable set M.

Example 5. Let X^{(n)} be the weather on the nth day, which can take values in
M = {sunny, windy, rainy, cloudy}.
One may have the following realization:
X^{(0)} = sunny, X^{(1)} = windy, X^{(2)} = rainy, X^{(3)} = sunny, X^{(4)} = cloudy, . . . .

Example 6. Let X^{(n)} be the product sales on the nth day, which can take values in
M = {0, 1, 2, . . .}.
One may have the following realization:
X^{(0)} = 4, X^{(1)} = 5, X^{(2)} = 2, X^{(3)} = 0, X^{(4)} = 5, . . . .

Remark 13. For simplicity of discussion we take the state space M to be
{0, 1, 2, . . .}. An element of M is called a state of the process.
Definition 6. Suppose there is a fixed probability P_{ij} such that
P(X^{(n+1)} = j | X^{(n)} = i, X^{(n−1)} = i_{n−1}, . . . , X^{(0)} = i_0) = P_{ij}, n ≥ 0,
where i, j, i_0, i_1, . . . , i_{n−1} ∈ M. Then this is called a Markov chain process.

Remark 14. One can interpret the above probability as follows: the conditional
distribution of any future state X^{(n+1)}, given the past states X^{(0)}, X^{(1)}, . . . , X^{(n−1)}
and the present state X^{(n)}, is independent of the past states and depends on
the present state only.

Remark 15. The probability P_{ij} represents the probability that the process will
make a transition to State j given that currently the process is in State i. Clearly one
has
P_{ij} ≥ 0, Σ_{j=0}^∞ P_{ij} = 1, i = 0, 1, . . . .

Definition 7. The matrix containing the transition probabilities P_{ij},

P = ( P_00 P_01 · · · )
    ( P_10 P_11 · · · )
    (  ⋮    ⋮    ⋱  ),

is called the one-step transition probability matrix of the process.
Example 7. Consider the marketing problem again. Let X^{(n)} be a 2-state process
(taking values in {0, 1}) describing the behavior of a consumer:
X^{(n)} = 0 if the consumer shops at Wellcome on the nth day, and
X^{(n)} = 1 if the consumer shops at Park'n on the nth day.
Since the future state (which supermarket to shop at next time) depends on
the current state only, it is a Markov chain process.
It is easy to check that the transition probabilities are
P_00 = 1 − α, P_11 = 1 − β, P_01 = α, P_10 = β.
Therefore the one-step transition matrix of this process is given by

P = ( 1 − α    α   )
    (   β    1 − β ).
Example 8 (Random Walk). We consider a person who performs a random
walk on the real line with the integers
M = {. . . , −2, −1, 0, 1, 2, . . .}
as the state space; see Figure 2.1.
Each time, the person at State i can move one step forward (+1) or one step
backward (−1) with probabilities p (0 < p < 1) and (1 − p) respectively.
Therefore we have the transition probabilities
P_{i,i+1} = p, P_{i,i−1} = 1 − p, i = 0, ±1, ±2, . . . .

Figure 2.1. The Random Walk: from each integer state the walker moves right
with probability p and left with probability 1 − p.
Example 9 (Gambler's Ruin). Consider a gambler who at each play of the
game either wins one dollar with probability p or loses one dollar with probability
(1 − p).
The game is over if either he loses all his money or he attains a fortune of N dollars.
Let the gambler's fortune be the state of the gambling process; then the process is
a Markov chain. Moreover, we have transition probabilities
P_{i,i+1} = p, P_{i,i−1} = 1 − p, i = 1, 2, . . . , N − 1,
and P_00 = P_{NN} = 1. States 0 and N are called absorbing states. The process
will stay at 0 or N forever if one of these states is reached; see Figure 2.2.

Figure 2.2. The Gambler's Ruin Model: states 0, 1, . . . , N with absorbing
barriers at 0 and N.
2.1.1 The nth-Step Transition Matrix

In the previous section we defined the one-step transition probability matrix P
for a Markov chain process. In this section we are going to investigate the n-step
transition probabilities P^{(n)}_{ij} of a Markov chain process.

Definition 8. We define P^{(n)}_{ij} to be the probability that a process in State i will
be in State j after n additional transitions. In particular,
P^{(1)}_{ij} = P_{ij}.

Proposition 16. We have
P^{(n)} = P^n,
where P^{(n)} is the n-step transition probability matrix and P is the one-step
transition matrix.
Proof. We will prove the proposition by mathematical induction.
Clearly the proposition is true when n = 1. We then assume that the proposition
is true for n. We note that P^n = P × P × . . . × P (n times). Then
P^{(n+1)}_{ij} = Σ_{k∈M} P^{(n)}_{ik} P^{(1)}_{kj} = Σ_{k∈M} [P^n]_{ik} P_{kj} = [P^{n+1}]_{ij}.
By the principle of mathematical induction, the proposition is true for all non-negative integers n.

Remark 16. It is easy to see that
P^{(m)} P^{(n)} = P^m P^n = P^{m+n} = P^{(m+n)}.

Figure 2.3. The (n + 1)-step Transition Probability: a path from State i to
State j consists of n transitions from State i to some intermediate State k,
followed by one transition from State k to State j.
Example 10. We consider the marketing problem again. In the model we have

P = ( 1 − α    α   )
    (   β    1 − β ).

If α = 0.3 and β = 0.4, then we have

P^{(4)} = P^4 = ( 0.7 0.3 )^4 = ( 0.5749 0.4251 )
                ( 0.4 0.6 )     ( 0.5668 0.4332 ).

Recall that a consumer is in State 0 (1) if he/she is a consumer of Wellcome (Park'n).
• P^{(4)}_00 = 0.5749 is the probability that a Wellcome consumer will shop at
Wellcome on his/her fourth shopping trip, and P^{(4)}_01 = 0.4251 is the probability that a
Wellcome consumer will shop at Park'n on his/her fourth shopping trip.
• P^{(4)}_10 = 0.5668 is the probability that a Park'n consumer will shop at Wellcome
on his/her fourth shopping trip, and P^{(4)}_11 = 0.4332 is the probability that a Park'n
consumer will shop at Park'n on his/her fourth shopping trip.
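The four-step matrix can be verified directly; a minimal NumPy check of Example 10:

import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
P4 = np.linalg.matrix_power(P, 4)
print(P4)
# [[0.5749 0.4251]
#  [0.5668 0.4332]]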
Remark 17. Consider a Markov chain process having states in {0, 1, 2, . . .}.
• Suppose that at time n = 0 the probability that the process is in State i is
a_i, i = 0, 1, 2, . . . .
• One interesting question is the following: what is the probability
that the process will be in State j after n transitions?
• The probability that, given the process is in State i, it will be in
State j after n transitions is P^{(n)}_{ij} = [P^n]_{ij}, where P_{ij} is the one-step
transition probability from State i to State j of the process.
• Therefore the required probability is
Σ_{i=0}^∞ P(X^{(0)} = i) × P^{(n)}_{ij} = Σ_{i=0}^∞ a_i × [P^n]_{ij}.
Let X^{(n)} = (X̃^{(n)}_0, X̃^{(n)}_1, . . .) be the probability distribution of the states of the
Markov chain process at the nth transition. Here X̃^{(n)}_i is the probability that the
process is in State i after n transitions, and Σ_{i=0}^∞ X̃^{(n)}_i = 1.
It is easy to check that
X^{(n+1)} = X^{(n)} P and X^{(n+1)} = X^{(0)} P^{(n+1)}.

Example 11. Refer to the previous example. If at n = 0 a consumer
belongs to Park'n, we may represent this information as
X^{(0)} = (X̃^{(0)}_0, X̃^{(0)}_1) = (0, 1).
What happens on his/her fourth shopping trip?
X^{(4)} = X^{(0)} P^{(4)} = (0, 1) ( 0.7 0.3 )^4 = (0.5668, 0.4332).
                                ( 0.4 0.6 )
This means that with probability 0.4332 he/she is still a consumer
of Park'n, and with probability 0.5668 he/she is a consumer of Wellcome,
on his/her fourth shopping trip.
2.2 Irreducible Markov Chains and Classification of States

Definition 9. In a Markov chain, State j is said to be accessible from State i if
P^{(n)}_{ij} > 0 for some n ≥ 0. This means that, starting from State i, it is possible to
enter State j in finitely many transitions, and we write i → j.

Definition 10. State i and State j are said to communicate if State i and State
j are accessible from each other, and we write i ↔ j.

Remark 18. We note that "↔" is an equivalence relation.
(i) State i communicates with State i in 0 steps, because
P^{(0)}_{ii} = P(X^{(0)} = i | X^{(0)} = i) = 1 > 0.
(ii) If State i communicates with State j, then State j communicates with State i.
(iii) If State i communicates with State j and State j communicates with State k,
then State i communicates with State k. Indeed, P^{(m)}_{ij}, P^{(n)}_{jk} > 0 for some m and
n, and we have
P^{(m+n)}_{ik} = Σ_{h∈M} P^{(m)}_{ih} P^{(n)}_{hk} ≥ P^{(m)}_{ij} P^{(n)}_{jk} > 0.
Therefore State k is accessible from State i. By interchanging the roles of i and k,
State i is accessible from State k. Hence i communicates with k.
Definition 11. Two states that communicate are said to be in the same class.
A Markov chain is said to be irreducible if all states communicate with each
other.

Example 12. Consider the transition probability matrix (states 0, 1, 2):

( 0.0 0.5 0.5 )
( 0.5 0.0 0.5 )
( 0.5 0.5 0.0 )

We note that 0 → 1, 0 → 2, 1 → 0, 1 → 2, 2 → 0, 2 → 1. Therefore 0 ↔ 1 and
1 ↔ 2, and all states belong to one class. Hence this is an irreducible chain.

Example 13. Consider another transition probability matrix (states 0, 1, 2, 3):

( 0.0 1.0 0.0 0.0 )
( 0.0 0.0 0.5 0.5 )
( 0.0 0.5 0.0 0.5 )
( 0.0 0.5 0.5 0.0 )

We note that from States 1, 2 or 3, it is not possible to visit State 0, i.e.,
P^{(n)}_{10} = P^{(n)}_{20} = P^{(n)}_{30} = 0 for all n ≥ 1.
Therefore the chain is not irreducible (it is reducible).
Definition 12. For any State i in a Markov chain, let f_i be the probability that,
starting in State i, the process will ever re-enter State i.
State i is said to be recurrent if f_i = 1 and transient if f_i < 1.

Remark 19. If State i is recurrent, then the process will return to State
i (with probability 1) again and again, infinitely many times.
If State i is transient, then the probability that the process visits
State i for exactly n time periods is f_i^{n−1}(1 − f_i), the geometric
distribution, and this probability goes to 0 as n → ∞.
Moreover, the expected number of time periods that the process will
visit i is
Σ_{n=1}^∞ n f_i^{n−1}(1 − f_i) = 1/(1 − f_i) < ∞.
Proposition 17. State i is recurrent if
Σ_{n=1}^∞ P^{(n)}_{ii} = ∞,
and State i is transient if
Σ_{n=1}^∞ P^{(n)}_{ii} < ∞.

Proof. Let X^{(n)} be the state of the process after n transitions and define
I_n = 1 if X^{(n)} = i, and I_n = 0 if X^{(n)} ≠ i.
We note that
E(Σ_{n=0}^∞ I_n | X^{(0)} = i)
is the expected number of times the process visits State i given that initially
the process is in State i. It is infinite if State i is recurrent and finite if State i is
transient.
We have
E(Σ_{n=0}^∞ I_n | X^{(0)} = i) = Σ_{n=0}^∞ E(I_n | X^{(0)} = i)
= Σ_{n=0}^∞ [1 · P(X^{(n)} = i | X^{(0)} = i) + 0 · (1 − P(X^{(n)} = i | X^{(0)} = i))]
= Σ_{n=0}^∞ P(X^{(n)} = i | X^{(0)} = i)
= Σ_{n=0}^∞ P^{(n)}_{ii}.
Hence the result follows.

Remark 20. The proposition implies that a transient state will only
be visited a finite number of times. Hence in a Markov chain with finitely many
states, we cannot have all states transient.
Proposition 18. If State i is recurrent and State i communicates with State
j, then State j is also recurrent.

Proof. Since i communicates with j, there exist integers k and m such that
P^{(k)}_{ij} > 0 and P^{(m)}_{ji} > 0.
Now for any integer n we have
P^{(m+n+k)}_{jj} ≥ P^{(m)}_{ji} · P^{(n)}_{ii} · P^{(k)}_{ij}.
Hence
Σ_{n=1}^∞ P^{(n)}_{jj} ≥ Σ_{n=1}^∞ P^{(m+n+k)}_{jj} ≥ Σ_{n=1}^∞ P^{(m)}_{ji} P^{(n)}_{ii} P^{(k)}_{ij} = P^{(m)}_{ji} P^{(k)}_{ij} Σ_{n=1}^∞ P^{(n)}_{ii} = ∞.
By Proposition 17, State j is recurrent.

Remark 21. If State i is transient and State i communicates with State j, then
State j is also transient. For if j were recurrent, then since j communicates with i,
by Proposition 18 State i would be recurrent too, a contradiction.
2.2.1 An Analysis of the Random Walk

• Recall that a person performs a random walk on the integers of the real line.
• Each time, the person at State i can move one step forward (+1)
or one step backward (−1) with probabilities p (0 < p < 1) and (1 − p)
respectively.
• Therefore we have the transition probabilities
P_{i,i+1} = p, P_{i,i−1} = 1 − p, i = 0, ±1, ±2, . . . .
• Since all the states communicate with each other, by Proposition 18 all
states are either recurrent or all transient.
W.L.O.G. let us consider State 0. To classify this state we consider
Σ_{m=1}^∞ P^{(m)}_{00}.
We note that P^{(2n+1)}_{00} = 0, since a return to 0 requires equal numbers of
forward and backward steps, and we have
P^{(2n)}_{00} = C(2n, n) p^n (1 − p)^n.
Hence we have
I = Σ_{m=1}^∞ P^{(m)}_{00} = Σ_{n=1}^∞ P^{(2n)}_{00} = Σ_{n=1}^∞ C(2n, n) p^n (1 − p)^n = Σ_{n=1}^∞ [(2n)!/(n! n!)] p^n (1 − p)^n.
Recall that if I is finite then State 0 is transient (and hence so are all the other
states); otherwise State 0 is recurrent (and hence so are all the other states).
• We have to apply Stirling's formula (we will give a proof later) to get a conclusive result.
• Stirling's formula states that if n is large then
n! ≈ n^{n+1/2} e^{−n} √(2π).
• Hence we have
P_00^{(2n)} ≈ (4p(1 − p))^n / √(πn).
If p = 1/2 then we have
P_00^{(2n)} ≈ 1/√(πn).
If p ≠ 1/2 then we have
P_00^{(2n)} ≈ a^n/√(πn)
where 0 < a = 4p(1 − p) < 1.
• Therefore when p = 1/2, we have
I = ∑_{n=1}^∞ P_00^{(2n)} ≈ ∑_{n=1}^∞ 1/√(πn) > ∑_{n=1}^∞ 1/(πn) = ∞
and State 0 is recurrent.
• When p ≠ 1/2, we have
I = ∑_{n=1}^∞ P_00^{(2n)} ≈ ∑_{n=1}^∞ a^n/√(πn) ≤ (1/√π) ∑_{n=1}^∞ a^n = a/(√π(1 − a)) < ∞
and State 0 is transient.
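These asymptotics can also be checked numerically. Below is a minimal Python sketch (an illustration only; the term C(2n, n) p^n (1 − p)^n is updated recursively to avoid huge factorials) that computes partial sums of P_00^{(2n)}:

def partial_sum(p, N):
    # Partial sum of P_00^(2n) = C(2n, n) p^n (1-p)^n for n = 1, ..., N,
    # updating the term via C(2n+2, n+1) = C(2n, n) * (2n+2)(2n+1)/(n+1)^2.
    term = 2.0 * p * (1.0 - p)        # n = 1 term: C(2, 1) p (1-p)
    total = 0.0
    for n in range(1, N + 1):
        total += term
        term *= (2*n + 2) * (2*n + 1) / ((n + 1) ** 2) * p * (1.0 - p)
    return total

for p in (0.5, 0.3):
    print(p, [round(partial_sum(p, N), 4) for N in (10, 100, 1000, 10000)])
# For p = 0.5 the partial sums keep growing like 2*sqrt(N/pi) (recurrence);
# for p = 0.3 they stabilise at a finite value (transience).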
2.2.2 A Proof for the Stirling Formula
• Let y_1, y_2, . . . , y_n be n independent Poisson random variables each having mean 1. Then
z_n = y_1 + y_2 + . . . + y_n,
the sum of the n Poisson random variables, is also a Poisson random variable, with mean n and variance n.
• We have
P(z_n = n) = P(n − 1 < z_n ≤ n)
= P( −1/√n < (z_n − n)/√n ≤ 0 )
≈ ∫_{−1/√n}^0 (1/√(2π)) e^{−x²/2} dx
≈ 1/√(2πn).
Because for large n we have
(z_n − n)/√n ∼ N(0, 1)
and
e^{−x²/2} ≈ 1 for x ∈ (−1/√n, 0).
Now since z_n is a Poisson random variable,
P(z_n = n) = e^{−n} n^n / n!,
therefore we have
n! ≈ n^{n+1/2} e^{−n} √(2π).
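As a quick numerical sanity check of this approximation (not part of the proof), one can compare n! with n^{n+1/2} e^{−n} √(2π) in Python:

import math

# Compare n! with Stirling's approximation n^(n+1/2) e^(-n) sqrt(2*pi).
for n in (5, 10, 50, 100):
    exact = math.factorial(n)
    stirling = n**(n + 0.5) * math.exp(-n) * math.sqrt(2 * math.pi)
    print(n, stirling / exact)   # the ratio tends to 1 as n grows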
2.3 Simulation and Construction of Markov Chains
2.3.1 Simulation of Markov Chains with EXCEL
Consider a Markov chain process with THREE states {0, 1, 2} and the transition probability matrix
    0 ( 0.2 0.3 0.5 )
P = 1 ( 0.5 0.1 0.4 ) .
    2 ( 0.3 0.3 0.4 )
Given that X^{(0)} = 0, our objective here is to generate a sequence {X^{(n)}, n = 1, 2, . . .} which follows a Markov chain process with the transition matrix P.
To generate {X^{(n)}} there are three possible cases:
(i) Suppose X^{(n)} = 0; then we have
P(X^{(n+1)} = 0) = 0.2, P(X^{(n+1)} = 1) = 0.3, P(X^{(n+1)} = 2) = 0.5;
(ii) Suppose X^{(n)} = 1; then we have
P(X^{(n+1)} = 0) = 0.5, P(X^{(n+1)} = 1) = 0.1, P(X^{(n+1)} = 2) = 0.4;
(iii) Suppose X^{(n)} = 2; then we have
P(X^{(n+1)} = 0) = 0.3, P(X^{(n+1)} = 1) = 0.3, P(X^{(n+1)} = 2) = 0.4.
Suppose we can generate a random variable U uniformly distributed over [0, 1].
Then we generate the distribution in Case (i), when X^{(n)} = 0, easily as follows:
X^{(n+1)} = 0 if U ∈ [0, 0.2),
            1 if U ∈ [0.2, 0.5),
            2 if U ∈ [0.5, 1].
The distribution in Case (ii), when X^{(n)} = 1, can be generated as follows:
X^{(n+1)} = 0 if U ∈ [0, 0.5),
            1 if U ∈ [0.5, 0.6),
            2 if U ∈ [0.6, 1].
The distribution in Case (iii), when X^{(n)} = 2, can be generated as follows:
X^{(n+1)} = 0 if U ∈ [0, 0.3),
            1 if U ∈ [0.3, 0.6),
            2 if U ∈ [0.6, 1].
• In EXCEL one can generate U, a random variable uniformly distributed over [0, 1], by using "=rand()".
• By using simple logic statements in EXCEL, one can simulate a Markov chain easily. The following are some useful logic statements in EXCEL.
(i) "B1" means Column B and Row 1.
(ii) "=IF(B1=0,1,-1)" gives 1 if B1=0, otherwise it gives -1.
(iii) "=IF(A1>B2,0,1)" gives 0 if A1 > B2, otherwise it gives 1.
(iv) "=IF(AND(A1=1,B2>2),1,0)" gives 1 if A1=1 and B2>2, otherwise it gives 0.
(v) "=max(1,2,-1)" gives 2, the maximum of the numbers.
• A demonstration EXCEL file is available at
http://hkumath.hku.hk/∼wkc/sim.xls
for your reference. The program generates a Markov chain process X^{(1)}, X^{(2)}, . . . , X^{(30)} whose transition probability matrix is P and X^{(0)} = 0.
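For readers who prefer a script to a spreadsheet, the same inverse-transform idea can be written in a few lines of Python. This is only a sketch of the method behind the demonstration file, not a transcription of it; it uses the standard library only:

import random

# Transition probability matrix of the three-state chain {0, 1, 2}.
P = [[0.2, 0.3, 0.5],
     [0.5, 0.1, 0.4],
     [0.3, 0.3, 0.4]]

def next_state(i):
    # Sample X^(n+1) given X^(n) = i by inverting U ~ Uniform[0, 1].
    u, cum = random.random(), 0.0
    for j, pij in enumerate(P[i]):
        cum += pij
        if u < cum:
            return j
    return len(P[i]) - 1   # guard against floating-point rounding

chain = [0]                # X^(0) = 0
for _ in range(30):
    chain.append(next_state(chain[-1]))
print(chain)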
2.3.2 Construction of a Markov Chain Model
• Given the observed data sequence {X^{(n)}}, one can count the transition frequency F_jk in the sequence from State j to State k in one step. Hence one can construct the one-step transition frequency matrix for the sequence {X^{(n)}} as follows:
    ( F_11  · · ·  F_1m )
F = ( F_21  · · ·  F_2m )      (2.1)
    (  ...   ...   ...  )
    ( F_m1  · · ·  F_mm )
• From F, one can get the estimates for P_kj as follows:
    ( P_11  · · ·  P_1m )
P = ( P_21  · · ·  P_2m )
    (  ...   ...   ...  )
    ( P_m1  · · ·  P_mm )
where
P_kj = F_kj / ∑_{l=1}^m F_kl  if  ∑_{l=1}^m F_kl > 0,  and  P_kj = 0  if  ∑_{l=1}^m F_kl = 0.      (2.2)
• We consider a sequence {X^{(n)}} of three states (m = 3) given by
{0, 0, 1, 1, 0, 2, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 0, 1}.      (2.3)
We have the transition frequency matrix
    ( 1 6 1 )
F = ( 3 1 3 ) .      (2.4)
    ( 3 1 0 )
Therefore the one-step transition matrix can be estimated as follows:
    ( 1/8  3/4  1/8 )
P = ( 3/7  1/7  3/7 ) .      (2.5)
    ( 3/4  1/4   0  )
A demonstration EXCEL file is available at
http://hkumath.hku.hk/∼wkc/build.xls
for your reference.
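The counting in (2.3)–(2.5) is easy to automate. A minimal Python sketch (an illustration of the method, not the build.xls file itself):

seq = [0, 0, 1, 1, 0, 2, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 0, 1]
m = 3

# Count the one-step transition frequencies F_jk.
F = [[0] * m for _ in range(m)]
for j, k in zip(seq, seq[1:]):
    F[j][k] += 1

# Normalise each row to estimate the transition matrix P, as in (2.2).
P = []
for row in F:
    s = sum(row)
    P.append([f / s if s > 0 else 0.0 for f in row])

print(F)   # [[1, 6, 1], [3, 1, 3], [3, 1, 0]], matching (2.4)
print(P)   # rows divided by 8, 7 and 4, matching (2.5)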
2.4 Stationary Distribution of a Finite Markov Chain
Definition 13. A State i is said to have period d if P_ii^{(n)} = 0 whenever n is not divisible by d, and d is the largest integer with this property. A state with period 1 is said to be aperiodic.
Example 14. Consider the transition probability matrix
P = ( 0 1 )
    ( 1 0 ) .
We note that
P^n = (1/2) ( 1 + (−1)^n      1 + (−1)^{n+1} )
            ( 1 + (−1)^{n+1}  1 + (−1)^n     ) .
We note that P_00^{(2n+1)} = P_11^{(2n+1)} = 0, so both States 0 and 1 have a period of 2.
Definition 14. State i is said to be positive recurrent if it is recurrent and, starting in State i, the expected time until the process returns to State i is finite.
Definition 15. A state is said to be ergodic if it is positive recurrent and aperiodic.
Proposition 19. In a finite irreducible aperiodic Markov chain having N states and transition probability matrix P, there exists M < ∞ such that P_ii^{(m)} > 0 for all states i and all m ≥ M.
Proposition 20. In a finite irreducible aperiodic Markov chain having N states and transition probability matrix P, there exists M < ∞ such that P_ij^{(m)} > 0 for any two States i, j and all m ≥ M.
Proof. From Proposition 19 there is a K such that P_ii^{(k)} > 0 for k ≥ K and all states i.
For any two states i and j, as the chain is irreducible there exists k_ij such that P_ij^{(k_ij)} > 0. Therefore
P_ij^{(m)} ≥ P_ii^{(m−k_ij)} P_ij^{(k_ij)} > 0
if m ≥ M_ij = k_ij + K (so that m − k_ij ≥ K).
The proof is complete if we let
M = max { M_00, M_01, . . . , M_10, M_11, . . . } .
We remark that the maximum here is taken over a finite set, so M is finite.
We recall the example of the marketing problem with X^{(0)} = (1, 0). We observe that
X^{(1)} = X^{(0)}P = (1, 0) ( 0.7 0.3 ; 0.4 0.6 ) = (0.7, 0.3),
X^{(2)} = X^{(0)}P^2 = (1, 0) ( 0.61 0.39 ; 0.52 0.48 ) = (0.61, 0.39),
X^{(4)} = X^{(0)}P^4 = (1, 0) ( 0.5749 0.4251 ; 0.5668 0.4332 ) = (0.5749, 0.4251),
X^{(8)} = X^{(0)}P^8 = (1, 0) ( 0.5715 0.4285 ; 0.5714 0.4286 ) = (0.5715, 0.4285),
X^{(16)} = X^{(0)}P^16 = (1, 0) ( 0.5714 0.4286 ; 0.5714 0.4286 ) = (0.5714, 0.4286).
It seems that
lim_{n→∞} X^{(n)} = (0.5714, 0.4286).
In fact the limit exists and is independent of X^{(0)}! It means that in the long run, the probability that a consumer belongs to Wellcome (Park'n) is given by 0.5714 (0.4286).
• We note that X^{(n)} = X^{(n−1)}P; therefore if
lim_{n→∞} X^{(n)} = π
then
π = lim_{n→∞} X^{(n)} = ( lim_{n→∞} X^{(n−1)} ) P = πP.
We have the following definition.
Definition 16. A row vector
π = (π_0, π_1, . . . , π_{k−1})
is said to be a stationary distribution of a finite Markov chain if it satisfies:
(i) π_i ≥ 0 and ∑_{i=0}^{k−1} π_i = 1;
(ii) πP = π, i.e. ∑_{i=0}^{k−1} π_i P_ij = π_j.
Proposition 21. For any irreducible and aperiodic Markov chain having k
states, there exists at least one stationary distribution.
Proposition 22. For any irreducible and aperiodic Markov chain having k states and any initial distribution X^{(0)},
lim_{n→∞} ||X^{(n)} − π|| = lim_{n→∞} ||X^{(0)}P^n − π|| = 0,
where π is a stationary distribution for the transition matrix P.
Proposition 23. The stationary distribution π in Proposition 22 is unique.
Proof. Suppose there are two stationary distributions π and π̂. Take X^{(0)} = π; then X^{(n)} = π for all n (recall that π = πP). From Proposition 22 we have
0 = lim_{n→∞} ||X^{(n)} − π̂|| = lim_{n→∞} ||π − π̂|| = ||π − π̂||,
where
||V|| = ∑_{i=1}^n |V_i|.
Therefore we must have π = π̂.
Remark 22. The requirement of aperiodicity is important. The following transition probability matrix is irreducible but NOT aperiodic:
P = ( 0 1 ; 1 0 ).
The stationary distribution can be shown to be
π = (0.5, 0.5).
One can check that
(0.5, 0.5)P = (0.5, 0.5).
However, with x_1 = (1, 0), the sequence
x_{n+1} = x_n P
does NOT converge to π. This means a stationary distribution may NOT be a steady-state distribution.
2.4.1 Applications of the Stationary Distribution
Example 15. Recall the marketing problem. The transition matrix is
P = ( 1−α  α ; β  1−β ).
To solve for the stationary distribution (π_0, π_1), we consider
(1 − α)π_0 + βπ_1 = π_0
απ_0 + (1 − β)π_1 = π_1
π_0 + π_1 = 1.
Solving the linear system of equations we have
π_0 = β(α + β)^{−1} and π_1 = α(α + β)^{−1}.
Therefore in the long run, the market shares of Wellcome and Park'n are respectively
β/(α + β) and α/(α + β).
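The same system can also be solved numerically. Below is a small sketch, assuming numpy is available, applied to the earlier marketing matrix (α = 0.3, β = 0.4); it appends the normalisation ∑π_i = 1 to πP = π and solves by least squares:

import numpy as np

alpha, beta = 0.3, 0.4
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Stationarity: (P^T - I) pi^T = 0, plus the extra equation sum(pi) = 1.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print(pi)                                             # [0.5714..., 0.4285...]
print(beta / (alpha + beta), alpha / (alpha + beta))  # the closed form above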
Example 16. Recall the genetic problem. A parent is randomly chosen from the population and then one of its genes is randomly chosen. This is equivalent to choosing a gene from the population randomly (at t = 0, P(AA) = p_0, P(aa) = q_0, P(aA) = r_0).
By conditioning on the gene pair of the parent, the probabilities that a randomly chosen gene will be of type A and of type a are given by
P(A) = P(A|AA)p_0 + P(A|aa)q_0 + P(A|Aa)r_0 = 1 · p_0 + 0 · q_0 + (1/2)r_0 = p_0 + r_0/2
and
P(a) = P(a|AA)p_0 + P(a|aa)q_0 + P(a|Aa)r_0 = 0 · p_0 + 1 · q_0 + (1/2)r_0 = q_0 + r_0/2.
Therefore under random mating a randomly chosen member of the next generation (t = 1) will be of type AA with probability
p_1 = P(A)P(A) = (p_0 + r_0/2)^2;
of type aa with probability
q_1 = P(a)P(a) = (q_0 + r_0/2)^2;
and of type Aa with probability
r_1 = P(A)P(a) + P(a)P(A) = 2P(A)P(a) = 2(p_0 + r_0/2)(q_0 + r_0/2).
We repeat the argument above; then in the second generation we have
P(A) = P(A|AA)p_1 + P(A|aa)q_1 + P(A|Aa)r_1
= 1 · p_1 + 0 · q_1 + (1/2)r_1
= p_1 + r_1/2
= (p_0 + r_0/2)^2 + (p_0 + r_0/2)(q_0 + r_0/2)
= (p_0 + r_0/2)(p_0 + r_0/2 + q_0 + r_0/2) = p_0 + r_0/2  (since p_0 + q_0 + r_0 = 1)
and
P(a) = P(a|AA)p_1 + P(a|aa)q_1 + P(a|Aa)r_1
= 0 · p_1 + 1 · q_1 + (1/2)r_1
= q_1 + r_1/2
= (q_0 + r_0/2)^2 + (p_0 + r_0/2)(q_0 + r_0/2)
= (q_0 + r_0/2)(q_0 + r_0/2 + p_0 + r_0/2) = q_0 + r_0/2.
This is called the Hardy–Weinberg law.
• Now we are interested in the long-run distribution of AA, aa and Aa in the population. This can be analyzed by using a Markov chain process.
Consider a single individual and his/her descendants. We assume that each individual has exactly one offspring, and let X^{(n)} be the state of the gene pair in his/her nth generation. We assume that the population is in steady state, i.e.
p_n = p, q_n = q and r_n = r.
The probability transition matrix of the process (with the states ordered AA, aa, Aa) is given by
     AA ( p + r/2      0          q + r/2 )
P =  aa (   0        q + r/2     p + r/2 )
     Aa ( p/2 + r/4  q/2 + r/4     1/2   ) .
Take for example: we must have P_{aa,AA} = P_{AA,aa} = 0, and
P_{AA,AA} = P(A|AA)p + P(A|aa)q + P(A|Aa)r = 1 · p + 0 · q + (1/2)r = p + r/2,
P_{aA,aA} = (1/2)( P(A|AA)p + P(A|aa)q + P(A|Aa)r ) + (1/2)( P(a|AA)p + P(a|aa)q + P(a|Aa)r )
= (1/2)(1 · p + 0 · q + (1/2)r) + (1/2)(0 · p + 1 · q + (1/2)r) = (p + q + r)/2 = 1/2.
By direct verification, it can be shown that
(p, q, r)P = (p, q, r) and p + q + r = 1.
(Recall that in steady state p = (p + r/2)^2, q = (q + r/2)^2 and r = 2(p + r/2)(q + r/2).)
The first equation:
p(p + r/2) + q · 0 + r(p/2 + r/4) = (p + r/2)^2 = p.
The second equation:
p · 0 + q(q + r/2) + r(q/2 + r/4) = (q + r/2)^2 = q.
The third equation:
p(q + r/2) + q(p + r/2) + r/2 = 2pq + pr/2 + qr/2 + r(r + p + q)/2
= 2p(q + r/2) + r(q + r/2)
= 2(p + r/2)(q + r/2)
= r.
Hence the stationary distribution of the process is (p, q, r).
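One can also check (p, q, r)P = (p, q, r) numerically for any starting proportions. A small sketch, assuming numpy is available and using illustrative values p_0 = 0.3, q_0 = 0.2, r_0 = 0.5:

import numpy as np

p0, q0, r0 = 0.3, 0.2, 0.5              # any proportions summing to 1
p = (p0 + r0/2)**2                      # Hardy-Weinberg proportions
q = (q0 + r0/2)**2
r = 2 * (p0 + r0/2) * (q0 + r0/2)

# Transition matrix over the states (AA, aa, Aa) in steady state.
P = np.array([[p + r/2,    0.0,        q + r/2],
              [0.0,        q + r/2,    p + r/2],
              [p/2 + r/4,  q/2 + r/4,  0.5]])
print(np.array([p, q, r]) @ P)          # equals (p, q, r)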
Example 17. We recall the problem of obtaining a list of webpages
with ranking. Google developed an algorithm for ranking the webpages. The PageRank of a webpage is defined as follows:
Let N be the total number of webpages in the web and we define a
matrix Q called the hyperlink matrix.
Here
Q_ij = 1/k if webpage j is an outgoing link of webpage i, and Q_ij = 0 otherwise,
where k is the total number of outgoing links of webpage i.
For simplicity of discussion, here we assume that Qii > 0 for all
i. This means for each webpage, there is a link pointing to itself.
Hence Q can be regarded as a transition probability matrix of
a Markov chain of a random walk.
• One may regard a surfer as a random walker and the webpages as
the states of the Markov chain.
• Assuming that this underlying Markov chain is irreducible and aperiodic, the steady-state probability distribution
(p_1, p_2, . . . , p_N)^T
of the states (webpages) exists.
• Here p_i is the proportion of time that the random walker (surfer) spends visiting state (webpage) i.
• The higher the value of p_i, the more important webpage i is. The PageRank of webpage i is then defined as p_i.
• Here we give a numerical demonstration. Let us consider a web of three webpages: 0, 1, 2.
Suppose that the links are given as follows: 0 → 1, 0 → 2, 1 → 0 and 2 → 1. Counting the self-links, the outdegrees of States 0, 1, 2 are 3, 2, 2 respectively.
• The transition probability matrix of this Markov chain is given by
    0 ( 1/3 1/3 1/3 )
P = 1 ( 1/2 1/2  0  ) .
    2 (  0  1/2 1/2 )
The steady-state probability distribution p = (p_0, p_1, p_2) satisfies
p = pP and p_0 + p_1 + p_2 = 1.
• Solving the linear system of equations, we get
(p_0, p_1, p_2) = (3/9, 4/9, 2/9).
The ranking of the webpages: Webpage 1 > Webpage 0 > Webpage 2.
• It is clear that both Webpages 0 and 2 point to Webpage 1 and
therefore it must be the most important. Since the most important
Webpage 1 points to Webpage 0 only, Webpage 0 is more important
than Webpage 2.
• We remark that the steady-state probability distribution may not exist as the Markov chain may not be irreducible. But one can always consider the following transition probability matrix:
P̃ = (1 − α)P + (α/N) (1, 1, . . . , 1)^T (1, 1, . . . , 1)
for a very small positive α. Then P̃ is irreducible (Exercise).
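A short Python sketch of this computation, assuming numpy is available: build P and P̃, then iterate p ← pP̃ (the power method) until it settles. This is an illustration only, with α = 0.01 chosen arbitrarily:

import numpy as np

P = np.array([[1/3, 1/3, 1/3],
              [1/2, 1/2, 0.0],
              [0.0, 1/2, 1/2]])
N, alpha = 3, 0.01
P_tilde = (1 - alpha) * P + alpha / N * np.ones((N, N))

p = np.ones(N) / N           # start from the uniform distribution
for _ in range(1000):        # power iteration: p <- p P~
    p = p @ P_tilde
print(p)                     # close to (3/9, 4/9, 2/9) for small alpha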
2.5 More Markov Chain Models
In this section, we introduce the concepts of high-order Markov models and hidden Markov models by some examples.
2.5.1 High-order Markov Models
Definition 17. Suppose that
P(X^{(n+1)} = j | X^{(n)} = i_n, X^{(n−1)} = i_{n−1}, . . . , X^{(0)} = i_0)
= P(X^{(n+1)} = j | X^{(n)} = i_n, X^{(n−1)} = i_{n−1}, . . . , X^{(n−k)} = i_{n−k})
for j, i_0, i_1, . . . , i_n ∈ M and n ≥ k. Then this is called a (k+1)th order Markov chain process.
• This means that the distribution of X^{(n+1)} depends on the states of X^{(n)}, . . . , X^{(n−k)}.
Example 18. Consider a 2nd-order Markov chain on the state space M = {0, 1}. The distribution of X^{(n+1)} depends on the states of X^{(n)} and X^{(n−1)}.
To define this Markov chain process, one has to define four distributions:
P(X^{(n+1)} = i | X^{(n)} = 0, X^{(n−1)} = 0),
P(X^{(n+1)} = i | X^{(n)} = 0, X^{(n−1)} = 1),
P(X^{(n+1)} = i | X^{(n)} = 1, X^{(n−1)} = 0),
P(X^{(n+1)} = i | X^{(n)} = 1, X^{(n−1)} = 1),
for i = 0, 1 and n = 1, 2, 3, . . . .
Example 19. In the previous example, the four distributions can be (writing P(i|(a, b)) for P(X^{(n+1)} = i | X^{(n)} = a, X^{(n−1)} = b)):
P(0|(0, 0)) = 0.4, P(1|(0, 0)) = 0.6;
P(0|(0, 1)) = 0.2, P(1|(0, 1)) = 0.8;
P(0|(1, 0)) = 0.3, P(1|(1, 0)) = 0.7;
P(0|(1, 1)) = 0.1, P(1|(1, 1)) = 0.9.
If X^{(3)} = 0 and X^{(4)} = 1, then
P(X^{(5)} = 0) = 0.3 and P(X^{(5)} = 1) = 0.7.
Example 20. Given the observed sequence:
0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1.
How do we estimate the four distributions for a 2nd-order Markov model?
By counting the "transition frequencies" one can obtain the following table.

(X^{(n−1)}, X^{(n)}) | X^{(n+1)} = 0 | X^{(n+1)} = 1 | P(X^{(n+1)} = 0) | P(X^{(n+1)} = 1)
(0, 0)               |       0       |       3       |       0/3        |       3/3
(0, 1)               |       1       |       5       |       1/6        |       5/6
(1, 0)               |       3       |       2       |       3/5        |       2/5
(1, 1)               |       4       |       0       |       4/4        |       0/4
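The counting behind the table can be automated. A small Python sketch (an illustration only) that reproduces the frequencies and the estimated conditional distributions:

from collections import defaultdict

seq = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

# Count the transitions (X^(n-1), X^(n)) -> X^(n+1).
counts = defaultdict(lambda: [0, 0])
for a, b, c in zip(seq, seq[1:], seq[2:]):
    counts[(a, b)][c] += 1

for pair in sorted(counts):
    n0, n1 = counts[pair]
    total = n0 + n1
    print(pair, n0, n1, n0 / total, n1 / total)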
Remark 23. The transition matrix of the (first-order) Markov chain model for the sequence is
( 1/3 2/3 )
( 1/2 1/2 ) .
Which model do you think is better for modeling the sequence?
2.5.2 Hidden Markov Model
Sometimes some states are not observable and they are hidden.
Example 21. Consider a process of choosing a die and obtaining number of dots
by throwing the die.
Suppose we have two dice A and B, each having four faces (1, 2, 3 and 4). Die A is fair and Die B is biased.
The probability distributions of dots obtained by throwing the dice A and B are
given below:
Die 1 2 3 4
A 1/4 1/4 1/4 1/4
B 1/6 1/6 1/3 1/3
Each time a die is chosen, with probability α, Die A is chosen and with probability
(1 − α), Die B is chosen. This is hidden (no one knows whether the chosen die is A or B).
The value of α is to be determined. The die is thrown and the number of dots (this
is observable) obtained is recorded. The following is a possible realization of the
process:
A → 1 → A → 3 → B → 2 → A → 1 → B → 4 → B → 1.
• The process can be modeled by a Markov chain with the transition probability matrix given by (an embedding technique):
       A    B     1     2     3     4
A  (   0    0    1/4   1/4   1/4   1/4 )
B  (   0    0    1/6   1/6   1/3   1/3 )
1  (   α   1−α    0     0     0     0  )
2  (   α   1−α    0     0     0     0  )
3  (   α   1−α    0     0     0     0  )
4  (   α   1−α    0     0     0     0  )
• How can we estimate α if the following sequence of dots (in steady state) is observed?
1, 3, 3, 2, 3, 2, 4, 3, 1, 3, 3, 4.
We note that
P^2 =
       A    B        1             2             3             4
A  (   α   1−α       0             0             0             0  )
B  (   α   1−α       0             0             0             0  )
1  (   0    0    1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
2  (   0    0    1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
3  (   0    0    1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
4  (   0    0    1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
If we ignore the hidden states, the observable states have the transition probability matrix given by
      1 ( 1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
P′ =  2 ( 1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
      3 ( 1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
      4 ( 1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 ) .
The stationary distribution of P′ is given by
p = (1/6 + α/12, 1/6 + α/12, 1/3 − α/12, 1/3 − α/12).
(Why? Since every row of P′ equals p and the entries of p sum to one, we have pP′ = p.)
This should be consistent with the observed occurrence of each state in the sequence:
p = (1/6 + α/12, 1/6 + α/12, 1/3 − α/12, 1/3 − α/12) ≈ q = (2/12, 2/12, 6/12, 2/12).
The unknown parameter α can be obtained by solving
min_{0≤α≤1} ||p − q||₂² = min_{0≤α≤1} ∑_{i=1}^4 (p_i − q_i)² = min_{0≤α≤1} { α²/36 + 1/18 }.
In this case, the best α is 0.
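Since ||p − q||₂² = α²/36 + 1/18 is an explicit quadratic in α, its minimiser over [0, 1] is clearly α = 0; a small numerical sketch (assuming numpy is available) confirms this by a grid search:

import numpy as np

obs = [1, 3, 3, 2, 3, 2, 4, 3, 1, 3, 3, 4]
q = np.array([obs.count(k) for k in (1, 2, 3, 4)]) / len(obs)

def p(alpha):
    # Stationary distribution of the observable chain P' as a function of alpha.
    return np.array([1/6 + alpha/12, 1/6 + alpha/12,
                     1/3 - alpha/12, 1/3 - alpha/12])

alphas = np.linspace(0.0, 1.0, 101)
errors = [np.sum((p(a) - q)**2) for a in alphas]
print(alphas[int(np.argmin(errors))])   # 0.0, as computed above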
2.6 A Summary of Learning Outcomes
• Able to give the definition of a Markov chain, transition probability,
transition matrix, irreducibility.
• Able to classify the states of a Markov chain.
• Able to compute and interpret the steady-state probability distribution of a Markov chain.
• Able to compute and interpret the mean time in transient states of
a Markov chain.
• Able to construct and apply a simple HMM and compute the model
parameters.
2.7 Exercises
1. Let P be the transition probability matrix of a Markov chain with m states. Prove or disprove the following.
(a) Suppose that P^n is a positive matrix (i.e. all the entries are positive); then P^k is also positive for all k ≥ n.
(b) If State j can be reached from State i, then it can be reached in m steps or less.
2. Let P be the transition probability matrix of a Markov chain with m states. Suppose that
∑_{k=1}^m P_kj = ∑_{k=1}^m P_jk = 1 for j = 1, 2, . . . , m.
Find the stationary distribution of the Markov chain.
3. Construct an m × m transition probability matrix P such that
π = (π_1, . . . , π_m), π_i ≥ 0, ∑_{i=1}^m π_i = 1
is the stationary distribution of P.
4. Consider a random walk on a polygon of n vertices 1, 2, . . . , n. Each time the random walker may move one step clockwise with probability p or one step anti-clockwise with probability 1 − p, where 0 ≤ p ≤ 1. By regarding each vertex as a state, this random walk process is a Markov chain with the following transition probabilities:
P_{1,2} = P_{2,3} = . . . = P_{n−1,n} = P_{n,1} = p
and
P_{2,1} = P_{3,2} = . . . = P_{n,n−1} = P_{1,n} = 1 − p.
(i) Write down the transition probability matrix of the Markov chain.
(ii) Find the steady-state probability distribution π of the Markov chain.
(iii) Let π_k be the state probability distribution of the Markov chain at time k. Prove or disprove the following statement: for 0 ≤ p ≤ 1, we have
lim_{k→∞} ||π_k − π||₂ = 0.
5. Consider a random walk on a hexagon with its centroid as shown in Figure 2.5. Suppose that at each State i (i = 0, 1, 2, 3, 4, 5, 6), the transition probabilities to the adjacent states are equal and the probability of staying at the same state in the next transition is zero.
[Figure 2.5. The Random Walk: State 0 is the centroid, States 1, 2, 3, 4, 5, 6 are the vertices of the hexagon, and each vertex is joined to the centroid and to its two neighbouring vertices.]
(i) Show that the Markov chain of the random walk is irreducible and all the
states are recurrent.
(ii) Find the steady-state probability distribution
π = (π_0, π_1, π_2, π_3, π_4, π_5, π_6)
of the Markov chain, where
πP = π, ∑_{i=0}^6 π_i = 1 and π_i ≥ 0 for i = 0, 1, . . . , 6.
2.8 Suggested Solutions
1. (a) Yes. One may prove it by induction. The statement is true for k = n. Assume that the statement is true for some k ≥ n; then
[P^{k+1}]_ij = ∑_{l=1}^m P_il^{(k)} P_lj
must be positive, since all P_il^{(k)} > 0 by the induction hypothesis. Otherwise we would have P_lj = 0 for l = 1, 2, . . . , m, and this would imply that P_ij^{(n)} = 0 for all n = 1, 2, . . . and all i = 1, 2, . . . , m. This is a contradiction.
(b) Yes. One can regard this as a graph problem by letting each state be a vertex, with a directed edge from State a to State b if P_ab > 0. Suppose the statement is false; then the length of the shortest path from i to j is strictly greater than m. Let us say the path is
i → a_1 → a_2 → . . . → a_N → j
where N ≥ m. We note that the states a_i are distinct and cannot be j, for otherwise we could get a shorter path. But this means that we have at least m + 1 distinct states in the system. Hence we have a contradiction and the statement must be true.
2. Let
π = (1/m)(1, 1, . . . , 1);
we note that
πP = π and ∑_{i=1}^m π_i = 1.
Therefore π is the stationary distribution.
3. Define P = (1, 1, . . . , 1)^T π; then πP = (∑_{i=1}^m π_i) π = π.
4. (i) If we order the states from 1 to n, then the transition probability matrix is given by
    (  0     p     0    · · ·   0    1−p )
    ( 1−p    0     p     0    · · ·   0  )
P = (  0    1−p    0     p    · · ·   0  )
    (  ...               ...          ... )
    (  0    · · ·  0    1−p    0      p  )
    (  p     0    · · ·  0    1−p     0  )
(ii) We note that (1, 1, . . . , 1)P = (1, 1, . . . , 1); therefore
π = (1/n)(1, 1, . . . , 1).
(iii) For 0 < p < 1 and n odd, the Markov chain is irreducible and aperiodic, therefore we have
lim_{k→∞} ||π_k − π||₂ = 0.
(When n is even the chain has period 2, and the limit can also fail for 0 < p < 1.)
But for p = 1 or p = 0, beginning with π_0 = (1, 0, . . . , 0), we have
||π_k − π||₂ = √( (n − 1)²/n² + (n − 1)/n² ) = √(n(n − 1))/n,
independent of k. Thus the statement is false.
5. (i) We note that all the States 1, 2, 3, 4, 5, 6 communicate with State 0 and therefore all the states communicate with each other. Hence the Markov chain is irreducible. Now at least one of the states is recurrent; otherwise all the states would be transient and would be visited only finitely many times, which is impossible. Since there is a recurrent state, say i, and all the other states communicate with i, all the other states are also recurrent.
(ii) Since the Markov chain is finite and irreducible, there exists a unique stationary distribution π. By symmetry we have
π_1 = π_2 = π_3 = π_4 = π_5 = π_6 = p.
Hence we have
π_0 + 6p = 1.
Moreover, from each of States 1, 2, 3, 4, 5, 6 the process goes to State 0 with probability 1/3. Therefore
π_0 = (1/3)(π_1 + π_2 + π_3 + π_4 + π_5 + π_6) = 6p/3 = 2p.
Solving the two equations we have
π = (1/4, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8).
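A quick numerical check of this answer, assuming numpy is available: build the 7 × 7 transition matrix of the random walk and verify that πP = π.

import numpy as np

# States: 0 is the centroid; 1..6 are the hexagon vertices.
P = np.zeros((7, 7))
P[0, 1:] = 1/6                       # the centre moves to each vertex
for i in range(1, 7):
    left = 1 + (i - 2) % 6           # one neighbouring vertex on the hexagon
    right = 1 + i % 6                # the other neighbouring vertex
    P[i, [0, left, right]] = 1/3     # three adjacent states, equal probabilities
pi = np.array([1/4] + [1/8] * 6)
print(np.allclose(pi @ P, pi))       # True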