Markov Chains
INDER K. RANA
Department of Mathematics
Indian Institute of Technology Bombay
Powai, Mumbai 400076, India
email: [email protected]
Abstract. These notes were originally prepared for a College Teachers'
Refresher Course at the University of Mumbai. The current revised version
is for the participants of the Summer School on "Probability Theory" at the
Kerala School of Mathematics, 2010.
Contents

Prologue. Basic Probability Theory
  §0.1. Probability space
  §0.2. Conditional probability

Chapter 1. Basics
  §1.1. Introduction
  §1.2. Random walks
  §1.3. Queuing chains
  §1.4. Ehrenfest chain
  §1.5. Some consequences of the Markov property
  Review Exercises

Chapter 2. Calculation of higher order probabilities
  §2.1. Distribution of Xn and other joint distributions
  §2.2. Kolmogorov-Chapman equation
  Exercises

Chapter 3. Classification of states
  §3.1. Closed subsets and irreducible subsets
  §3.2. Periodic and aperiodic chains
  §3.3. Visiting a state: transient and recurrent states
  §3.4. Absorption probability
  §3.5. More on recurrence/transience
  Exercises

Chapter 4. Stationary distribution for a Markov chain
  §4.1. Introduction
  §4.2. Stopping times and strong Markov property
  §4.3. Existence and uniqueness
  §4.4. Asymptotic behavior

Appendix. Diagonalization of matrices
References
Index
Prologue. Basic Probability Theory
0.1. Probability space
A mathematical model for analyzing statistical experiments is given by a probability
space. A probability space is a triple (Ω, S, P ) where:
• Ω is a set representing the set of all possible outcomes of the experiment.
• S is a σ-algebra of subsets of Ω. Subsets of Ω are called events of the
experiment; elements of S represent the collection of events of interest in
that experiment.
• For every E ∈ S, the nonnegative number P (E) is the probability that the
event E occurs. The map E 7→ P (E), called a probability, is P : S → [0, 1],
with the following properties:
(i) P (∅) = 0, and P (Ω) = 1.
(ii) P is countably additive, i.e., for every sequence A_1, A_2, ..., A_n, ...
in S which is pairwise disjoint (A_i ∩ A_j = ∅ for i ≠ j),

    P(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i).
0.2. Conditional probability
Let (Ω, S, P ) be a probability space. If B is an event with P (B) > 0, then for every
A ∈ S, the conditional probability of A given B, denoted by P (A|B), is defined by
    P(A|B) = P(A ∩ B) / P(B).
Intuitively, P_B(A) := P(A|B) measures how likely the event A is to occur, given the knowledge that B has occurred.
Some properties of conditional probability are:
(i) For every sequence A_1, A_2, ..., A_n, ... in S which is pairwise disjoint,

    P_B(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ P_B(A_i).
(ii) Chain rule:

    P(A ∩ B) = P(A|B) P(B).

In general, for A_1, A_2, ..., A_n ∈ S,

    P(A_1 ∩ A_2 ∩ ... ∩ A_n)
      = P(A_1 | A_2 ∩ ... ∩ A_n) P(A_2 | A_3 ∩ ... ∩ A_n) ··· P(A_{n-1} | A_n) P(A_n),

and for B ∈ S,

    P(A_1 ∩ A_2 ∩ ... ∩ A_n | B)
      = P(A_1 | A_2 ∩ ... ∩ A_n ∩ B) P(A_2 | A_3 ∩ ... ∩ A_n ∩ B) ··· P(A_n | B).
(iii) Bayes' formula: If A_1, A_2, ..., A_n, ... in S are pairwise disjoint and Ω = ∪_{n=1}^∞ A_n, then for B ∈ S with P(B) > 0,

    P(A_i | B) = P(B|A_i) P(A_i) / ∑_{j=1}^∞ P(B|A_j) P(A_j).
(iv) Let A_1, A_2, ..., A_n, ... in S be pairwise disjoint such that

    P(A|A_i) = P(A|A_j) := p for every i, j;

then P(A | ∪_{i=1}^∞ A_i) = p.
(v) If A_1, A_2, ..., A_n, ... in S are pairwise disjoint and Ω = ∪_{n=1}^∞ A_n, then for B, C ∈ S,

    P(C|B) = ∑_{i=1}^∞ P(A_i|B) P(C|A_i ∩ B).
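These identities are easy to check on a finite probability space. The following minimal Python sketch (the events and values are illustrative, not from the text) verifies Bayes' formula for two fair coin tosses.

    from fractions import Fraction

    # A finite probability space: outcomes of two fair coin tosses.
    omega = ["HH", "HT", "TH", "TT"]
    P = {w: Fraction(1, 4) for w in omega}

    def prob(event):
        """P(E) for an event given as a set of outcomes."""
        return sum(P[w] for w in event)

    def cond(A, B):
        """Conditional probability P(A|B), assuming P(B) > 0."""
        return prob(A & B) / prob(B)

    A = {w for w in omega if w[0] == "H"}        # first toss is heads
    B = {w for w in omega if w.count("H") == 1}  # exactly one head

    # Bayes' formula with the partition {A, complement of A}:
    Ac = set(omega) - A
    lhs = cond(A, B)
    rhs = cond(B, A) * prob(A) / (cond(B, A) * prob(A) + cond(B, Ac) * prob(Ac))
    assert lhs == rhs == Fraction(1, 2)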
Chapter 1
Basics
1.1. Introduction
The aim of our lectures is to analyze the following situation: consider an experiment/system under observation, and let s_1, s_2, ..., s_n, ... be the possible states in which
the system can be. Suppose the system is observed at every unit of time n = 0, 1, 2, .... Let X_n denote the observation at time n ≥ 0. Thus each X_n
can take any of the values s_1, s_2, ..., s_n, .... We further assume that the observations
X_n are not 'deterministic', i.e., X_n takes the value s_i with some probability. In other
words, each X_n is a random variable on some probability space (Ω, A, P). In case the
observations X_0, X_1, ... are independent, we know how to compute the probabilities of
various events. The situation we are going to look at is slightly more general. Let us
look at some examples.
1.1.1 Example:
Consider observing the working of a particular machine in a factory. On any day, either
the machine will be broken or it will be working. So our system can be in one of
two states: 'broken', represented by 0, or 'working', represented by 1. Let X_n be
the observation about the machine on the nth day. Clearly, there is no reason to assume
that X_n will be independent of X_{n−1}, ..., X_0.
1.1.2 Example:
Consider a gambler making bets in a gambling house. He starts with some amount, say
A rupees, and makes a series of one-rupee bets against the house. Let X_n, n ≥ 0, denote
the gambler's capital at time n, say after n bets. Then the states of the system, the
possible values each X_n can take, are 0, 1, 2, .... Clearly, the value of X_n depends upon
the value of X_{n−1}.
1.1.3 Example:
Consider a bill collection office where people come to pay their bills. People arrive
at the paying counter at various time points and are served eventually. Let us
suppose that we measure time in minutes; the persons arriving during a given minute are
counted as arriving at that minute, and at most one person can/will be served in a
minute. Let ξ_n denote the number of persons that arrive at the nth minute, let X_0
denote the number of persons waiting initially (i.e., when the office opened), and for
n ≥ 1, let X_n denote the number of customers at the nth minute. Thus, for all n ≥ 0,

    X_{n+1} = ξ_{n+1}                if X_n = 0,
    X_{n+1} = X_n + ξ_{n+1} − 1      if X_n ≥ 1,

because one person will be served in that minute. The states of the system are 0, 1, 2, ...,
and clearly X_{n+1} depends upon X_n.
Thus, we are going to look at a sequence of random variables {Xn}n≥0 defined on
a probability space (Ω, A, P), such that each X_n can take at most a countable number
of values. As mentioned in the beginning, if the X_n's are independent, then one knows
how to analyze the system. If the X_n's are not independent, what kind of relation can
they have? For example, consider the system of example 1.1.1: observing the
working of a machine on each day. Clearly, whether the machine will be
'in order' or 'not in order' on a particular day depends only upon whether it
was working or out of order on the previous day. Or in example 1.1.2, the example of
the gambler, his capital on the nth day depends only upon his capital on the (n−1)th day.
This motivates the following assumption about our system.
1.1.4 Definition:
Let {Xn}n≥0 be a sequence of random variables taking values in a set S, called the state
space, which is at most countable. We say that {Xn}n≥0 has the Markov
property if for every n ≥ 0 and i_0, i_1, ..., i_{n+1} ∈ S,

    P{X_{n+1} = i_{n+1} | X_0 = i_0, X_1 = i_1, ..., X_n = i_n}
      = P{X_{n+1} = i_{n+1} | X_n = i_n}.
That is, the observation/outcome at the (n + 1)th stage of the experiment depends
only on the outcome in the immediate past. Thus, if n ≥ 0 and i, j ∈ S, then the numbers

    P(i, j, n) := P{X_{n+1} = j | X_n = i}

are going to be important for the system. This is the probability that the system will
be in state j at stage n + 1 given that it was in state i at stage n.
Note that saying that a sequence {Xn}n≥0 has the Markov property means that given
X_{n−1}, the random variable X_n is conditionally independent of X_{n−2}, ..., X_1, X_0. It
means that the distribution of the next step of the sequence depends only upon
where the system is now and not where it has been in the past.
1.1.5 Definition:
Let {Xn}n≥0 be a Markov system with state space S.
(i) For n ≥ 0 and i, j ∈ S, the number P(i, j, n) is called the one-step transition
probability for the system at stage n to go from state i to state j at the next
stage.
(ii) The system is said to have the stationary property or the homogeneous property if P(i, j, n) is independent of n, i.e.,

    P(i, j, n + 1) = P(i, j, n) for every i, j ∈ S, n ≥ 0.

That is, the probability that the system will be in state j at stage n + 1 given that
it is in state i at stage n is independent of n. Thus, the probability of the system
going from state i to j does not depend upon the time at which this happens.
(iii) A Markov system {Xn}n≥0 is called a Markov chain if it is stationary.
1.1.6 Definition:
Given a Markov chain {Xn}n≥0,

    Π_0(i) := P{X_0 = i}, i ∈ S,

is called the initial distribution vector, or the distribution of X_0.
1.1.7 Graphical representation:
A pictorial way to represent a Markov chain is by its transition graph. It consists of
nodes representing the states of the chain and arrows between the nodes representing
the transition probabilities. The transition graph of the Markov chain in example 1.1.1 has the transition probabilities:

    p(0, 0) = 1 − p,    p(0, 1) = p,
    p(1, 0) = q,        p(1, 1) = 1 − q.
1.1.8 Theorem:
Let {Xn}n≥0 be a Markov chain with state space S, transition probabilities p(i, j),
and initial distribution vector Π_0. Let P be the matrix

    P = [p(i, j)], i, j ∈ S.

Then the following hold:
(i) 0 ≤ p(i, j) ≤ 1 and 0 ≤ Π_0(i) ≤ 1.
(ii) For every i, ∑_{j∈S} p(i, j) = 1.
(iii) ∑_{i∈S} Π_0(i) = 1.
1.1.9 Definition:
The matrix P = [p(i, j)] is called the transition matrix of the Markov chain. It
has the property that each entry is a nonnegative number between 0 and 1 and the
sum of each row is 1.
Let us look at some examples.
1.1.10 Example:
Consider example 1.1.1, observing the working of a machine. Here S = {0, 1}. Let

    P{X_{n+1} = 1 | X_n = 0} := p(0, 1) = p,
    P{X_{n+1} = 0 | X_n = 1} := p(1, 0) = q.

Then,

    P{X_{n+1} = 0 | X_n = 0} = 1 − p   and   P{X_{n+1} = 1 | X_n = 1} = 1 − q.

Thus, the transition matrix is

    P = [ 1 − p    p   ]
        [   q    1 − q ].
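As an illustration, here is a short Python sketch simulating this two-state machine chain; the numerical values of p and q are illustrative assumptions.

    import random

    # Two-state machine chain: 0 = broken, 1 = working.
    p, q = 0.3, 0.1                  # illustrative values for p and q
    P = [[1 - p, p],                 # row i lists p(i, 0), p(i, 1)
         [q, 1 - q]]

    def step(state):
        """One transition of the chain from the given state."""
        return 0 if random.random() < P[state][0] else 1

    # Estimate p(0, 1) by simulation; it should be close to p = 0.3.
    n, visits, moves = 100_000, 0, 0
    state = 0
    for _ in range(n):
        nxt = step(state)
        if state == 0:
            visits += 1
            moves += (nxt == 1)
        state = nxt
    print("estimated p(0,1) ~", moves / visits)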
Another way of describing a Markov chain is given by the following theorem.
1.1.11 Theorem:
A sequence of random variables {Xn}n≥0 is a Markov chain with initial vector Π_0 and
transition matrix P if and only if for every n ≥ 1 and i_0, i_1, ..., i_n ∈ S,

    P{X_0 = i_0, X_1 = i_1, ..., X_n = i_n}
      = Π_0(i_0) p(i_0, i_1) p(i_1, i_2) ··· p(i_{n−1}, i_n).        (1.1)
Proof:
First suppose that {Xn}n≥0 is a Markov chain with initial vector Π_0 and transition
matrix P. Then, using the chain rule for conditional probability,

    P{X_0 = i_0, X_1 = i_1, ..., X_n = i_n}
      = P{X_0 = i_0} P{X_1 = i_1 | X_0 = i_0} ··· P{X_n = i_n | X_0 = i_0, ..., X_{n−1} = i_{n−1}}
      = Π_0(i_0) p(i_0, i_1) p(i_1, i_2) ··· p(i_{n−1}, i_n).

Conversely, if equation (1.1) holds, then summing both sides over i_n ∈ S,

    P{X_0 = i_0, X_1 = i_1, ..., X_{n−1} = i_{n−1}}
      = ∑_{i_n ∈ S} P{X_0 = i_0, X_1 = i_1, ..., X_n = i_n}
      = ∑_{i_n ∈ S} Π_0(i_0) p(i_0, i_1) p(i_1, i_2) ··· p(i_{n−1}, i_n)
      = Π_0(i_0) p(i_0, i_1) p(i_1, i_2) ··· p(i_{n−2}, i_{n−1}),

since ∑_{i_n ∈ S} p(i_{n−1}, i_n) = 1. Proceeding similarly, we have for every k = 0, 1, ..., n and i_0, ..., i_k ∈ S,

    P{X_0 = i_0, X_1 = i_1, ..., X_k = i_k}
      = Π_0(i_0) p(i_0, i_1) p(i_1, i_2) ··· p(i_{k−1}, i_k).

Thus, for k = 0, we have P{X_0 = i_0} = Π_0(i_0), and

    P{X_{n+1} = i_{n+1} | X_0 = i_0, ..., X_n = i_n}
      = P{X_0 = i_0, ..., X_n = i_n, X_{n+1} = i_{n+1}} / P{X_0 = i_0, ..., X_n = i_n}
      = Π_0(i_0) p(i_0, i_1) ··· p(i_n, i_{n+1}) / (Π_0(i_0) p(i_0, i_1) ··· p(i_{n−1}, i_n))
      = p(i_n, i_{n+1}).

Hence, {Xn}n≥0 is a Markov chain with initial vector Π_0 and transition probabilities
p(i, j), i, j ∈ S.
1.2. Random walks
1.2.1 Example (Unrestricted random walk on the line):
Consider a particle on the line which moves one unit to the left with probability 1 − p
or one unit to the right with probability p. This is called the unrestricted random walk
on the line. Let X_n denote the position of the particle at time n. Then S = {0, ±1, ±2, ...},
the transition probabilities are

    p(i, i + 1) = p,   p(i, i − 1) = 1 − p,   p(i, j) = 0 for |i − j| ≠ 1,

and the transition matrix (rows and columns indexed by ..., −2, −1, 0, 1, 2, ...) is the
doubly infinite matrix

              ···    −2      −1       0       1       2    ···
       −1 [   ···   1 − p     0       p       0       0    ··· ]
        0 [   ···     0     1 − p     0       p       0    ··· ]
        1 [   ···     0       0     1 − p     0       p    ··· ]

with p on the superdiagonal, 1 − p on the subdiagonal, and 0 elsewhere.
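A minimal Python sketch of this walk (the step probability is an illustrative choice):

    import random

    def random_walk(n_steps, p=0.5, start=0):
        """Simulate the unrestricted walk: +1 with probability p, else -1."""
        x = start
        path = [x]
        for _ in range(n_steps):
            x += 1 if random.random() < p else -1
            path.append(x)
        return path

    print(random_walk(10, p=0.6))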
1.2.2 Random walk on the line with absorbing barriers:
We can also consider the random walk on the line with state space S = {0, 1, 2, 3, ..., r}
and the condition that the walk ends if the particle reaches 0 or r. The states 0 and
r are called absorbing states, for a particle that reaches such a state is absorbed
into it: it cannot leave the state. The transition probabilities are p(0, 0) = p(r, r) = 1
and, for 0 < i < r, p(i, i + 1) = p and p(i, i − 1) = 1 − p. The transition matrix for
this walk is given by

              0       1      2      3    ···    r−1     r
        0 [   1       0      0      0    ···     0      0  ]
        1 [ 1 − p     0      p      0    ···     0      0  ]
        2 [   0     1 − p    0      p    ···     0      0  ]
        ⋮
      r−1 [   0       0     ···   1 − p    0     0      p  ]
        r [   0       0      0      0    ···     0      1  ]
A typical illustration of this situation is when two players A and B are gambling with total
capital r rupees. The game ends when A loses all the money (i.e., stage 0) or B loses
all the money (i.e., stage r for A), with X_n the capital of A at the nth stage.
1.2.3 Random walk on the line with reflecting barriers:
Another variation of the previous example is the situation where two friends are
gambling with a view to playing longer, so they put the condition that every time a
player loses his last rupee, the opponent returns it to him. Let X_n denote the capital
of player A at the nth stage. If the total money both players have is r + 1 rupees, then
the state space for the system is S = {1, 2, 3, ..., r}. To find the transition matrix, note
that in the first row,

    p(1, 1) = P{X_{n+1} = 1 | X_n = 1}
            = P{A's capital remains Rs. 1 at the next stage given that it was 1 at this stage}
            = P{A has his last rupee and loses; it is returned} = 1 − p,
    p(1, 2) = P{capital of A becomes 2 | it is 1 now} = P{A wins} = p,
    p(1, j) = 0 for j ≥ 3.

For the ith row, 1 < i < r, and 1 ≤ j ≤ r,

    p(i, j) = P{X_{n+1} = j | X_n = i} = { p      if j = i + 1,
                                           0      if j = i,
                                           1 − p  if j = i − 1,

and, symmetrically, in the last row p(r, r) = p (A wins B's last rupee, which is
returned) and p(r, r − 1) = 1 − p. Thus, the transition matrix is given by:
              1       2      3      4    ···    r−1      r
        1 [ 1 − p     p      0      0    ···     0       0  ]
        2 [ 1 − p     0      p      0    ···     0       0  ]
        3 [   0     1 − p    0      p    ···     0       0  ]
        ⋮
      r−1 [   0       0     ···   1 − p    0     0       p  ]
        r [   0       0      0      0    ···   1 − p     p  ]
1.2.4 Birth and death chain:
Let X_n denote the population of a living system at time n, n ≥ 1. The state space for
the system {Xn}n≥1 is {0, 1, 2, ...}. We assume that at any given stage n, if X_n = x,
then the population increases to x + 1 with probability p_x, decreases to
x − 1 with probability q_x, or remains the same with probability r_x. Then,

    p(x, y) = { p_x  if y = x + 1,
                q_x  if y = x − 1,
                r_x  if y = x,
                0    otherwise.

Clearly, this is a Markov chain, called the birth and death chain, and it is a special
case of a random walk.
1.3. Queuing chains

Consider a counter where customers are served at every unit of time. Let X_0
be the number of customers in the queue to be served when the counter opens, and let
ξ_n be the number of customers who arrive at the nth unit of time. Then X_{n+1}, the
number of customers waiting to be served at the beginning of the (n + 1)th time unit, is

    X_{n+1} = { ξ_{n+1}               if X_n = 0,
                X_n + ξ_{n+1} − 1     if X_n ≥ 1.

The state space for the system {Xn}n≥0 is S = {0, 1, 2, ...}. If {ξn}n≥1 are independent
random variables taking only nonnegative integral values, then {Xn}n≥0 is a Markov
chain. In case {ξn}n≥1 is also identically distributed with distribution function f, we
can calculate the transition probabilities: for x, y ∈ S,

    p(x, y) = P{X_{n+1} = y | X_n = x}
            = { P{ξ_{n+1} = y}             if x = 0,
                P{ξ_{n+1} = y − x + 1}     if x ≥ 1,
            = { f(y)             if x = 0,
                f(y − x + 1)     if x ≥ 1.
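As a sketch, assuming Poisson-distributed arrivals (an illustrative choice of f not made in the text), the transition probabilities can be computed as follows.

    from math import exp, factorial

    def poisson_pmf(k, lam=1.0):
        """f(k): probability of k arrivals per time unit (illustrative choice)."""
        return exp(-lam) * lam**k / factorial(k)

    def queue_transition(x, y, f=poisson_pmf):
        """p(x, y) for the queuing chain, from the formula above."""
        if x == 0:
            return f(y)
        k = y - x + 1      # arrivals needed to go from x to y after one service
        return f(k) if k >= 0 else 0.0

    # Row sums are 1 (up to truncation of the infinite state space):
    row = [queue_transition(3, y) for y in range(60)]
    print(sum(row))  # approximately 1.0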
1.4. Ehrenfest chain

Consider two isolated containers, labeled as body A and body B, containing two different fluids. Let the total number of molecules of the two fluids, distributed between the
containers A and B, be d, labeled {1, 2, ..., d}. Let the observation be made on the
number of molecules in A. To start with, A has some number of molecules and B
has the rest. At each stage, a number 1 ≤ r ≤ d is chosen at random, and the molecule
labeled r is removed from the body in which it is and placed in the other body. This
gives the observation at the next stage, and so on. Clearly X_n, the number of molecules
in A at stage n, takes values in {0, 1, 2, ..., d}; thus, the state space is S = {0, 1, 2, ..., d}.
Let us find the transition probabilities p(i, j), 0 ≤ i, j ≤ d, of the system. When i = 0,

    p(0, j) = P{X_{n+1} = j | X_n = 0},

i.e., A has no molecules at stage n. The chosen molecule is therefore in B and moves
to A, so j can only be 1 at stage n + 1. Thus,

    p(0, j) = { 1  if j = 1,
                0  if j ≠ 1.

If A has d molecules (i.e., all of them) at the nth stage, then the chosen molecule is
necessarily in A and moves to B. Thus,

    p(d, j) = { 1  if j = d − 1,
                0  otherwise.

For a fixed i, 0 < i < d, let us look at p(i, j) for 0 ≤ j ≤ d; p(i, j) is the probability
that A will have j molecules, given that it had i molecules. Now if A had i molecules,
then the only possibilities for j are i − 1 and i + 1 (the number of molecules in A
either increases or decreases by one at each stage). Thus, p(i, j) = 0 if j ≠ i + 1 and
j ≠ i − 1. If j = i + 1, i.e., A is to have i + 1 molecules, then one of the d − i molecules
in B must be chosen, which happens with probability (d − i)/d. Thus,

    p(i, i + 1) = (d − i)/d = 1 − i/d   and   p(i, i − 1) = i/d.
Thus, the transition matrix for this Markov chain has p(i, i − 1) = i/d on the subdiagonal,
p(i, i + 1) = 1 − i/d on the superdiagonal, and all other entries 0:

               0      1       2        3      ···   d−1    d
        0  [   0      1       0        0      ···    0     0  ]
        1  [  1/d     0    1 − 1/d     0      ···    0     0  ]
        2  [   0     2/d      0     1 − 2/d   ···    0     0  ]
        ⋮
        d  [   0      0       0        0      ···    1     0  ]

This model is called the Ehrenfest diffusion model.
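A small Python helper (hypothetical, for illustration) that builds this matrix and checks that it is stochastic:

    import numpy as np

    def ehrenfest_matrix(d):
        """Transition matrix of the Ehrenfest chain on states 0, 1, ..., d."""
        P = np.zeros((d + 1, d + 1))
        for i in range(d + 1):
            if i < d:
                P[i, i + 1] = (d - i) / d   # a molecule moves from B to A
            if i > 0:
                P[i, i - 1] = i / d         # a molecule moves from A to B
        return P

    P = ehrenfest_matrix(4)
    assert np.allclose(P.sum(axis=1), 1.0)  # each row sums to 1
    print(P)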
1.5. Some consequences of the Markov property

Let {Xn}n≥0 be a Markov chain with state space S and transition probabilities p(i, j),
i, j ∈ S.

1.5.1 Proposition:
Let S_0, S_1, ..., S_{n−2} be subsets of S. Then for any n ≥ 1,

    P{X_n = j | X_{n−1} = i, X_{n−2} ∈ S_{n−2}, ..., X_0 ∈ S_0} = p(i, j).

Proof:
The required property holds for elementary sets S_k = {i_k}, i_k ∈ S, by the Markov
property:

    P{X_n = j | X_{n−1} = i, X_{n−2} = i_{n−2}, ..., X_0 = i_0} = P{X_n = j | X_{n−1} = i}.

Since any subset A of S is a countable disjoint union of elementary sets, the required
property follows from property (iv) of conditional probability in the prologue.
1.5.2 Example:
Let us compute P{X_3 = j | X_1 = i, X_0 = k}, j, k ∈ S. Using proposition 1.5.1 and
the Markov property, we have

    P{X_3 = j | X_1 = i, X_0 = k}
      = ∑_{r∈S} P{X_3 = j | X_2 = r, X_1 = i, X_0 = k} P{X_2 = r | X_1 = i, X_0 = k}
      = ∑_{r∈S} P{X_3 = j | X_2 = r, X_1 = i} P{X_2 = r | X_1 = i}
      = P{X_3 = j | X_1 = i}.
In fact, the above example can be extended to the following:

1.5.3 Theorem:
For n > n_s > n_{s−1} > ... > n_1 ≥ 0,

    P{X_n = j | X_{n_s} = i, X_{n_{s−1}} = i_{s−1}, ..., X_{n_1} = i_1} = P{X_n = j | X_{n_s} = i}.

Thus, for a Markov chain, the probability at time n given the past at times
n_s > n_{s−1} > ... > n_1 depends only on the most recent past, i.e., on n_s.
Thus, to every Markov chain we can associate a vector, the distribution of the initial
stage, and a stochastic matrix whose entries give us the probabilities of moving from one
state to another at the next stage. Here is the converse:

1.5.4 Theorem:
Given a stochastic matrix P and a probability vector Π_0, there exists a Markov chain
{Xn}n≥0 with Π_0 as initial distribution and P as transition probability matrix.

The interested reader may refer to Theorem 8.1 of Billingsley [4].
1.5.5 Exercise:
Show that P{X_0 = i_0 | X_1 = i_1, ..., X_n = i_n} = P{X_0 = i_0 | X_1 = i_1}.
Review Exercises
(1.1) Mark the following statements as True/False:
(i) A Markov system can be in several states at one time.
(ii) The (1, 3) entry in the transition matrix is the probability of going from
state 1 to state 3 in two steps.
(iii) The (6, 5) entry in the transition matrix is the probability of going from
state 6 to state 5 in one step.
(iv) The entries in each row of the transition matrix add to zero.
(v) Let {Xn}n≥1 be a sequence of independent identically distributed discrete
random variables. Then it is a Markov chain.
(vi) If the state space is S = {s1 , s2 , . . . , sn }, then its transition matrix will
have order n.
(1.2) Let {ξn}n≥0 be a sequence of independent identically distributed discrete
random variables. Define

        X_n = { ξ_0                       if n = 0,
                ξ_1 + ξ_2 + ... + ξ_n     for n ≥ 1.

Show that {Xn}n≥0 is a Markov chain. Sketch its transition graph and
compute the transition probabilities.
(1.3) Consider a person moving on a 4 × 4 grid. He can move only to the intersection points to the right or below, each with probability 1/2. Suppose he starts
his walk from the top left corner, and let X_n, n ≥ 0, denote his position after n steps. Show that {Xn}n≥0 is a Markov chain. Sketch its transition
graph and compute the transition probability matrix. Also find the initial
distribution vector.
(1.4) Web surfing:
Consider a person surfing the Internet, and each time he encounters a web
page, he selects one of its hyperlinks at random (but uniformly). Let Xn
denote the page where the person is after n selections (clicks). What do you
think is the state space? Find the transition probability matrix.
(1.5) Let {Xn}n≥0 be a Markov chain with state space, initial probability distribution and transition matrix given by

        S = {1, 2, 3},  Π_0 = (1/3, 1/3, 1/3),  P = [ 1/3  1/3  1/3 ]
                                                    [ 1/3  1/3  1/3 ]
                                                    [ 1/3  1/3  1/3 ].

Define

        Y_n = { 0  if X_n = 1,
                1  otherwise.

Show that {Yn}n≥0 is not a Markov chain. Thus, a function of a Markov chain
need not be a Markov chain.
(1.6) Let {Xn}n≥0 be a Markov chain with transition matrix P. Define

        Y_n = X_{2n} for every n ≥ 0.

Show that {Yn}n≥0 is a Markov chain with transition matrix P². What
happens if Y_n is defined as Y_n = X_{nk} for every n ≥ 0?
Chapter 2

Calculation of higher order probabilities

2.1. Distribution of Xn and other joint distributions
Consider a Markov chain {Xn}n≥0 with initial vector Π_0 and transition probability
matrix P = [p(i, j)]. We want to find the probability that after n steps the system
will be in a given state, say j ∈ S. For a matrix A, its n-fold product with itself will
be denoted by A^n.
2.1.1 Theorem:
(i) The joint distribution of X_0, X_1, X_2, ..., X_n is given by

    P{X_0 = i_0, X_1 = i_1, ..., X_n = i_n} = p(i_{n−1}, i_n) p(i_{n−2}, i_{n−1}) ··· p(i_0, i_1) Π_0(i_0).

(ii) The distribution of X_n, P{X_n = j}, is given by the jth component of the vector
Π_0 P^n.
(iii) For every n, m ≥ 0,

    P{X_n = j | X_0 = i} = P{X_{n+m} = j | X_m = i} = p^n(i, j),

where p^n(i, j) is the ijth entry of the matrix P^n.

Proof:
(i) Using the chain rule for conditional probability and the Markov property,

    P{X_0 = i_0, X_1 = i_1, ..., X_n = i_n}
      = P{X_n = i_n | X_{n−1} = i_{n−1}, ..., X_0 = i_0} ··· P{X_1 = i_1 | X_0 = i_0} P{X_0 = i_0}
      = P{X_n = i_n | X_{n−1} = i_{n−1}} ··· P{X_1 = i_1 | X_0 = i_0} P{X_0 = i_0}
      = p(i_{n−1}, i_n) p(i_{n−2}, i_{n−1}) ··· p(i_0, i_1) Π_0(i_0).
(ii) Let Y be a random variable with values in S and distribution P{Y = i} = λ_i, i ∈ S.
Then, using the chain rule for conditional probability and the Markov property,

    P{X_n = j} = ∑_{i_0∈S} P{Y = i_0, X_n = j}
      = ∑_{i_0∈S} ∑_{i_1∈S} ··· ∑_{i_{n−1}∈S} P{Y = i_0, X_1 = i_1, ..., X_{n−1} = i_{n−1}, X_n = j}
      = ∑_{i_0∈S} ∑_{i_1∈S} ··· ∑_{i_{n−1}∈S} P{Y = i_0} P{X_1 = i_1 | X_0 = i_0} ··· P{X_n = j | X_{n−1} = i_{n−1}}
      = ∑_{i_0∈S} ∑_{i_1∈S} ··· ∑_{i_{n−1}∈S} λ_{i_0} p(i_0, i_1) ··· p(i_{n−1}, j).        (2.1)

Thus, for Y = X_0, we have

    P{X_n = j} = ∑_{i_0∈S} ∑_{i_1∈S} ··· ∑_{i_{n−1}∈S} Π_0(i_0) p(i_0, i_1) ··· p(i_{n−1}, j)        (2.2)
               = jth element of the vector Π_0 P^n.
(iii) Once again, using the Markov property and the chain rule for conditional probability,

    P{X_{n+m} = j | X_m = i} P{X_m = i}
      = P{X_{n+m} = j, X_m = i}
      = ∑_{i_{m+1}∈S} ··· ∑_{i_{m+n−1}∈S} P{X_m = i, X_{m+1} = i_{m+1}, ..., X_{m+n−1} = i_{m+n−1}, X_{n+m} = j}
      = ∑_{i_{m+1}∈S} ··· ∑_{i_{m+n−1}∈S} P{X_m = i} P{X_{m+1} = i_{m+1} | X_m = i} ··· P{X_{n+m} = j | X_{m+n−1} = i_{m+n−1}}
      = ∑_{i_{m+1}∈S} ··· ∑_{i_{m+n−1}∈S} P{X_m = i} p(i, i_{m+1}) ··· p(i_{n+m−1}, j).

Thus,

    P{X_n = j | X_0 = i} = P{X_{n+m} = j | X_m = i} = p^n(i, j),

where p^n(i, j) is the ijth entry of the matrix P^n.
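A quick numerical sketch of parts (ii) and (iii), with illustrative values for P and Π_0 (not taken from the text):

    import numpy as np

    # Theorem 2.1.1: the distribution of X_n is Pi_0 P^n, and p^n(i, j) is
    # the (i, j) entry of P^n.
    P = np.array([[0.7, 0.3],
                  [0.1, 0.9]])
    pi0 = np.array([1.0, 0.0])

    Pn = np.linalg.matrix_power(P, 10)
    print(pi0 @ Pn)   # distribution of X_10; components sum to 1
    print(Pn[0, 1])   # p^10(0, 1)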
2.1.2 Definition:
Let {Xn}n≥0 be a Markov chain with initial vector Π_0 and transition probability
matrix P = (p(i, j)), i, j ∈ S.
(i) For n ≥ 1 and j ∈ S, p_n(j) = P{X_n = j} is called the distribution of X_n.
(ii) For n ≥ 1, the p^n(i, j) are called the nth stage transition probabilities.

The above theorem gives us the probability that the system is in a given state at the
nth stage and the probability that the system moves in n stages from a state i to a
state j. These can be computed if we know the initial distribution and the powers of
the transition matrix. Thus, it is important to compute the matrix P^n, P being the
transition matrix. For large n, this is difficult to compute directly. Let us look at some
examples.
2.1.3 Exercise:
Show that the joint distribution of X_m, X_{m+1}, ..., X_{m+n} is given by

    P{X_m = i_m, ..., X_{m+n} = i_{m+n}} = p(i_{m+n−1}, i_{m+n}) ··· p(i_m, i_{m+1}) P{X_m = i_m}.

Also write the joint distribution of any finite X_{n_1}, X_{n_2}, ..., X_{n_r}, for n_1 < n_2 < ... < n_r.
2.1.4 Example:
Consider a Markov chain {Xn}n≥0 in the special situation where all the X_n's are
independent. Let us compute P^n, where P is the transition probability matrix. Because
the X_n's are independent,

    p(i, j) = P{X_{n+1} = j | X_n = i} = P{X_{n+1} = j}

for all i, j and for all n. Thus, each row of P is identical. By theorem 2.1.1(iii), for all i,

    p^n(i, j) = P{X_{n+m} = j | X_m = i} = P{X_n = j | X_0 = i} = P{X_n = j} = p(i, j).

Therefore p^n(i, j) = p(i, j), i.e., P^n = P.
2.1.5 Example:
Let us consider the Markov chain with two states S = {0, 1} and transition matrix

    P = [ 1 − p    p   ]
        [   q    1 − q ].

Let Π_0(0), Π_0(1) be the initial distribution. The knowledge of P and Π_0(0), Π_0(1) helps
us to answer various questions. For example, to compute the distribution of X_n, using
the formula for conditional probability P(A|B) P(B) = P(A ∩ B), we have for every
n ≥ 0,

    P{X_{n+1} = 0} = P{X_{n+1} = 0, X_n = 0} + P{X_{n+1} = 0, X_n = 1}
      = P{X_{n+1} = 0 | X_n = 0} P{X_n = 0} + P{X_{n+1} = 0 | X_n = 1} P{X_n = 1}
      = (1 − p) P{X_n = 0} + q P{X_n = 1}
      = (1 − p) P{X_n = 0} + q (1 − P{X_n = 0})
      = (1 − p − q) P{X_n = 0} + q.

Thus, for n = 0, 1, 2, ...,

    P{X_1 = 0} = (1 − p − q) Π_0(0) + q,
    P{X_2 = 0} = (1 − p − q) P{X_1 = 0} + q
               = (1 − p − q)[(1 − p − q) Π_0(0) + q] + q
               = (1 − p − q)² Π_0(0) + q ∑_{j=0}^{1} (1 − p − q)^j,
    ···
    P{X_n = 0} = (1 − p − q)^n Π_0(0) + q ∑_{j=0}^{n−1} (1 − p − q)^j.
Then, using the fact that P{X_0 = 0} = 1 (i.e., Π_0(0) = 1) and summing the geometric
series,

    p^n(0, 0) = P{X_n = 0 | X_0 = 0} = P{X_n = 0}
              = q/(p + q) + (1 − p − q)^n (1 − q/(p + q))
              = q/(p + q) + (1 − p − q)^n p/(p + q),

and

    p^n(0, 1) = P{X_n = 1 | X_0 = 0} = P{X_n = 1}
              = p/(p + q) + (1 − p − q)^n (0 − p/(p + q))
              = p/(p + q) − (1 − p − q)^n p/(p + q).

Similarly, taking Π_0(0) = 0,

    p^n(1, 0) = P{X_n = 0 | X_0 = 1}
              = q/(p + q) + (1 − p − q)^n (0 − q/(p + q))
              = q/(p + q) − (1 − p − q)^n q/(p + q),

and

    p^n(1, 1) = p/(p + q) + (1 − p − q)^n q/(p + q).

Therefore,

    P^n = 1/(p + q) [ q  p ]  +  (1 − p − q)^n/(p + q) [  p  −p ]
                    [ q  p ]                            [ −q   q ].
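A quick numerical check of this closed form (illustrative values of p and q):

    import numpy as np

    p, q = 0.3, 0.1
    P = np.array([[1 - p, p], [q, 1 - q]])
    n = 7

    closed_form = (np.array([[q, p], [q, p]])
                   + (1 - p - q)**n * np.array([[p, -p], [-q, q]])) / (p + q)
    assert np.allclose(np.linalg.matrix_power(P, n), closed_form)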
2.1.6 Exercise:
Consider the two-state Markov chain of example 1.1.10.
(i) If p = q = 0, what can be said about the machine?
(ii) If p, q > 0, show that

    P{X_n = 0} = q/(p + q) + (1 − p − q)^n (Π_0(0) − q/(p + q))

and

    P{X_n = 1} = p/(p + q) + (1 − p − q)^n (Π_0(1) − p/(p + q)).

(iii) Find conditions on Π_0(0) and Π_0(1) such that the distribution of X_n is independent
of n.
(iv) Compute P{X_0 = 0, X_1 = 1, X_2 = 0}.
(v) Can one compute the joint distribution of X_{n+2}, X_{n+1}, X_n?
2.1.7 Note (in case P is diagonalizable):
As we observed earlier, it is not easy to compute P^n for a matrix P, even when it
is finite. However, in case P is diagonalizable (see the Appendix for more details), it
is easy: let there exist an invertible matrix U such that P = U D U^{−1}, where D is
a diagonal matrix. Then P^n = U D^n U^{−1}, and D^n is easy to compute. In this case,
we can compute the elements of P^n. Let the state space have M elements and let P be
diagonalizable, the diagonal elements of D being λ_1, λ_2, ..., λ_M; these are the eigenvalues
of P. To find p^n(i, j):
(i) Compute the eigenvalues λ_1, λ_2, ..., λ_M of P by solving the characteristic equation.
(ii) If all the eigenvalues are distinct, then for all n, p^n_{ij} has the form

    p^n_{ij} = a_1 λ_1^n + ... + a_M λ_M^n,

for some constants a_1, ..., a_M depending upon i and j. These can be found by solving a
system of linear equations.
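A minimal Python sketch of this recipe, using the matrix of example 2.1.8 below; the eigenvalues here are complex, so the real part is taken after recombining.

    import numpy as np

    # Compute P^n via the eigendecomposition P = U D U^{-1} (when it exists).
    P = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])
    eigvals, U = np.linalg.eig(P)

    def P_power(n):
        D_n = np.diag(eigvals**n)
        return (U @ D_n @ np.linalg.inv(U)).real

    n = 6
    assert np.allclose(P_power(n), np.linalg.matrix_power(P, n))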
2.1.8 Example:
Let the transition matrix of a Markov chain be

    P = [  0    1    0  ]
        [  0   1/2  1/2 ]
        [ 1/2   0   1/2 ],

and let us try to find a general formula for p^n_{11}. We first compute the eigenvalues of P
by solving det(P − λI) = 0:

    | −λ       1        0     |
    |  0    1/2 − λ    1/2    |  =  0.
    | 1/2      0     1/2 − λ  |

This gives the (complex) eigenvalues 1, ±(i/2). Thus, for some invertible matrix U,

    P = U [ 1   0     0   ] U^{−1},   and hence   P^n = U [ 1    0        0      ] U^{−1}.
          [ 0  i/2    0   ]                               [ 0  (i/2)^n    0      ]
          [ 0   0   −i/2  ]                               [ 0    0    (−i/2)^n  ]

In fact, U can be written explicitly in terms of the eigenvectors. Alternatively, the above
equation implies that for some scalars a, b, c,

    p^n_{11} = a + b (i/2)^n + c (−i/2)^n.

In order to have a real expression, we write (±i/2)^n = (1/2)^n (cos(nπ/2) ± i sin(nπ/2))
and absorb the constants, so that for all n ≥ 0,

    p^n_{11} = a + b (1/2)^n cos(nπ/2) + c (1/2)^n sin(nπ/2).

In particular, for n = 0, 1, 2, we have

    1 = p^0_{11} = a + b,
    0 = p^1_{11} = a + c/2,
    0 = p^2_{11} = a − b/4.

The solution of this system is a = 1/5, b = 4/5, c = −2/5, and hence

    p^n_{11} = 1/5 + (1/2)^n (4/5 cos(nπ/2) − 2/5 sin(nπ/2)).
2.2. Kolmogorov-Chapman equation

We saw that given a Markov chain {Xn}n≥0 with state space S, initial distribution
Π_0 and transition matrix P, we can calculate the distribution of X_n and other joint
distributions. Thus, if we write Π_n for the distribution of X_n, i.e., if Π_n(j) = P{X_n = j},
then

    Π_n(j) = ∑_{k∈S} Π_0(k) p^n(k, j),

or, symbolically,

    Π_n = Π_0 P^n.

We can also write the joint distribution of X_m, X_{m+1}, ..., X_{m+n} as

    P{X_{m+t} = i_t, 0 ≤ t ≤ n} = Π_m(i_0) p(i_0, i_1) ··· p(i_{n−1}, i_n).

The entries of P^n are called the nth step transition probabilities. Thus, the knowledge
about the Markov chain is contained in Π_0 and the matrix P. As noted earlier, P is a
matrix (possibly infinite) such that the sum of each row is 1, i.e., a stochastic matrix.
For consistency, we define P^0 = Id. The following is easy to show:
2.2.1 Theorem:
For n, m ≥ 0 and i, j ∈ S,

    p^{n+m}(i, j) = ∑_{r∈S} p^n(i, r) p^m(r, j).

In terms of matrix multiplication this is just

    P^{n+m} = P^n P^m.

This is called the Kolmogorov-Chapman equation.

Proof:
Using property (v) of conditional probability,

    p^{n+m}(i, j) = P{X_{n+m} = j | X_0 = i}
      = ∑_{r∈S} P{X_n = r | X_0 = i} P{X_{n+m} = j | X_n = r, X_0 = i}
      = ∑_{r∈S} p^n(i, r) P{X_{n+m} = j | X_n = r, X_0 = i}
      = ∑_{r∈S} p^n(i, r) p^m(r, j).

The last equality follows from the fact that

    P{X_{n+m} = j | X_n = r, X_0 = i} = P{X_{n+m} = j | X_n = r} = p^m(r, j),

as observed in theorem 1.5.3.
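A numerical sanity check of the Kolmogorov-Chapman equation (illustrative P):

    import numpy as np

    P = np.array([[0.7, 0.3],
                  [0.1, 0.9]])
    n, m = 4, 5

    # Kolmogorov-Chapman: P^{n+m} = P^n P^m.
    lhs = np.linalg.matrix_power(P, n + m)
    rhs = np.linalg.matrix_power(P, n) @ np.linalg.matrix_power(P, m)
    assert np.allclose(lhs, rhs)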
2.2.2 Example:
Consider the unrestricted random walk on the line, as in example 1.2.1, with probability
p of moving forward and q = 1 − p of moving back. Then

    p^{2n+1}(0, 0) = 0,

as the walk can come back to the starting point only in an even number of steps, and

    p^{2n}(0, 0) = (2n choose n) p^n (1 − p)^n = (2n choose n) (pq)^n,

as there must be n moves to the right and n back. In fact, the same is true for every
diagonal entry; the other entries are difficult to compute. Note that

    ∑_{n=0}^∞ p^{2n}(0, 0) = ∑_{n=0}^∞ (2n choose n) (pq)^n.

Using Stirling's approximation, n! ≈ √(2π) n^{n+1/2} e^{−n}, we have

    ∑_{n=0}^∞ p^{2n}(0, 0) ≈ ∑_{n=1}^∞ (4pq)^n / √(nπ),

which is convergent if 4pq < 1 and divergent otherwise; and 4pq < 1 precisely when
p ≠ q. Thus, 0 is transient if p ≠ q, and recurrent if p = q = 1/2 (in the terminology
of chapter 3).
2.2.3 Example:
Consider the Markov chain of exercise (1.3), with state space S = {1, 2, 3, 4}, initial
distribution Π_0 = (1, 0, 0, 0), and transition matrix

    P = [  0   1/2   0   1/2 ]
        [ 1/2   0   1/2   0  ]
        [  0   1/2   0   1/2 ]
        [ 1/2   0   1/2   0  ].

Then,

    P² = [ 1/2   0   1/2   0  ]
         [  0   1/2   0   1/2 ]
         [ 1/2   0   1/2   0  ]
         [  0   1/2   0   1/2 ],

and

    Π_0 P² = (1, 0, 0, 0) P² = (1/2, 0, 1/2, 0).

Thus, if we want the probability that the walker will be in state 3 in two steps, it is

    Π_2(3) = (Π_0 P²)(3) = 1/2.
Exercises
(2.1) Consider the Markov chain of example 2.2.3. Show that

        Π_n = { (0, 1/2, 0, 1/2)  for n = 1, 3, 5, ...,
                (1/2, 0, 1/2, 0)  for n = 2, 4, 6, ....
(2.2) Let {Xn}n≥0 be a Markov chain with state space, initial probability distribution and transition matrix given by

        S = {1, 2},  Π_0 = (1, 0),  P = [ 3/4  1/4 ]
                                        [ 1/4  3/4 ].

Show that

        Π_n = ( (1 + 2^{−n})/2 , (1 − 2^{−n})/2 )

for every n.
(2.3) Consider the two-state Markov chain {Xn}n≥0 with Π_0 = (1, 0) and transition matrix

        P = [ 1 − p    p   ]
            [   q    1 − q ].

Using the facts that P is stochastic and the relation P^{n+1} = P^n P, deduce that

        p^{n+1}(1, 1) = p^n(1, 1)(1 − p) + p^n(1, 2) q   and   p^n(1, 1) + p^n(1, 2) = 1,

and hence, for all n ≥ 0,

        p^{n+1}(1, 1) = p^n(1, 1)(1 − p − q) + q.

Show that this has the unique solution

        p^n(1, 1) = { q/(p + q) + (p/(p + q)) (1 − p − q)^n   for p + q > 0,
                      1                                        for p + q = 0.
Chapter 3
Classification of states
Let {Xn}n≥0 be a Markov chain with state space S, initial distribution Π_0 and transition probability matrix P. We will denote the ijth element of P^n, p^n(i, j), also by p^n_{ij}. We
start looking at the possibility of moving from one state to another.
3.1. Closed subsets and irreducible subsets
3.1.1 Definition:
(i) We say a state j is reachable from a state i (or i leads to j, or j is approachable
from i) if there exists some n ≥ 0 such that p^n_{ij} > 0. We denote this by i → j.
In other words, i leads to j in n steps with positive probability.
(ii) A subset C of the state space is said to be closed if no state in C leads to a
state outside C. Thus, C is closed means: for every i ∈ C and j ∉ C, p^n_{ij} = 0 for all
n ≥ 0. This means once the chain enters the set C it never leaves it.
(iii) A state j is called an absorbing state if the singleton set {j} is a closed set.
3.1.2 Proposition:
(i) If i → j and j → k, then i → k.
(ii) A state j is reachable from a state i iff p_{i i_1} p_{i_1 i_2} ··· p_{i_{n−1} j} > 0 for some
i_1, i_2, ..., i_{n−1} ∈ S.
(iii) C ⊆ S is closed iff p_{ij} = 0 for all i ∈ C, j ∉ C.
(iv) The state space S is closed, and for i ∈ S, the set {i} is closed iff p_{ii} = 1.

Proof:
(i) Follows from the fact that

    p^{n+m}_{ik} = ∑_{r∈S} p^n_{ir} p^m_{rk} ≥ p^n_{ij} p^m_{jk} > 0 for some n, m ≥ 0.

(ii) Follows from the equality

    p^n_{ij} = ∑_{i_1,...,i_{n−1}} p_{i i_1} p_{i_1 i_2} ··· p_{i_{n−1} j}.

(iii) Clearly, p^n_{ij} = 0 for all n implies p_{ij} = 0. Conversely, let p_{ij} = 0 for all
i ∈ C, j ∉ C. Then, for all r ∈ C and k ∉ C,

    p²_{rk} = ∑_{l∈S} p_{rl} p_{lk} = ∑_{l∈C} p_{rl} p_{lk} + ∑_{l∉C} p_{rl} p_{lk} = 0,

since p_{lk} = 0 for l ∈ C, k ∉ C, and p_{rl} = 0 for r ∈ C, l ∉ C. Proceeding similarly,
p^n_{rk} = 0 for all n ≥ 1.
(iv) is obvious.
3.1.3 Definition:
A subset C of S is called irreducible if any two states in C lead to one another.
Let us look at some examples.
3.1.4 Example:
Consider a Markov chain with transition matrix:

           0     1     2     3     4     5
    0 [    1     0     0     0     0     0   ]
    1 [   1/4   1/2   1/4    0     0     0   ]
    2 [    0    1/5   2/5   1/5    0    1/5  ]
    3 [    0     0     0    1/6   1/3   1/2  ]
    4 [    0     0     0    1/2    0    1/2  ]
    5 [    0     0     0    1/4    0    3/4  ]

We first look at which state leads to which state. Whenever i → j, we put a ∗ at the
ijth entry of a "reachability matrix". Note that p_{ij} > 0 gives a ∗ at the ijth entry,
but p_{ij} = 0 need not give 0 there: for example, p_{13} = 0, but 1 → 2 → 3, so the
(1, 3) entry is replaced by ∗. For the above matrix, we have

           0   1   2   3   4   5
    0 [    ∗   0   0   0   0   0  ]
    1 [    ∗   ∗   ∗   ∗   ∗   ∗  ]
    2 [    ∗   ∗   ∗   ∗   ∗   ∗  ]
    3 [    0   0   0   ∗   ∗   ∗  ]
    4 [    0   0   0   ∗   ∗   ∗  ]
    5 [    0   0   0   ∗   ∗   ∗  ]

(for instance, 1 → 0; 1 → 2 → 3 → 4; 2 → 3 → 5; 3 → 4 and 3 → 5; 4 → 3 and
4 → 5; 5 → 3 → 4).

Clearly, a single state i is a closed set iff p_{ii} = 1; in our case, {0} is closed. The set
S is closed by definition, for there is no state outside S; thus {0, 1, 2, 3, 4, 5} is closed.
A look at the reachability matrix tells us that the set {3, 4, 5} is closed, because none
of 3, 4, 5 leads to 0, 1, or 2. On the other hand, {1} is not closed,
because 1 → 2. In fact, there are no other closed sets. The set {3, 4, 5} is also irreducible.
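The reachability ("∗") matrix can be computed mechanically as the transitive closure of the relation p_{ij} > 0; here is a small Python sketch for the chain above (the helper names are illustrative).

    import numpy as np

    P = np.array([
        [1,    0,    0,    0,    0,    0   ],
        [1/4,  1/2,  1/4,  0,    0,    0   ],
        [0,    1/5,  2/5,  1/5,  0,    1/5 ],
        [0,    0,    0,    1/6,  1/3,  1/2 ],
        [0,    0,    0,    1/2,  0,    1/2 ],
        [0,    0,    0,    1/4,  0,    3/4 ],
    ])

    # Transitive closure (Warshall): reach[i][j] is True iff i -> j.
    reach = (P > 0) | np.eye(len(P), dtype=bool)   # n = 0 steps allowed
    for k in range(len(P)):
        reach |= reach[:, [k]] & reach[[k], :]

    def is_closed(C):
        """A set C is closed iff no state in C reaches a state outside C."""
        outside = [j for j in range(len(P)) if j not in C]
        return not any(reach[i, j] for i in C for j in outside)

    print(is_closed({3, 4, 5}))   # True
    print(is_closed({1}))         # False: 1 -> 2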
3.1.5 Note (importance of closed irreducible sets):
Why should one bother about closed subsets of the state space? To find the answer, let
us look at the above example again. Take a proper closed set, say C = {3, 4, 5}.
If we remove the rows and columns corresponding to the states 0, 1 and 2 from the
transition matrix, we get the sub-matrix

           3     4     5
    3 [   1/6   1/3   1/2 ]
    4 [   1/2    0    1/2 ]
    5 [   1/4    0    3/4 ]

which has the property that the sum of each row is 1. In fact, if we take P² and delete
the rows and columns not in C, writing the result as (P²)_C, then it is easy to check that
it is nothing but (P_C)². For in P², note that for i ∈ C,

    p²_{ij} = 0 if j ∉ C.

Therefore,

    1 = ∑_{j∈S} p²_{ij} = ∑_{j∈C} p²_{ij}.

Thus, (P_C)² is a stochastic matrix. Also, for i, j ∈ C,

    p²_{ij} = ∑_{r∈S} p_{ir} p_{rj} = ∑_{r∈C} p_{ir} p_{rj} = (ij)th entry of (P_C)²,

because C is closed and p_{ir} = 0 for r ∉ C. In general, (P^n)_C = (P_C)^n. Hence, one
can consider the chain with state space C and analyze it. This reduces the number of
states.
3.1.6 Definition:
Two states i and j are said to communicate if each is accessible from the other, i.e.,
p^n_{ij} > 0 and p^m_{ji} > 0 for some m, n ≥ 0. In this case we write i ↔ j.

3.1.7 Proposition:
(i) For i, j ∈ S, let us say i ∼ j iff i ↔ j. Then ∼ is an equivalence relation on S.
(ii) Each equivalence class, called a communicating class, has no proper closed subsets.

Proof:
(i) That i ↔ i follows from the fact that P⁰ = Id, and hence p⁰_{ii} = 1. The relation is
obviously symmetric, and transitivity follows from proposition 3.1.2(i).
(ii) Let C be an equivalence class. If A is a proper subset of C, let j ∈ C \ A and i ∈ A.
Then i ↔ j, implying that j ∉ A is accessible from i ∈ A. Hence, A is not closed.
3.1.8 Note:
A communicating class need not be closed: it may be possible to start in one
communicating class and enter another with positive probability. For example, consider
a Markov chain with transition matrix

    P = [ 1/2  1/2   0    0    0    0  ]
        [  0    0    1    0    0    0  ]
        [ 1/3   0    0   1/3  1/3   0  ]
        [  0    0    0   1/2  1/2   0  ]
        [  0    0    0    0    0    1  ]
        [  0    0    0    0    1    0  ].
The communicating classes are {1, 2, 3}, {4}, {5, 6}. Clearly, 3 → 4, but 4 does not lead
back to 3. Among these classes, only {5, 6} is a closed subset.
3.1.9 Example:
Consider a Markov chain with five states {1, 2, 3, 4, 5} and with transition matrix

    P = [ 1/2  1/2   0    0    0  ]
        [ 1/4  3/4   0    0    0  ]
        [  0    0    0    1    0  ]
        [  0    0   1/2   0   1/2 ]
        [  0    0    0    1    0  ].

States 1 and 2 communicate with each other and with no other state. Similarly,
states 3, 4, 5 communicate among themselves only. Thus, the state space divides into
two closed irreducible sets {1, 2} and {3, 4, 5}. For all practical purposes,
analyzing the given Markov chain is the same as analyzing two smaller chains with
smaller state spaces, with transition matrices

    P_1 = [ 1/2  1/2 ]        P_2 = [  0    1    0  ]
          [ 1/4  3/4 ],              [ 1/2   0   1/2 ]
                                     [  0    1    0  ].
3.1.10 Theorem:
A closed set C ⊆ S is irreducible (i.e., every state in C communicates with every other
state in it) iff C has no proper nonempty closed subset.

Proof:
Suppose C has no proper nonempty closed subset. For j ∈ C, define

    C_j = {i ∈ C | p^n_{ij} = 0 for all n ≥ 0}.

We claim that C_j is a closed set. To see this, let k ∉ C_j. Then there exists some m
such that p^m_{kj} > 0. Now if i is such that p_{ik} > 0, then

    p^{m+1}_{ij} = ∑_{l∈S} p_{il} p^m_{lj} ≥ p_{ik} p^m_{kj} > 0,

which is not possible if i ∈ C_j. Thus, p_{ik} = 0 for every i ∈ C_j and k ∉ C_j, implying
that C_j is closed. Since j ∉ C_j (as p⁰_{jj} = 1), C_j is a proper subset of C, and hence
C_j = ∅. Thus every state of C leads to j; as j ∈ C was arbitrary, any two states in C
communicate with each other. Conversely, let i ↔ j for all i, j ∈ C, and let A ⊆ C be
a nonempty closed set. Then, for j ∈ A and i ∈ C, since j → i and A is closed, we have
i ∈ A; hence A = C, i.e., C has no proper nonempty closed subset.
In view of note 3.1.5, one would like to partition the state space into irreducible subsets.
Exercises
(3.1) Let the transition matrix of a Markov chain be given by

        P = [ 1/2   0    0   1/2   0  ]
            [ 1/2   0   1/3   0   1/6 ]
            [  0    0    1    0    0  ]
            [  1    0    0    0    0  ]
            [  0    1    0    0    0  ].
Write the transition graph and find all the disjoint closed subsets of the state
space S = {1, 2, 3, 4, 5}.
(3.2) Consider the Markov chain in example 1.2.2, the random walk with absorbing
barriers. Show that the state space splits into three irreducible sets. Is it
possible to go from one set to another?
(3.3) For the queuing Markov chain of section 1.3, write the transition
matrix, and if f(k) > 0 for every k, deduce that S itself is irreducible.
(3.4) Let a Markov chain have transition matrix

        P = [ 0  1  0 ]
            [ 0  0  1 ]
            [ 1  0  0 ].

Show that it is an irreducible chain.
3.2. Periodic and aperiodic chains
Throughout this section, {Xn}n≥0 will be a Markov chain with state space S, initial
probability Π_0 and transition matrix P.

3.2.1 Definition:
A state j is said to have period d if p^n_{jj} > 0 implies that d divides n, and d is the
largest such integer. In other words, the period of j is the greatest common divisor of
the numbers {n ≥ 1 | p^n_{jj} > 0}.

A state j has period d means that p^n_{jj} = 0 unless n = md for some m ≥ 1, and
d is the greatest positive integer with this property. Thus, j has period d means the
chain may come back to j at time points md only. But it may never come back to the
state j.
3.2.2 Example:
Consider a Markov chain with transition matrix

           1    2    3    4
    1 [    0    1    0    0  ]
    2 [   1/2   0   1/2   0  ]
    3 [    0    0    0    1  ]
    4 [    0    0    1    0  ].

Now p_{jj} = 0 for all j; therefore the period of each state is > 1. In fact, each state has
period 2, for p²_{jj} > 0 and p^{(odd)}_{jj} = 0. But {3, 4} forms a closed set, and once a
particle goes to the set {3, 4} (say from state 2), it will never come out and return to 2.
3.2.3 Definition:
A state j is called an aperiodic state if j has period 1. The chain is called an aperiodic
chain if every state in the chain has period 1.

In an aperiodic chain, if i ↔ j, then p^n_{ij} > 0 for all sufficiently large n, i.e., it is
possible for the chain to reach any communicating state at any sufficiently late time.
3.2.4 Example:
Consider a Markov chain with a transition graph as in the accompanying figure (omitted
in this transcript). Note that, starting in state 1, it can be revisited at stages 4, 6, 8, 10, ....
Thus the state 1 has period 2.
3.2.5 Example (Birth and death chain):
Consider a Markov chain on S = {0, 1, 2, ...}. Starting at i, the chain can stay at i or
move to i − 1 or i + 1 with probabilities

    p(i, j) = { q_i  if j = i − 1,
                r_i  if j = i,
                p_i  if j = i + 1,
                0    otherwise.

Saying that it is an irreducible chain is the same as saying that p_i > 0 for all i ≥ 0
and q_i > 0 for all i > 0. It will be aperiodic if r_i > 0 for some i; see exercise (3.5)
below. If r_i = 0 for all i, then the chain can return to i only after an even number of
steps, so the period of each state can only be a multiple of 2; since p²_{00} = p_0 q_1 > 0,
every state has period 2.
3.2.6 Theorem:
If two states communicate with each other, then they have the same period.

Proof:
Let d_i = period of i and d_j = period of j. It is enough to show that d_i divides r
whenever p^r_{jj} > 0. Now i ↔ j implies there exist n, m such that p^m_{ij} > 0 and
p^n_{ji} > 0. By the Kolmogorov-Chapman equations, for every r ≥ 0 with p^r_{jj} > 0,

    p^{m+r+n}_{ii} ≥ p^m_{ij} p^r_{jj} p^n_{ji} > 0.

This implies d_i divides m + r + n for every r ≥ 0 with p^r_{jj} > 0. In particular, taking
r = 0 (as p⁰_{jj} = 1), d_i divides m + n, and hence d_i divides r = (m + r + n) − (m + n).
Thus d_i divides every r with p^r_{jj} > 0, so d_i divides d_j; in particular d_i ≤ d_j.
Similarly, d_j ≤ d_i. Hence d_i = d_j.
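The period can be computed numerically as a gcd; a small Python sketch (with a truncation parameter n_max as a practical assumption), applied to the chain of example 3.2.2 with states indexed from 0:

    from math import gcd
    import numpy as np

    def period(P, j, n_max=50):
        """Period of state j: gcd of {n >= 1 : p^n(j, j) > 0}, truncated at n_max."""
        d = 0
        Pn = np.eye(len(P))
        for n in range(1, n_max + 1):
            Pn = Pn @ P
            if Pn[j, j] > 0:
                d = gcd(d, n)
        return d

    P = np.array([[0,   1,   0,   0],
                  [1/2, 0,   1/2, 0],
                  [0,   0,   0,   1],
                  [0,   0,   1,   0]], dtype=float)
    print([period(P, j) for j in range(4)])   # [2, 2, 2, 2]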
Exercises
(3.5) Show that if a Markov chain is irreducible and p_{ii} > 0 for some state i, then
it is aperiodic.
(3.6) Show that the queuing chain of section 1.3 is aperiodic.
3.3. Visiting a state: transient and recurrent states

Let i, j ∈ S be fixed. Let us consider the probability of the event that for some n ≥ 1
the system will visit the state j, given that it starts in the state i. Let

    f^n_{ij} := P{X_n = j, X_k ≠ j for 1 ≤ k ≤ n − 1 | X_0 = i},  n ≥ 1,

i.e., f^n_{ij} is the probability of a first visit to state j, starting at i, in n steps. We are
interested in computing

    f_{ij} := ∑_{n=1}^∞ f^n_{ij}

in terms of the transition probabilities; f_{ij} is the probability of an eventual visit to
state j starting from state i, i.e., the probability that the system visits j, starting at i,
in some finite time. We define f⁰_{ii} = 0 for all i, and note that f¹_{ii} = p_{ii}.
3.3.1 Proposition:
(i) f¹_{ij} = p_{ij}.
(ii) f^{n+1}_{ij} = ∑_{r≠j} p_{ir} f^n_{rj}.
(iii) p^n_{ij} = ∑_{k=1}^n f^k_{ij} p^{n−k}_{jj}.
(iv) p^n_{ii} = ∑_{k=1}^n f^k_{ii} p^{n−k}_{ii}.
(v) P{system visits state j at least 2 times | X_0 = i} = f_{ij} f_{jj}. More generally,

    P{system visits state j at least m times | X_0 = i} = f_{ij} f_{jj}^{m−1}.

Proof:
(i) Obvious.
(ii)

    f^{n+1}_{ij} = ∑_{r≠j} P{from i to r in one step} P{first visit from r to j in n steps}
                 = ∑_{r≠j} p_{ir} f^n_{rj}.

(iii) Note that

    p^n_{ij} = ∑_{m=1}^n P{first visit to j at the mth step | X_0 = i} P{X_n = j | X_m = j}
             = ∑_{m=1}^n f^m_{ij} p^{n−m}_{jj}.

(iv) Follows from (iii).
(v)

    P{system visits state j at least 2 times | X_0 = i}
      = ∑_k ∑_n P{first visit to j at step k | X_0 = i} P{next visit to j at step k + n | X_k = j}
      = ∑_k ∑_n f^k_{ij} f^n_{jj} = (∑_k f^k_{ij}) (∑_n f^n_{jj}) = f_{ij} f_{jj}.

In the general case, similarly,

    P{system visits state j at least m times | X_0 = i} = f_{ij} f_{jj}^{m−1}.
3.3.2 Definition:
(i) A state i is called recurrent if f_{ii} = 1, i.e., with probability 1 the system comes
back to i.
(ii) A state i is called transient if f_{ii} < 1. Thus, the probability that the system
starting at i does not come back to i, namely 1 − f_{ii}, is positive.
3.3.3 Theorem:
(i) The following statements are equivalent for a state j:
    (a) The state j is transient.
    (b) P{system visits j an infinite number of times | X_0 = j} = 0.
    (c) ∑_n p^n_{jj} < ∞.
(ii) The following statements are equivalent for a state j:
    (a) The state j is recurrent.
    (b) P{system visits j an infinite number of times | X_0 = j} = 1.
    (c) ∑_n p^n_{jj} = ∞.

Proof:
(i) Using (v) of proposition 3.3.1, we have

    P{system visits j an infinite number of times | X_0 = j}
      = lim_{m→∞} P{system visits j at least m times | X_0 = j}
      = lim_{m→∞} f_{jj}^m.

Hence,

    P{system visits j an infinite number of times | X_0 = j} = 0 iff f_{jj} < 1.

This shows that (b) holds iff (a) holds. Next suppose (c) holds, i.e., ∑_n p^n_{jj} < ∞.
Then, by the Borel-Cantelli lemma, (b) holds. Conversely, let (a) hold, i.e., f_{jj} < 1.
We shall show that (c) holds. Using proposition 3.3.1(iv), we have

    ∑_{t=1}^n p^t_{jj} = ∑_{t=1}^n ∑_{s=0}^{t−1} p^s_{jj} f^{t−s}_{jj}
                       = ∑_{s=0}^{n−1} p^s_{jj} ∑_{t=s+1}^n f^{t−s}_{jj}
                       ≤ f_{jj} + ∑_{s=1}^n p^s_{jj} f_{jj}.

Thus, (1 − f_{jj}) ∑_{t=1}^n p^t_{jj} ≤ f_{jj}, and so for every n ≥ 1,

    ∑_{t=1}^n p^t_{jj} ≤ f_{jj} / (1 − f_{jj}),

implying (c), as f_{jj} < 1. This completely proves (i).
The proof of (ii) follows from (i).
3.3.4 Example:
Consider the unrestricted random walk on the integers with probability p of moving to
the right, probability q of moving to the left, and p + q = 1. It is clearly an irreducible
chain. Starting at 0, one can come back to 0 only in an even number of steps. Thus,

    p^{2n+1}_{00} = 0   and   p^{2n}_{00} = P{X_{2n} = 0 | X_0 = 0}.

Starting from 0, if the walk is to come back to 0 in 2n steps, it must move to the left
in n of the steps and to the right in the other n. Thus,

    p^{2n}_{00} = (2n choose n) p^n q^n,

and therefore

    ∑_{n=0}^∞ p^{2n}_{00} = ∑_{n=0}^∞ (2n choose n) (pq)^n.

To decide whether the state 0 is transient or not, one has to know whether this series
converges. Note that

    (2n choose n) = (2n)! / (n! n!),

and by Stirling's formula, n! ∼ √(2π) n^{n+1/2} e^{−n}, we have

    (2n choose n) ∼ √(2π) (2n)^{2n+1/2} e^{−2n} / (2π n^{2n+1} e^{−2n}) = 2^{2n} / √(nπ).

Hence,

    p^{2n}_{00} ∼ (4pq)^n / √(nπ).

Since p(1 − p) = pq ≤ 1/4, with equality iff p = q = 1/2, setting θ = 4pq we get

    ∑_{n=0}^∞ p^{2n}_{00} ∼ { ∑_{n≥1} θ^n / √(nπ),  with θ < 1, if p ≠ q,
                              ∑_{n≥1} 1 / √(nπ),    if p = q = 1/2.

One knows that for θ < 1, ∑_n θ^n/√n < +∞, while ∑_n 1/√n is divergent. Thus, 0 is
a recurrent state iff p = q = 1/2. In fact, the same holds for every state j: if p ≠ q,
then intuitively the particle drifts to −∞ or +∞; 0 is then a transient state, and so is
every other state.
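A numerical illustration of this dichotomy (partial sums only, since the series for p = 1/2 diverges):

    from math import comb

    def return_prob_sum(p, n_terms=2000):
        """Partial sum of sum_n p^{2n}(0,0) = sum_n C(2n,n) (pq)^n for the walk."""
        q = 1 - p
        return sum(comb(2 * n, n) * (p * q)**n for n in range(n_terms))

    print(return_prob_sum(0.4))   # converges: state 0 is transient for p != 1/2
    print(return_prob_sum(0.5))   # keeps growing with n_terms (~ sum 1/sqrt(pi n))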
3.3.5 Theorem:
Let i → j and let i be recurrent. Then:
(i) f_{ji} = 1, j → i, and j is recurrent.
(ii) f_{ij} = 1.

Proof:
(i) Since i → j, there exists n ≥ 1 such that p^n_{ij} > 0. Let n_0 be the smallest positive
integer such that p^{n_0}_{ij} > 0; then p^m_{ij} = 0 for 1 ≤ m < n_0. Since p^{n_0}_{ij} > 0,
there exist states i_1, i_2, ..., i_{n_0−1}, none equal to j, such that

    P{X_{n_0} = j, X_{n_0−1} = i_{n_0−1}, ..., X_1 = i_1 | X_0 = i} > 0.        (3.1)

Suppose f_{ji} < 1. Then 1 − f_{ji} > 0, i.e.,

    P{system starts at j but never visits i} > 0.        (3.2)

Therefore,

    α := P{X_1 = i_1, ..., X_{n_0−1} = i_{n_0−1}, X_{n_0} = j, X_n ≠ i for n > n_0 | X_0 = i}
       = P{X_n ≠ i for n ≥ n_0 + 1 | X_{n_0} = j, X_{n_0−1} = i_{n_0−1}, ..., X_0 = i}
           × P{X_{n_0} = j, X_{n_0−1} = i_{n_0−1}, ..., X_1 = i_1 | X_0 = i}
       = P{X_n ≠ i for n ≥ n_0 + 1 | X_{n_0} = j} × P{X_{n_0} = j, ..., X_1 = i_1 | X_0 = i}
       > 0,

using the Markov property together with equations (3.1) and (3.2). Thus,

    P{X_n ≠ i for every n ≥ 1 | X_0 = i} ≥ α > 0,

i.e., with positive probability the system starts at i and never comes back to i, so i
cannot be a recurrent state. Hence, if i is recurrent, our assumption f_{ji} < 1 is not
true: i recurrent implies f_{ji} = 1. But then

    f_{ji} = ∑_{m≥1} f^m_{ji} = 1,

and hence for some m, f^m_{ji} > 0, i.e., with positive probability there is a first visit to
i starting from j. Hence p^m_{ji} ≥ f^m_{ji} > 0, i.e., j → i. Thus we have shown that
i → j and i recurrent imply f_{ji} = 1, and hence j → i. Further, for the m, n_0 above,

    p^{m+n+n_0}_{jj} = ∑_{r,k} p^m_{jr} p^n_{rk} p^{n_0}_{kj} ≥ p^m_{ji} p^n_{ii} p^{n_0}_{ij}.

Using this,

    ∑_{n≥1} p^n_{jj} ≥ ∑_{n≥1} p^{m+n+n_0}_{jj} ≥ ∑_{n≥1} p^m_{ji} p^n_{ii} p^{n_0}_{ij}
                     = p^m_{ji} p^{n_0}_{ij} ∑_{n≥1} p^n_{ii} = +∞,

because p^m_{ji} > 0, p^{n_0}_{ij} > 0, and ∑_{n≥1} p^n_{ii} = +∞ (i being recurrent).
Thus, j is recurrent, proving (i).

(ii) Apply (i) with the roles of i and j interchanged: since j → i and j is recurrent,
f_{ij} = 1.
3.3.6 Corollary:
If i → j and j → i, then either both states are transient or both are recurrent.

Proof:
If i is recurrent and i → j, then j is recurrent by the above theorem. Now let i be
transient, and suppose j were recurrent. Since j → i, the above theorem would make i
recurrent, which is not possible. Hence i transient implies j transient.

3.3.7 Corollary:
Let C ⊆ S be an irreducible set. Then either all states in C are recurrent or all are
transient. Further, if C is a communicating class and all its states are recurrent, then
C is closed.

Proof:
Since all states in C communicate with each other, by corollary 3.3.6 all states in C
are either transient or recurrent. Next, suppose C is a communicating class, and let
j ∉ C with i → j for some i ∈ C. Then, by the above theorem, j → i, and hence j ∈ C,
a contradiction. Hence C is closed.
Hence we know how to characterize irreducible Markov chains.

3.3.8 Exercise:
Show that if a state j is transient, then

    ∑_{n=1}^∞ p^n_{ij} < ∞ for all i.
3.3.9 Theorem:
Let {Xn}n≥0 be an irreducible Markov chain with state space S and transition probability matrix P. Then:
(i) either all states are transient, in which case

    ∑_{n≥0} p^n_{ij} < +∞ for all i, j,   and   P{X_n = j for infinitely many n | X_0 = i} = 0;

(ii) or all states are recurrent, in which case

    ∑_{n≥0} p^n_{ii} = +∞ for all i.
3.3.10 Corollary:
If S is finite, then the chain has at least one recurrent state.

Proof:
Suppose all states are transient. Then

    ∑_{n≥0} p^n_{ij} < +∞ for all i, j.

Thus lim_{n→∞} p^n_{ij} = 0. Hence, as S is finite and P is a stochastic matrix,

    0 = lim_{n→∞} ∑_{j∈S} p^n_{ij} = 1,

a contradiction.

3.3.11 Corollary:
In a finite irreducible chain, all states are recurrent.
3.3.12 Example:
The two-state Markov chain with transition matrix

    [ 1 − p    p   ]
    [   q    1 − q ]

is irreducible for p, q > 0, and finite; hence all states are recurrent.
3.3.13 Example:
Consider the chain discussed in example 3.1.4 with transition matrix

           0     1     2     3     4     5
    0 [    1     0     0     0     0     0   ]
    1 [   1/4   1/2   1/4    0     0     0   ]
    2 [    0    1/5   2/5   1/5    0    1/5  ]
    3 [    0     0     0    1/6   1/3   1/2  ]
    4 [    0     0     0    1/2    0    1/2  ]
    5 [    0     0     0    1/4    0    3/4  ]

Let us find its transient and recurrent states.
(i) 0 is an absorbing state, as p_{00} = 1, and hence is recurrent.
(ii) As observed earlier, {3, 4, 5} is a finite, closed, irreducible set; hence, by corollary 3.3.11, all its states are recurrent.
(iii) Now if 2 were a recurrent state, then since 2 → 0, theorem 3.3.5 would give
0 → 2, which is not true. Hence 2 is not recurrent and must be transient.
Similarly, 1 is transient.

Thus we can write the state space as S = {1, 2} ∪ {0} ∪ {3, 4, 5}, where the first set
consists of transient states and the other two are irreducible sets of recurrent states.
3.3.14 Example:
Let us find the transient/recurrent states for the chains with transition matrices

    P = [  0   1/2  1/2 ]        Q = [ 0  0  1/2  1/2 ]
        [ 1/2   0   1/2 ]            [ 1  0   0    0  ]
        [ 1/2  1/2   0  ],           [ 0  1   0    0  ]
                                     [ 0  1   0    0  ],

    R = [ 1/2  1/2   0    0    0  ]
        [ 1/2  1/2   0    0    0  ]
        [  0    0   1/2  1/2   0  ]
        [  0    0   1/2  1/2   0  ]
        [ 1/4  1/4   0    0   1/2 ].

The chain with transition matrix P is finite and irreducible, and thus all its states are
recurrent. The chain with transition matrix Q is also finite and irreducible, hence
recurrent. For the chain with transition matrix R, {1, 2} and {3, 4} are closed irreducible
sets and hence consist of recurrent states. Since 5 → 1 but 1 does not lead to 5, the
state 5 cannot be recurrent; therefore 5 is transient. Once again we have the
decomposition S = {5} ∪ {1, 2} ∪ {3, 4}, where the first set consists of a transient state
and the second and third sets are irreducible sets of recurrent states.
We saw in the above examples that the state space S could be written as S_T ∪ C_1 ∪ C_2 ∪ ...,
where S_T consists of all transient states and C_1, C_2, ... are closed irreducible sets
consisting of recurrent states. We now show this is possible in general.
3.3.15 Proposition:
For every recurrent state i there exists a subset C(i) ⊆ S such that the following hold:
(i) Each C(i) ≠ ∅ is closed and irreducible.
(ii) Either C(i_1) ∩ C(i_2) = ∅ or C(i_1) = C(i_2).
(iii) ∪_i C(i) = S_R, the set of all recurrent states.

Proof:
For i ∈ S_R, define

    C(i) = {j ∈ S | i → j}.

We prove that the sets C(i) have the required properties.
(i) i ∈ C(i), for p⁰_{ii} = 1; hence C(i) ≠ ∅. If j ∈ C(i), then by theorem 3.3.5, j is
recurrent and j → i; hence i ↔ j. Thus any two states in C(i) communicate with each
other, i.e., C(i) is irreducible. If k ∉ C(i), then i does not lead to k, and no j ∈ C(i)
leads to k either, for if j → k then i → j → k would give k ∈ C(i). Therefore C(i) is
closed.
(ii) If i ∈ C(i_1) ∩ C(i_2), then for j ∈ C(i_1),

    j ↔ i_1 ↔ i ↔ i_2,

implying C(i_1) ⊆ C(i_2). Similarly, C(i_2) ⊆ C(i_1).
(iii) is obvious.
3.3.16 Theorem (Decomposition of the state space):
S = S_T ∪ S_R, where S_T consists of all transient states and S_R consists of all recurrent
states, and S_R = C_1 ∪ C_2 ∪ ..., a union of closed irreducible disjoint sets C_i.

Proof:
Clearly S = S_T ∪ S_R by definition. The required decomposition of S_R follows from
proposition 3.3.15.

3.3.17 Note:
Thus, we can write the state space as

    S = S_T ∪ C_1 ∪ C_2 ∪ ...,

where S_T consists of transient states and each C_i is irreducible and recurrent. On each
C_i the chain can be analyzed as an irreducible chain. If S_T is also irreducible and
closed, we can analyze the chain on it separately as well. In general, locating a recurrent
state in a chain may not be easy.
3.3.18 Some questions:
(i) If the chain starts in S_T, what is the probability that it will stay in S_T forever?
(ii) Given i ∈ S, what is the probability that the chain will hit a given closed irreducible
set C of recurrent states and stay in it forever? Clearly, this probability is

    p_C(i) = { 1  if i ∈ C,
               0  if i ∉ C but i is recurrent.

So the case of interest is: for i ∈ S_T, what is p_C(i)?
(iii) Can we have an alternative criterion for a state to be transient or recurrent?

We shall answer some of these in the next section.
3.4. Absorption probability
Let C be an irreducible closed set of recurrent states. For i ∈ S, define

p_C(i) = P{system hits C eventually | X_0 = i} = P{ ∪_{n≥0} {X_n ∈ C} | X_0 = i }.

Note that if i ∈ C, then p_C(i) = 1, and if i ∉ C but i is recurrent, then p_C(i) = 0. So the problem is to compute p_C(i) when i ∈ S_T. The answer is given by the following.
3.4.1 Theorem:
For i ∈ S_T, the p_C(i) satisfy the system of equations

p_C(i) = Σ_{j∈C} p_ij + Σ_{j∈S_T} p_ij p_C(j).

(To go from i to C, we go either from i to some j ∈ C in one step, or from i to some j ∈ S_T in one step and then from j to C.)
Thus, to find p_C(i) one has to solve these equations. When S_T is infinite, it is not known in general how to solve these equations; moreover, solutions need not be unique in that case. When S_T is finite, one can show that a unique solution exists. We give an example to illustrate this.
3.4.2 Example:
Let

          0     1     2     3     4     5
     0 [  1     0     0     0     0     0  ]
     1 [ 1/4   1/2   1/4    0     0     0  ]
P =  2 [  0    1/5   2/5   1/5    0    1/5 ]
     3 [  0     0     0    1/6   1/3   1/2 ]
     4 [  0     0     0    1/2    0    1/2 ]
     5 [  0     0     0    1/4    0    3/4 ]

Then, as observed in example 3.3.13, C = {0} is a closed irreducible set and S_T = {1, 2}.
Let us find p_C(1), p_C(2) for C = {0}. We have to solve

p_C(i) = Σ_{j∈C} p_ij + Σ_{j∈S_T} p_ij p_C(j),  i = 1, 2.

That is,

p_C(1) = p_10 + p_11 p_C(1) + p_12 p_C(2) = 1/4 + (1/2) p_C(1) + (1/4) p_C(2),   (3.4)
p_C(2) = p_20 + p_21 p_C(1) + p_22 p_C(2) = 0 + (1/5) p_C(1) + (2/5) p_C(2).    (3.5)

One can solve (3.4) and (3.5) to get p_C(1) = 3/5, p_C(2) = 1/5.
(3.4)
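Since S_T is finite here, (3.4)-(3.5) form a small linear system. As a quick numerical check (a sketch of ours in Python with numpy, not part of the original notes; the variable names are our own), write the system as (I - Q)p = b and solve:

    import numpy as np

    # Transient-to-transient block Q and the one-step probabilities into
    # C = {0}, read off from the matrix P above (states 1 and 2 are transient).
    Q = np.array([[1/2, 1/4],
                  [1/5, 2/5]])
    b = np.array([1/4, 0.0])   # p_10, p_20: one-step probabilities of hitting C

    # The system p = b + Q p is the linear system (I - Q) p = b.
    p = np.linalg.solve(np.eye(2) - Q, b)
    print(p)                   # [0.6, 0.2], i.e. p_C(1) = 3/5, p_C(2) = 1/5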
3.4.3 Definition:
A Markov chain is called an absorbing chain if
(i) It has at least one absorbing state; and
(ii) For every state in the chain, the probability of reaching an absorbing state in a
finite number of steps is nonzero.
Suppose an absorbing markov chain has r absorbing states and a set of transient states grouped as S_T; we can write

S = S_T ∪ C_1 ∪ C_2 ∪ ... ∪ C_r,

where each C_i is a singleton set corresponding to an absorbing state. Thus, if need be, we can renumber the states and assume that the transition matrix has the form
P = [ I  0 ]
    [ R  Q ]

with the first r rows and columns corresponding to the absorbing states and the remaining ones to S_T. Here R is the rectangular submatrix giving the transition probabilities from non-absorbing to absorbing states, Q is the square submatrix giving the transition probabilities from non-absorbing to non-absorbing states, I is the r × r identity matrix, and 0 is a rectangular matrix of zeros. Note that for every n,

P^n = [ I                            0   ]
      [ (I + Q + ... + Q^{n-1}) R   Q^n ].
Thus, if Q^n = (q^n_ij), then q^n_ij represents the probability of going from the non-absorbing state i to the non-absorbing state j in n steps. Since the absorption probabilities satisfy

p_{C(i)}(j) = p_ji + Σ_{k∈S_T} p_jk p_{C(i)}(k),  i = 1, ..., r and j ∈ S_T,

we have

B = R + QB,

where B is the matrix whose (j, i)th entry is p_{C(i)}(j). Thus,

B = (I − Q)^{-1} R = NR,

where N := (I − Q)^{-1}, if it exists. Hence, to calculate the absorption probabilities, one has to show that N exists and compute (I − Q)^{-1}. The matrix N = (I − Q)^{-1} is called the fundamental matrix of the absorbing chain.
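As a concrete illustration (a numerical sketch of ours, not part of the original notes), the fundamental matrix of the chain of example 3.4.2 can be computed directly. Here we lump the closed recurrent class {3, 4, 5} into a single absorbing "super-state", which is legitimate for computing hitting probabilities:

    import numpy as np

    # For the chain of example 3.4.2, lump {3,4,5} into one absorbing state;
    # C = {0} is already absorbing. S_T = {1, 2}.
    Q = np.array([[1/2, 1/4],
                  [1/5, 2/5]])            # transient -> transient
    R = np.array([[1/4, 0.0],             # state 1 -> {0}, -> {3,4,5}
                  [0.0, 2/5]])            # state 2 -> {0}, -> {3,4,5}

    N = np.linalg.inv(np.eye(2) - Q)      # fundamental matrix N = (I - Q)^{-1}
    B = N @ R                             # absorption probabilities
    t = N.sum(axis=1)                     # mean number of steps spent in S_T

    print(B)   # [[0.6, 0.4], [0.2, 0.8]]
    print(t)   # [3.4, 2.8]

The rows of B sum to 1, confirming that from every transient state the chain is eventually absorbed in one of the two closed classes.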
3.4.4 Theorem:
For every absorbing chain the following holds:
(i) If q^m_ij denote the entries of Q^m, then the mean absorption time for a state i is

µ_i := Σ_{j∈S_T} Σ_{m=0}^∞ q^m_ij.

(ii) Q^n → 0 as n → ∞.
(iii) N := (I − Q)^{-1} exists.
(iv) If B = (I − Q)^{-1} R = NR = [b_ij], then b_ij is the probability that the chain will be absorbed in state j, starting from state i.
(v) If N = [n_ij], then n_ij is the expected number of times the chain visits the state j ∈ S_T, starting from i ∈ S_T.
Proof:
(i) The mean absorption time for a state i is

µ_i = Σ_{k=1}^∞ k P{starting in state i, the chain is absorbed at the kth step}
    = Σ_{k=1}^∞ Σ_{m=0}^{k-1} P{starting in state i, the chain is absorbed at the kth step}
    = Σ_{m=0}^∞ Σ_{k=m+1}^∞ P{starting in state i, the chain is absorbed at the kth step}
    = Σ_{m=0}^∞ P{starting in state i, the chain is absorbed after the mth step}
    = Σ_{m=0}^∞ P{starting in state i, the chain is not absorbed by the mth step}
    = Σ_{m=0}^∞ Σ_{j∈S_T} q^m_ij
    = Σ_{j∈S_T} ( Σ_{m=0}^∞ q^m_ij ).

(ii) Note that q^n_ij = p^n_ij for transient states i, j, and for transient i, j we have Σ_{n=0}^∞ p^n_ij < ∞. Hence q^n_ij → 0 as n → ∞.
(iii) Define

N_n := I + Q + Q² + ... + Q^n = Σ_{k=0}^n Q^k,  n ≥ 1.

It is easy to check that

N_n (I − Q) = (I − Q) N_n = I − Q^{n+1} for all n ≥ 1.   (3.6)

Since Σ_{n=0}^∞ q^n_ij < ∞ by (ii), and q^n_ij → 0, we conclude that N := lim_{n→∞} N_n exists and, letting n → ∞ in (3.6), that (I − Q)^{-1} = N.

(iv) The claim is now obvious from B = NR.
(v) For i, j ∈ S_T, let

X^(k) = { 1  if the chain is in state j after k steps, starting at i,
          0  otherwise.

Then

P(X^(k) = 1) = q^k_ij,  P(X^(k) = 0) = 1 − q^k_ij.

Thus E(X^(k)) = q^k_ij. Hence the expected number of times the chain is in state j in the first n steps, starting at i, is

E(X^(0) + X^(1) + ... + X^(n)) = Σ_{k=0}^n q^k_ij.

Thus, using Fubini's theorem, the expected number of times the chain is in state j, starting at i, is

E( Σ_{k=0}^∞ X^(k) ) = E( lim_{n→∞} Σ_{k=0}^n X^(k) ) = lim_{n→∞} E( Σ_{k=0}^n X^(k) ) = lim_{n→∞} Σ_{k=0}^n q^k_ij = Σ_{k=0}^∞ q^k_ij.
The matrix N also helps us to compute t_i, the mean (average) number of steps (time) for which the chain will be in the transient states, starting from the state i ∈ S_T. This is given by t_i = Σ_{j∈S_T} n_ij. This is also the mean absorption time starting at i. We apply these to our case of the random walk with absorbing barriers (with n + 1 states).
3.4.5 Example:

          0   1   2   3   ...  n-1  n
      0 [ 1   0   0   0   ...   0   0 ]
      1 [ q   0   p   0   ...   0   0 ]
      2 [ 0   q   0   p   ...   0   0 ]
P =  .. [            ...              ]
    n-1 [ 0   0   0   ...  q    0   p ]
      n [ 0   0   0   0   ...   0   1 ]
We rewrite this in canonical form (interchange the 2nd and the nth rows, and the 2nd and the nth columns, so that the two absorbing states 0 and n come first):

P = [ I  0 ]
    [ R  Q ]

where the (n-1) × 2 matrix R and the (n-1) × (n-1) matrix Q are

          0   n                      1   2   3   ...  n-2  n-1
      1 [ q   0 ]                1 [ 0   p   0   ...   0    0 ]
      2 [ 0   0 ]                2 [ q   0   p   ...   0    0 ]
R =   3 [ 0   0 ],          Q =  3 [ 0   q   0   ...   0    0 ]
     .. [ ... ]                 .. [          ...             ]
    n-1 [ 0   p ]              n-1 [ 0   0   0   ...   q    0 ]

so that

             1    2    3   ...  n-2  n-1
        1 [  1   -p    0   ...   0    0 ]
        2 [ -q    1   -p   ...   0    0 ]
I − Q = 3 [  0   -q    1   ...   0    0 ]
       .. [           ...               ]
      n-1 [  0    0    0   ...  -q    1 ]

The inverse of I − Q is given by N = (n_ij), where:
(i) If p ≠ q and r := p/q, then

n_ij = 1/((p − q)(r^n − 1)) × { (r^j − 1)(r^{n−i} − 1)        for j ≤ i,
                                (r^i − 1)(r^{n−i} − r^{j−i})  for j ≥ i.

(ii) If p = q = 1/2, then

n_ij = (2/n) × { j(n − i)  for j ≤ i,
                 i(n − j)  for j ≥ i.

And the mean time spent in the transient states is

t_i = Σ_{j=1}^{n−1} n_ij = { (1/(p − q)) [ n (r^n − r^{n−i})/(r^n − 1) − i ]  if p ≠ 1/2,
                             i(n − i)                                        if p = 1/2.
From this we can draw the following:
Conclusions:
• The time t_i of stay in the transient states starting from i, or equivalently the time to get out of the transient states, depends upon i, even when p = 1/2. Note that t_i = i(n − i) is maximum when i is in the middle, namely i = n/2 (if n is even); therefore t_max = (n/2)². Thus, when both players have an equal probability of winning, the expected time of ruin is the product of the fortunes of the two players, namely i(n − i), and the game will last longest when both have equal amounts to start with. But if p ≠ 1/2, one can show that i_max = log_r((r − 1)n), and for r > 1,

t_max ≈ (n − i_max)/(p − q),

which is of lower order of magnitude compared to the p = 1/2 case, i.e., the game will finish much faster in this case. Next we calculate B = NR.
Since N is (n − 1) × (n − 1) and R is (n − 1) × 2, B = NR is an (n − 1) × 2 matrix. The first column gives the probability of absorption in 0, the second the probability of absorption in n. But b_i0 = 1 − b_in for all i, so it is enough to calculate one of them. We have

b_in = Σ_{k=1}^{n−1} n_ik r_kn = { (r^n − r^{n−i})/(r^n − 1)  if p ≠ 1/2 (r = p/q),
                                   i/n                        if p = 1/2.
(i) If p = 1/2, the probability that A ruins his opponent is b_in = i/n, the ratio of the fortune A starts with to the total fortune n.
(ii) If p > 1/2, i.e., A has an advantage over B, then his chance of ruining his opponent is (r^n − r^{n−i})/(r^n − 1). Suppose r = 2 and i = 1; then this is

(2^n − 2^{n−1})/(2^n − 1) = 2^{n−1}/(2^n − 1) → 1/2 (as n → ∞),

which is quite good: for example, if n = 2 (i.e., the opponent also has 1 rupee), this is 2/3; and even for large n it means A has a good chance of ruining B even if B has a large capital and A starts with only 1 rupee.
Imagine A is a gambling house and B is the player. The gambling house fixes odds r > 1 and makes sure i is large. Then it always stays in business, since

lim_{n→∞} b_in = lim_{n→∞} (r^n − r^{n−i})/(r^n − 1) = 1 − (1/r)^i ≈ 1.

Therefore,

probability that the player wins all the money ≈ 1 − lim_{n→∞} b_in = (1/r)^i ≈ 0.

Thus, gambling houses stay in business no matter how much is bet at the tables. Let us estimate the absorption time for the gambling house to win all the money when r is near 1 (i.e., the odds are favorable to the gambling house, but not by much), so that t_i ≈ i(n − i). Thus, if the gambling house can cover 10,000 = 10^4 bets while all the gamblers put together can provide 10^6 bets, then t_i ≈ i(n − i) = 10^10 units of time, which is very large. So it will take a very long time to win all the money, and in the meantime more new gamblers will have been born.
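These closed-form answers are easy to sanity-check by simulation. The following sketch (our illustration, not part of the original notes; the parameter values are arbitrary) estimates b_in and t_i for the gambler's ruin chain and compares them with the formulas above:

    import random

    def gamblers_ruin(i, n, p, trials=20000):
        """Estimate the absorption probability at n and the mean absorption
        time, starting from fortune i, by direct simulation."""
        wins, total_steps = 0, 0
        for _ in range(trials):
            x, steps = i, 0
            while 0 < x < n:
                x += 1 if random.random() < p else -1
                steps += 1
            wins += (x == n)
            total_steps += steps
        return wins / trials, total_steps / trials

    p, q, i, n = 0.55, 0.45, 3, 10
    r = p / q
    b_in = (r**n - r**(n - i)) / (r**n - 1)   # closed-form absorption at n
    t_i = (n * b_in - i) / (p - q)            # closed-form mean absorption time
    print(gamblers_ruin(i, n, p))             # simulation estimate
    print(b_in, t_i)                          # should be close to the above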
3.5. More on recurrence/transience
3.5.1 Another way of deciding whether a state is recurrent/transient:
Let S_T denote the collection of all transient states of a system with transition probability matrix P. From P remove the rows and columns for the states not in S_T, and let Q = (q_ij), i, j ∈ S_T, be the submatrix obtained. (Q in general will not be a stochastic matrix.) Consider the system of linear equations in the variables x_i, i ∈ S_T:

x_i = Σ_{k∈S_T} q_ik x_k,  0 ≤ x_i ≤ 1, i ∈ S_T.

(i) The maximal solution of the above system gives the probabilities that a system starting at i ∈ S_T stays in S_T forever. Thus, for the maximal solution,

x_i = P{X_k ∈ S_T for all k | X_0 = i}.
(ii) From a transient state, what is the probability that the chain will go to a recurrent state and then stay there? Let C be a closed set of recurrent states, and let y_C(i) be the probability that the system starting at state i will reach C and forever remain in it. Clearly, if C is irreducible, then y_C(i) = 1 if i ∈ C, and y_C(i) = 0 if i ∉ C but i is recurrent. The case of interest is when i is transient. In that case, y_C(i) is the minimal non-negative solution of the equations

y_C(i) = Σ_{j∈C} p_ij + Σ_{j∈S_T} p_ij y_C(j).

Let i_0 ∈ S and consider C_{i_0}, the closure of the set {i_0}. Let C_{i_0} = {j_1, j_2, ...}. Then i_0 is transient iff the system of equations

x_{j_i} = Σ_k p_{j_i j_k} x_{j_k},  0 ≤ x_{j_i} ≤ 1,

has a non-trivial solution. (If i_0 is recurrent, x_{j_i} = 0 for all i is the only solution.) Let us apply this criterion to some examples.
3.5.2 Example:
Consider the following queueing model:

X_n = number of customers at the counter,
ξ_n = number of new customers that arrive in the nth minute.

Each ξ_n can take only the three values {0, 1, 2} with probabilities {α_0, α_1, α_2}, i.e., the distribution of ξ_n is

P{ξ_n = 0} = α_0,  P{ξ_n = 1} = α_1,  P{ξ_n = 2} = α_2,  α_0 + α_1 + α_2 = 1.

Then S = {0, 1, 2, ...} and the transition probability matrix is

          0    1    2    3    ...
     0 [ α_0  α_1  α_2   0    ... ]
     1 [ α_0  α_1  α_2   0    ... ]
P =  2 [  0   α_0  α_1  α_2   ... ]
     3 [  0    0   α_0  α_1   ... ]
    .. [             ...          ]

Here, for example,

p_00 = α_0 (no customer at the counter and no new one comes),
p_01 = α_1 (one new customer comes),
p_02 = α_2 (two new customers come),

and for i ≥ 1 one customer is served per minute. If we assume α_0, α_2 ≠ 0, then the chain is irreducible. We want to know whether it is recurrent or transient. Let us look at state 0. We have to see whether we can find a non-trivial solution of

x_i = Σ_{j≠0} p_ij x_j,  0 ≤ x_i ≤ 1, i ≠ 0.
In our case, the equations are

x_1 = α_1 x_1 + α_2 x_2,
x_k = α_0 x_{k−1} + α_1 x_k + α_2 x_{k+1},  k ≥ 2.

One can show that a solution is given by (see Billingsley, page 126)

x_k = { B[(α_0/α_2)^k − 1]  if α_0 ≠ α_2,
        Bk                  if α_0 = α_2,

for some constant B. Thus, if α_0 ≥ α_2, then (α_0/α_2)^k → ∞ and Bk → ∞ as k → ∞, so no bounded non-trivial solution exists. Hence a non-trivial solution exists iff α_0 < α_2, in which case the chain is transient. But transience means that with probability 1 the chain eventually leaves every state; hence, with probability 1, the queue size will go to ∞. Note that in this case (α_2 − α_0) > 0, which is the expected increase in queue length per minute; the queue goes to ∞ iff this is > 0. If α_0 ≥ α_2, then the chain is recurrent, i.e., every state will be visited infinitely often.
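The dichotomy α_0 ≥ α_2 (recurrent) versus α_0 < α_2 (queue drifts to infinity) can be seen in simulation. A minimal sketch (ours, with made-up arrival probabilities):

    import random

    def simulate_queue(a0, a1, a2, minutes=100000):
        """Simulate the queue and return the final queue length. Arrivals per
        minute are 0, 1, 2 with probabilities a0, a1, a2; one customer is
        served per minute when the queue is non-empty."""
        x = 0
        for _ in range(minutes):
            u = random.random()
            arrivals = 0 if u < a0 else (1 if u < a0 + a1 else 2)
            x = max(x - 1, 0) + arrivals
        return x

    print(simulate_queue(0.5, 0.3, 0.2))   # a0 > a2: recurrent, queue stays small
    print(simulate_queue(0.2, 0.3, 0.5))   # a0 < a2: transient, queue grows without bound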
Since S = S_T ∪ C_1 ∪ C_2 ∪ ..., we ask the question: given that the system starts in S_T, what is the probability that it will stay in it? The answer is as follows:

3.5.3 Theorem:
Let U ⊆ S_T and i ∈ U. Then

x_i = P{X_n ∈ U for all n ≥ 1 | X_0 = i},  i ∈ U,

is the maximal solution of the system

x_i = Σ_{j∈U} p_ij x_j,  i ∈ U,  0 ≤ x_i ≤ 1.

Let us look at an example.
3.5.4 Example (staying in transient states):
Consider the unrestricted random walk with transition matrix

    [  q   0   p   0   0  ... ]
    [  0   q   0   p   0  ... ]
    [  0   0   q   0   p  ... ]
    [ ...  ... ... ... ... ...]

For p ≠ q we know all the states are transient. Consider U = {0, 1, 2, ...} ⊂ S. We want to know the probability that, starting at some i ∈ U, the walk stays in U. This is given by the maximal solution of

x_i = Σ_{j∈U} p_ij x_j,  i ∈ U,  0 ≤ x_i ≤ 1.
In our case, these are

x_0 = p_00 x_0 + p_01 x_1 = p x_1,
x_i = p_{i,i−1} x_{i−1} + p_{i,i+1} x_{i+1} = q x_{i−1} + p x_{i+1},  i ≥ 1.

Since p + q = 1, the second equation can be written as

(p + q) x_i = q x_{i−1} + p x_{i+1},  i.e.,  p(x_{i+1} − x_i) = q(x_i − x_{i−1}),

so that

x_{i+1} − x_i = (q/p)(x_i − x_{i−1}).
The only bounded solution for q ≥ p is x_0 = 0 = x_1, implying x_i = 0 for all i. In this case, the probability of staying on the non-negative side is zero. If q < p, the maximal solution can be found as follows. From x_0 = p x_1 we get x_1 − x_0 = ((1 − p)/p) x_0 = (q/p) x_0, and hence

x_{k+1} − x_k = (q/p)^k (x_1 − x_0) = (q/p)^{k+1} x_0.

Summing these increments,

x_n = x_0 [ 1 + (q/p) + (q/p)² + ... + (q/p)^n ] = x_0 (1 − (q/p)^{n+1}) / (1 − q/p).

The maximal choice with 0 ≤ x_n ≤ 1 is x_0 = 1 − q/p, which gives

x_n = P{system stays in {0, 1, 2, ...} | X_0 = n} = 1 − (q/p)^{n+1}.

As n becomes large, this probability goes to 1.
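As a check, one can estimate the staying probability by simulation over a long but finite horizon (a proxy for "forever"; the code and its parameters are our illustration, and the closed form used is the maximal solution derived above):

    import random

    def stay_nonneg_prob(start, p, horizon=2000, trials=20000):
        """Estimate the probability that a random walk started at `start`
        never leaves {0, 1, 2, ...} within `horizon` steps."""
        stays = 0
        for _ in range(trials):
            x = start
            for _ in range(horizon):
                x += 1 if random.random() < p else -1
                if x < 0:
                    break
            else:
                stays += 1
        return stays / trials

    p, q, n = 0.6, 0.4, 3
    print(stay_nonneg_prob(n, p))      # simulation estimate
    print(1 - (q / p) ** (n + 1))      # closed-form maximal solution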
Chapter 4
Stationary distribution for a markov chain
4.1. Introduction
Let {X_n}_{n≥0} be a markov chain and P be its transition probability matrix. Let Π_0(i) be its initial distribution. In this chapter, we want to analyze the asymptotic (long run) behavior of the chain. Suppose there exist {µ_i}_{i∈S} such that µ_i ≥ 0 for all i ∈ S, Σ_{i∈S} µ_i = 1, and

Σ_{i∈S} µ_i p_ij = µ_j,  j ∈ S.   (4.1)

Then (4.1) implies that

Σ_{i∈S} µ_i p²_ij = Σ_{i∈S} µ_i ( Σ_{l∈S} p_il p_lj ) = Σ_{l∈S} ( Σ_{i∈S} µ_i p_il ) p_lj = Σ_{l∈S} µ_l p_lj = µ_j.

Using induction, for all n ≥ 0,

Σ_{i∈S} µ_i p^n_ij = µ_j,  j ∈ S.

In case µ_i = Π_0(i) for every i, we have

P{X_n = j} = Σ_{i∈S} Π_0(i) p^n_ij = Π_0(j),  j ∈ S.

Thus all the X_n's have the same distribution. Thus, in some sense, the chain is very stable.
4.1.1 Definition:
A markov chain {X_n}_{n≥0} with transition probability matrix P and initial distribution Π_0 is said to have a stationary distribution, or an invariant distribution, if there exist {µ_i}_{i∈S} such that µ_i ≥ 0 for all i ∈ S, Σ_i µ_i = 1, and

Σ_{i∈S} µ_i p_ij = µ_j,  j ∈ S.
Given a markov chain, one would like to answer the following questions:
(i) When does the markov chain have a stationary distribution?
(ii) How does one find the stationary distribution if it exists?
(iii) Is the stationary distribution unique?
(iv) What are the consequences of having a stationary distribution?
In the next section we look at the concept of stopping times, needed to answer the
above questions.
4.1.2 Example:
Consider a markov chain with

          1  2  3
     1 [  0  1  0 ]
P =  2 [  0  0  1 ]
     3 [  1  0  0 ]

Intuitively, the chain spends one third of the time in state 1, one third in state 2, and one third in state 3. In fact, if we take Π_0 = (1/3, 1/3, 1/3), then Π_0 = Π_0 P, i.e., Π_0 is a stationary distribution. The chain is irreducible with period 3, and

p^n_ii = { 0  if n is not a multiple of 3,
           1  if n is a multiple of 3.

Thus {p^n_ii}_{n≥1} is not convergent.
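A quick numerical illustration (ours, using numpy): the powers P^n oscillate with period 3, yet the Cesàro averages of the P^k still converge to the matrix whose rows are the stationary distribution:

    import numpy as np

    P = np.array([[0., 1., 0.],
                  [0., 0., 1.],
                  [1., 0., 0.]])

    pi0 = np.array([1/3, 1/3, 1/3])
    print(np.allclose(pi0 @ P, pi0))   # True: pi0 is stationary

    # p^n_11 oscillates with period 3 ...
    print([np.linalg.matrix_power(P, n)[0, 0] for n in range(1, 7)])  # [0,0,1,0,0,1]

    # ... but (1/n) sum_{k=1}^n P^k converges; every entry tends to 1/3.
    avg = sum(np.linalg.matrix_power(P, k) for k in range(1, 301)) / 300
    print(avg.round(3))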
4.1.3 Example:
On a highway, three out of four trucks on the road are followed by a car, while only one out of every five cars is followed by a truck. What fraction of vehicles on the road are trucks?
To answer this question, we construct a markov chain as follows. Sit on the side of the road and observe the vehicles going by. The observation at time n is

X_n = { 0  if the nth vehicle is a truck,
        1  if the nth vehicle is a car.

Thus the state space is S = {0, 1} with transition matrix

          0    1
     0 [ 1/4  3/4 ]
P =  1 [ 1/5  4/5 ]

If we want each X_n to have the same distribution Π_0, then Π_0 = Π_0 P. Let Π_0 = (p_0, p_1). Then

(p_0, p_1) = (p_0, p_1) [ 1/4  3/4 ]
                        [ 1/5  4/5 ].

Therefore,

p_0 = p_0/4 + p_1/5,
p_1 = 3p_0/4 + 4p_1/5,
1 = p_0 + p_1.

This implies p_0 = 4/19, p_1 = 15/19. So, in the long run, the fraction of trucks will be 4/19.
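For a general finite chain the same computation can be automated: replace one of the balance equations of Π_0 = Π_0 P by the normalization Σ_i p_i = 1 and solve the resulting linear system. A sketch (our helper function, not from the notes):

    import numpy as np

    def stationary_distribution(P):
        """Solve pi P = pi, sum(pi) = 1, by replacing one balance equation
        with the normalization constraint."""
        n = P.shape[0]
        A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
        b = np.zeros(n); b[-1] = 1.0
        return np.linalg.solve(A, b)

    P = np.array([[1/4, 3/4],
                  [1/5, 4/5]])
    print(stationary_distribution(P))   # [4/19, 15/19] ~ [0.2105, 0.7895]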
4.2. Stopping times and strong markov property
Given a markov chain {X_n}_{n≥1}, let A_n denote the σ-algebra determined by the random variables X_0, X_1, ..., X_n.
4.2.1 Definition:
A random variable T : Ω → N ∪ {∞} is called a stopping time if {T = n} ∈ A_n for all n.

4.2.2 Examples:
(i) For i ∈ S, let

S_i = { inf{n ≥ 0 | X_n = i}  if such an n exists,
        +∞                    otherwise.

It is a stopping time, called the first passage time to state i.

(ii) For i ∈ S, let

T_i = { inf{n ≥ 1 | X_n = i}  if such an n exists,
        +∞                    otherwise.

It is a stopping time, called the time of first return to state i.

(iii) For A ⊆ S, let

T_A = { inf{n ≥ 1 | X_n ∈ A}  if such an n exists,
        +∞                    otherwise.

T_A is a stopping time, called the time of first visit to the set A.
4.2.3 Note:
The event {T_j = n | X_0 = i} is: starting at i, the first visit to j is at time n. In our earlier notation (see section 3.3), P{T_j = n | X_0 = i} = f^n_ij. Thus,

f_ii = Σ_{n=0}^∞ f^n_ii = P{T_i < +∞ | X_0 = i}.

Thus, a state i is recurrent iff P{T_i < +∞ | X_0 = i} = 1, and a state i is transient iff P{T_i < +∞ | X_0 = i} < 1.
Let {X_n}_{n≥1} be a markov chain and T a stopping time. Let

A_T = {B ∈ A | B ∩ {T = n} ∈ A_n for all n}.

It is called the stopping time σ-algebra, or the σ-algebra determined by the stopping time.
4.2.4 Proposition (Strong Markov Property):
For every A ∈ A_T, m > 0, i_1, i_2, ..., i_m ∈ S,

P{A ∩ {X_{T+1} = i_1, X_{T+2} = i_2, ..., X_{T+m} = i_m} | X_T = i, T < +∞}
= P{A | X_T = i, T < +∞} P{X_1 = i_1, ..., X_m = i_m | X_0 = i}.   (4.2)

Proof:
It is enough to prove (4.2) for events of the type A ∩ {T = n}, A ∈ A_T. For such an event, (4.2) becomes

P{A ∩ {X_{n+1} = i_1, ..., X_{n+m} = i_m} | X_n = i}
= P{A | X_n = i} P{X_1 = i_1, ..., X_m = i_m | X_0 = i}.   (4.3)

Now note that by the markov property, (4.3) holds when A is a simple event in A_n, i.e., one of the form {X_0 = j_0, ..., X_n = j_n}, and a general event in A_n is a countable disjoint union of such events.
4.3. Existence and uniqueness:
From now on we shall write

P_i(A) = P{A | X_0 = i} for all events A,
E_i(f) = E(f | X_0 = i) for every random variable f.
4.3.1 Theorem:
Let {X_n}_{n≥1} be an irreducible recurrent chain with transition matrix P. Then for each k ∈ S there exist numbers {r^k_i}_{i∈S} with the following properties:
(i) r^k_k = 1 for all k ∈ S.
(ii) r^k_j = Σ_{i∈S} r^k_i p_ij,  j ∈ S.
(iii) 0 < r^k_i < +∞ for all i ∈ S.
In other words, the chain has a stationary (invariant) measure.
Proof:
For k, i ∈ S, let

r^k_i := Σ_{n=1}^∞ E_k( 1_{{X_n = i, T_k ≥ n}} ) = Σ_{n=1}^∞ P_k(X_n = i, T_k ≥ n).

This represents the total expected number of visits to state i between (any) two successive visits to state k: {X_n = i, T_k ≥ n} is the event that the chain is in state i at time n and has not yet returned to state k by time n.
Thus (as k is recurrent), for i = k the chain visits k exactly once in {1, ..., T_k}, namely at time T_k. Thus

r^k_k = 1 for all k ∈ S.

This proves (i).
Next, for k, j ∈ S with j ≠ k,

r^k_j = Σ_{n=1}^∞ P_k(X_n = j, T_k ≥ n)
      = P_k(X_1 = j, T_k ≥ 1) + Σ_{n=2}^∞ P_k(X_n = j, T_k ≥ n)
      = p_kj + Σ_{n=2}^∞ Σ_{i≠k} P_k(X_n = j, X_{n−1} = i, T_k ≥ n)
      = p_kj + Σ_{i≠k} Σ_{n=2}^∞ P_k(X_{n−1} = i, T_k ≥ n) p_ij
      = p_kj + Σ_{i≠k} Σ_{n=2}^∞ P_k(X_{n−1} = i, T_k ≥ n − 1) p_ij   (as X_{n−1} = i ≠ k)
      = p_kj r^k_k + Σ_{i≠k} r^k_i p_ij
      = Σ_{i∈S} r^k_i p_ij.

This proves (ii).
Finally, since the chain is irreducible, for every i there exist m, n such that p^m_ki > 0 and p^n_ik > 0. Iterating (ii) gives r^k_j = Σ_i r^k_i p^m_ij for every m ≥ 1. Thus

r^k_i ≥ r^k_k p^m_ki = p^m_ki > 0   (because r^k_k = 1).

Also,

1 = r^k_k ≥ r^k_i p^n_ik.

This implies r^k_i ≤ 1/p^n_ik < +∞.
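The numbers r^k_i have a direct simulation interpretation: run excursions from k back to k and count the visits to each state i. The sketch below (our illustration, on a made-up 3-state chain) estimates r^k_i and shows that normalizing it yields a probability vector, anticipating theorem 4.3.4:

    import random

    P = [[0.5, 0.3, 0.2],
         [0.2, 0.6, 0.2],
         [0.3, 0.3, 0.4]]      # a small irreducible chain (made up for illustration)

    def step(i):
        u, acc = random.random(), 0.0
        for j, pij in enumerate(P[i]):
            acc += pij
            if u < acc:
                return j
        return len(P[i]) - 1

    def estimate_r(k, excursions=50000):
        """Estimate r^k_i: the expected number of visits to i between two
        successive visits to k (the return visit itself counts for i = k)."""
        counts = [0.0] * len(P)
        for _ in range(excursions):
            x = step(k)
            while x != k:
                counts[x] += 1
                x = step(x)
            counts[k] += 1            # the return visit: gives r^k_k = 1 exactly
        return [c / excursions for c in counts]

    r = estimate_r(0)
    print(r)                          # r[0] = 1; r is an invariant measure
    print([ri / sum(r) for ri in r])  # normalized: the stationary distribution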
4.3.2 Theorem (uniqueness):
Let {X_n}_{n≥1} be an irreducible chain and λ an invariant measure for P with λ_k = 1. Then λ ≥ r^k, where r^k is as defined in the theorem above. If in addition P is recurrent, then λ = r^k.
Proof:
Using the invariance of λ, for all i,

λ_i = Σ_{i_1∈S} λ_{i_1} p_{i_1 i}
    = p_ki + Σ_{i_1≠k} λ_{i_1} p_{i_1 i}   (because λ_k = 1)
    = p_ki + Σ_{i_1≠k} p_{k i_1} p_{i_1 i} + Σ_{i_1, i_2 ≠ k} λ_{i_2} p_{i_2 i_1} p_{i_1 i}
    = ...
    ≥ p_ki + Σ_{n=2}^m Σ_{i_1, ..., i_{n−1} ≠ k} p_{k i_{n−1}} p_{i_{n−1} i_{n−2}} ... p_{i_1 i}
    = Σ_{n=1}^m P_k(X_n = i, T_k ≥ n).

Since this holds for every m,

λ_i ≥ Σ_{n=1}^∞ P_k(X_n = i, T_k ≥ n) = r^k_i.

In case the chain is recurrent, let µ = λ − r^k. Then µ is also an invariant measure and µ_k = λ_k − r^k_k = 0. Given j, choose n such that p^n_jk > 0. Then

0 = µ_k = Σ_{i∈S} µ_i p^n_ik ≥ µ_j p^n_jk ≥ 0,

which implies µ_j = 0 for all j (as p^n_jk > 0). Hence λ = r^k.
To go from the invariant measure to a distribution, we need Σ_{i∈S} r^k_i < +∞. For this we make the following definition:

4.3.3 Definition:
Let {X_n}_{n≥1} be a chain.
(i) Let m_i = E_i(T_i) be the expected return time for state i.
(ii) We say a recurrent state i is positive recurrent if m_i < +∞; otherwise we call it null recurrent. Note that m_i = Σ_{j∈S} r^i_j.
We have the following theorem.

4.3.4 Theorem:
Let {X_n}_{n≥1} be an irreducible chain. Then the following are equivalent:
(i) All the states are positive recurrent.
(ii) There exists a state that is positive recurrent.
(iii) There exists an invariant distribution π, and it has the property π_i = 1/m_i for all i.
Proof:
(i) ⇒ (ii) is obvious.
(ii) ⇒ (iii): If k is a positive recurrent state, consider r^k_j, j ∈ S, as constructed in theorem 4.3.1. Since m_k = Σ_{j∈S} r^k_j is finite, define

π_i = r^k_i / m_k.

Then (π_i)_{i∈S} is an invariant distribution.
(iii) ⇒ (i): Fix any k ∈ S. Since P is irreducible and Σ_{i∈S} π_i = 1,

π_k = Σ_{i∈S} π_i p^n_ik > 0 for some n.

Hence π_k > 0 for all k. Define

λ_i = π_i / π_k,  i ∈ S.

Then λ is an invariant measure and λ_k = 1. Thus, by theorem 4.3.2, λ ≥ r^k. Hence

m_k = Σ_{i∈S} r^k_i ≤ Σ_{i∈S} λ_i = Σ_i π_i / π_k = 1/π_k < ∞.   (4.4)

Thus k is positive recurrent. In fact, since the chain is then recurrent, theorem 4.3.2 gives λ = r^k, so (4.4) holds with equality and says

m_k = 1/π_k for all k.
4.3.5 Example (Random walk on the line):
Recall the random walk i − 1 ←(q)− i −(p)→ i + 1, i.e.,

p_{i,i+1} = p,  p_{i,i−1} = q = 1 − p.

(i) The walk is transient if 4pq < 1 (i.e., if p ≠ q).
(ii) For p = q = 1/2 (the symmetric random walk), it is recurrent. Consider the measure π_i = 1 for all i ∈ S. Then

π_i = (1/2) π_{i−1} + (1/2) π_{i+1}.

Hence π = (..., 1, 1, 1, ...) is an invariant measure. If an invariant distribution existed, it would have to be a scalar multiple of π; but Σ π_i = +∞. Hence there does not exist any stationary distribution: the symmetric walk is null recurrent.
4.3.6 Example (Asymmetric random walk):
Let p_{i,i−1} = q < p = p_{i,i+1}. Though each state is transient and the theorem does not apply, let us try to find an invariant measure π. For this,

π invariant ⟺ πP = π ⟺ π_{i−1} p + π_{i+1} q = π_i.

This gives a recurrence relation, and the general solution can be found:

π_i = A + B (p/q)^i,

where A, B are arbitrary constants. This shows that an invariant measure need not be unique.
4.3.7 Example (Simple symmetric random walk on Z²):
From each point of the lattice Z², the walk moves to each of its four nearest neighbours with probability 1/4:

p_ij = { 1/4  if |i − j| = 1,
         0    otherwise.

Then

p^{2n}_00 = [ ((2n)!/(n!)²) (1/2)^{2n} ]².

An intuitive way of seeing this is as follows: consider X_n⁺, the orthogonal projection of X_n onto the line y = x, and X_n⁻, the orthogonal projection onto y = −x. Then X_n⁺ and X_n⁻ are independent symmetric random walks on (1/√2)Z, and

X_n = 0 iff X_n⁺ = 0 = X_n⁻.

Now Stirling's formula gives

p^{2n}_00 ∼ 2/(A²n) = 1/(πn),

where A = √(2π) is the constant in Stirling's formula. Hence Σ_{n=0}^∞ p^n_00 = +∞, that is, the symmetric random walk on the plane is recurrent.
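For small n, the formula for p^{2n}_00 can be checked against a Monte Carlo estimate and compared with the Stirling approximation (a rough sketch of ours; the trial counts are arbitrary):

    import math, random

    def p2n_00_estimate(n, trials=200000):
        """Monte Carlo estimate of the probability that the 2-D walk is
        back at the origin after 2n steps."""
        hits = 0
        for _ in range(trials):
            x = y = 0
            for _ in range(2 * n):
                dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
                x, y = x + dx, y + dy
            hits += (x == 0 and y == 0)
        return hits / trials

    n = 5
    exact = (math.comb(2 * n, n) * 0.5 ** (2 * n)) ** 2
    print(p2n_00_estimate(n), exact, 1 / (math.pi * n))  # estimate, exact, Stirling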
4.3.8 Remark:
(i) A similar analysis is possible for random walks on Z³.
(ii) For the random walks on the line/plane: p^n_ij → 0 as n → ∞, for all i, j.

4.3.9 Theorem (Existence for finite state space):
Let S be finite and suppose that

p^n_ij → π_j as n → ∞, for all i.
Then (π_j)_{j∈S} is an invariant distribution.
Proof:
Note that

Σ_{j∈S} π_j = Σ_{j∈S} lim_{n→∞} p^n_ij = lim_{n→∞} ( Σ_{j∈S} p^n_ij ) = 1,

because S is finite and P is stochastic. And

π_j = lim_{n→∞} p^{n+1}_ij = lim_{n→∞} ( Σ_{k∈S} p^n_ik p_kj ) = Σ_{k∈S} ( lim_{n→∞} p^n_ik ) p_kj = Σ_{k∈S} π_k p_kj.
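Numerically, the convergence assumed in theorem 4.3.9 is visible by simply raising P to a high power; for the truck/car chain of example 4.1.3 both rows of P^n approach (4/19, 15/19). A sketch using numpy:

    import numpy as np

    P = np.array([[1/4, 3/4],
                  [1/5, 4/5]])

    # For a finite aperiodic irreducible chain, all rows of P^n converge
    # to the stationary distribution.
    Pn = np.linalg.matrix_power(P, 50)
    print(Pn)        # both rows ~ [0.2105, 0.7895] = (4/19, 15/19)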
Question:
When can we generalize the above theorem? Some answers are given in the next section. For more details, see Billingsley [4].
4.4. Asymptotic behavior
4.4.1 Theorem:
Let (X_n)_{n≥1} be an irreducible aperiodic chain for which a stationary distribution π exists. Then the chain is persistent with

lim_{n→∞} p^n_ij = π_j for all i, j.

Further, all π_j > 0, and the stationary distribution is unique.

4.4.2 Theorem:
Let (X_n)_{n≥1} be an irreducible aperiodic chain for which no stationary distribution exists. Then

lim_{n→∞} p^n_ij = 0 for all i, j.

4.4.3 Classification of irreducible aperiodic chains:
(i) Transient: Σ_n p^n_ij < +∞. This implies that lim_{n→∞} p^n_ij = 0.
(ii) Null recurrent: Σ_n p^n_ij = ∞, no stationary distribution exists, and by theorem 4.4.2, lim_{n→∞} p^n_ij = 0.
(iii) Positive recurrent: a stationary distribution exists, and p^n_ij → π_j > 0.
Diagonalization of matrices

Let A be an n × n matrix with entries from IF = R or C.

A.1.1 Definition:
A matrix A is said to be diagonalizable if A is similar to a diagonal matrix, i.e., there exists an invertible matrix P such that P⁻¹AP is a diagonal matrix.

We would like to know when a given matrix A is diagonalizable and, if so, how to find P such that P⁻¹AP is diagonal. The next theorem answers this question.

A.1.2 Theorem:
Let A be an n × n matrix. If A is diagonalizable, then there exist scalars λ_1, λ_2, ..., λ_n in IF and vectors C¹, C², ..., Cⁿ in IFⁿ such that the following holds:
(i) AC^i = λ_i C^i for all 1 ≤ i ≤ n. That is, A has n eigenvalues.
(ii) The set {C¹, ..., Cⁿ} is linearly independent, and hence is a basis of IFⁿ.

Theorem A.1.2 says that if A is diagonalizable, then not only does A have n eigenvalues, it has a basis consisting of eigenvectors. In fact, the converse of this is also true.

A.1.3 Theorem:
(i) Let A be an n × n matrix. If A has n linearly independent eigenvectors, then A is diagonalizable.
(ii) Let A be an n × n matrix. If A has n distinct eigenvalues, then A is diagonalizable.
(iii) If A is real symmetric, then there exists an orthogonal matrix P such that P⁻¹AP is diagonal.

A.1.4 Note:
Theorem A.1.3 not only tells us when A is diagonalizable, it also gives us a matrix P which diagonalizes A, i.e., P⁻¹AP = D, together with the resulting diagonal matrix. The column vectors of the matrix P are the n eigenvectors of A, and the diagonal matrix D has as diagonal entries the eigenvalues of A corresponding to these n eigenvectors.
For more details, refer to 'From Geometry to Algebra - An Introduction to Linear Algebra' by Inder K. Rana, Ane Books, New Delhi, 2010.
References

Markov chains:
[1] 'Finite Markov Chains' - Kemeny and Snell, Springer-Verlag.
[2] 'Introduction to Stochastic Processes' - Hoel, Port and Stone, Houghton Mifflin Company.
[3] 'A First Course in Stochastic Processes' - Karlin and Taylor, Academic Press.

Probability and measure:
[4] 'Probability and Measure' - P. Billingsley.
[5] 'Introduction to Probability and Measure' - K.R. Parthasarathy.

Measure theory:
[6] 'An Introduction to Measure and Integration' - Inder K. Rana, Narosa Publishers.

Linear algebra:
[7] 'From Geometry to Algebra - An Introduction to Linear Algebra' - Inder K. Rana, Ane Books, New Delhi, 2010.
Index
absorbing chain, 37
absorbing state, 7, 23
aperiodic chain, 27
asymmetric random walk, 51
Bayes' formula, 2
birth and death chain, 9, 28
chain rule, 2
closed set, 23
communicate, 25
communicating class, 25
communicating states, 25
conditional independence, 2
countably additive, 1
distribution, nth stage, 16
Ehrenfest diffusion model, 11
events, 1
expected return time, 50
first passage time, 47
first return time, 47
first visit time, 47
fundamental matrix, 38
graphical representation, 5
initial distribution, 5
invariant distribution, 45
irreducible set, 24
markov chain, 5
markov property, 4
null recurrent, 50
period, 27
persistent, 30
positive recurrent, 50
probability, 1
probability space, 1
random walk
  absorbing barriers, 7
  asymmetric, 51
  on line, 51
  reflecting barriers, 8
  symmetric on plane, 51
  unrestricted, 7
reachable state, 23
recurrent, 30
space, probability, 1
state space, 4
  decomposition, 36
stationarity property, 4
stationary distribution, 45
stochastic matrix, 20
stopping time, 47
strong markov property, 47
transient, 30
transition graph, 5
transition matrix, 5
transition probability
  nth stage, 16
  single step, 4