Lecture 5. Markov chains
Mathematical Statistics and Discrete Mathematics
November 16th, 2015
1 / 22
Stochastic processes
• So far we have studied random variables, interpreted e.g. as measurements of some random system.
• Most (physical) systems evolve in time, and one wants to be able to analyze such systems.
• This can be done by, say, repeated measurements indexed by the time of the
measurement.
• This is modelled by objects called stochastic processes.
Main motivating question: How to analyze random systems evolving in time?
Let S be a sample space with a probability measure P on the events contained in S.
A stochastic process is a collection of random variables {X_t}_{t∈T} defined on S, indexed
by time t ∈ T. If T is countable, then we say that the process is a discrete-time
stochastic process. Otherwise we say that it is a continuous-time stochastic process.
2 / 22
Markov chains
We will use the notation {X_n} = {X_n}_{n=0}^∞ = X_0, X_1, X_2, . . . for a sequence of random variables. We will consider random variables taking values in countable sets (not necessarily subsets of the real numbers).
By S we will denote the state space, that is, a countable set that our random variables
will take values in.
A sequence of random variables {X_n} defined on the sample space and taking values in the state space S is called a Markov chain if it satisfies the Markov property: for all n and all s_0, s_1, . . . , s_n ∈ S,
P(X_n = s_n | X_{n−1} = s_{n−1}, X_{n−2} = s_{n−2}, . . . , X_0 = s_0) = P(X_n = s_n | X_{n−1} = s_{n−1}).
One can put the Markov property into words in the following way: the future state of
a Markov chain is influenced by its past states only through its present state.
3 / 22
Markov chains
Note that a Markov chain is a discrete-time stochastic process.
A Markov chain is called stationary, or time-homogeneous, if for all n and all s, s′ ∈ S,
P(X_n = s′ | X_{n−1} = s) = P(X_{n+1} = s′ | X_n = s).
The above probability is called the transition probability from state s to state s′.
X A game of tennis between two players can be modelled by a Markov chain X_n which keeps track of the current score. The transition probabilities depend on the skills of the players, and possibly also on the score itself.
X A game of chess can be modelled by a Markov chain Xn whose state space is the
space of all possible chessboard configurations.
4 / 22
Markov chains
Let {Xn } be a sequence of independent random variables. Then,
X {X_n} is a Markov chain (the Markov property holds trivially here since the past does not influence the future at all). It is stationary if and only if the variables have the same distribution.
X {Y_n^(1) = max{X_1, X_2, . . . , X_n}} is a Markov chain.
X {Y_n^(2) = min{X_1, X_2, . . . , X_n}} is a Markov chain.
X {Y_n^(1) + Y_n^(2)} is not a Markov chain.
X (Y_n^(1), Y_n^(2)) is a Markov chain.
X {Y_n = X_1 + X_2 + . . . + X_n} is a Markov chain. If X_i takes values in {−1, 1}, it is called a random walk.
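As a small illustration of the last example, here is a minimal sketch in Python (the helper name and the choice of ±1 steps with equal probability are my own) that simulates the random walk Y_n = X_1 + · · · + X_n:

```python
import random

def random_walk(n_steps, p_up=0.5, seed=0):
    """Simulate Y_n = X_1 + ... + X_n, where each X_i is +1 with
    probability p_up and -1 otherwise (a simple random walk)."""
    rng = random.Random(seed)
    y, path = 0, [0]
    for _ in range(n_steps):
        x = 1 if rng.random() < p_up else -1
        y += x          # the next value depends only on the current value of Y_n
        path.append(y)
    return path

print(random_walk(10))  # one realization of the walk
```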
5 / 22
One can define Markov chains by using state diagrams:
X The following state diagram describes a Markov chain on the state space
S = {A, B, C, D}.
[State diagram: A → A and A → B each with probability 1/2; B → A and B → B each with probability 1/2; C → A, B, C, D each with probability 1/4; D → D with probability 1.]
X The following diagram describes a Markov chain formed from a repeated
execution of Experiment 2 from the previous lecture: the next coin toss is
always biased towards the outcome of the current toss.
[State diagram on {0, 1}: each state loops to itself with probability 2/3 and moves to the other state with probability 1/3.]
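A state diagram translates directly into code: attach to every state the probabilities of its outgoing arrows and sample the next state from them. A minimal sketch (the helper below is my own, not part of the lecture) for the biased-coin chain above:

```python
import random

# Outgoing probabilities of the biased-coin chain: stay with prob. 2/3, switch with prob. 1/3.
transitions = {
    0: {0: 2/3, 1: 1/3},
    1: {0: 1/3, 1: 2/3},
}

def simulate(transitions, start, n_steps, seed=0):
    """Run the chain for n_steps; the next state is drawn using only the
    current state, which is exactly the Markov property."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(n_steps):
        states = list(transitions[state])
        weights = [transitions[state][s] for s in states]
        state = rng.choices(states, weights=weights)[0]
        path.append(state)
    return path

print(simulate(transitions, start=0, n_steps=20))
```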
6 / 22
Transition matrix
We will only deal with stationary Markov chains on finite state spaces and we will
drop the word stationary in the statements.
Let {X_n} be a Markov chain on a finite state space S = {s_1, s_2, . . . , s_k}. The
transition matrix P is a k × k matrix


P = [ p_{1,1}  p_{1,2}  . . .  p_{1,k} ]
    [ p_{2,1}  p_{2,2}  . . .  p_{2,k} ]
    [  . . .    . . .   . . .   . . .  ]
    [ p_{k,1}  p_{k,2}  . . .  p_{k,k} ]
where
p_{i,j} = P(X_n = s_j | X_{n−1} = s_i)
for all i, j = 1, 2, . . . , k.
A matrix P is a transition matrix of some Markov chain if and only if all its entries are non-negative and each of its rows sums to 1, that is, for every i, ∑_{j=1}^{k} p_{i,j} = 1.
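The characterization above is easy to check mechanically. A small sketch (my own helper, assuming numpy) that tests whether a square matrix is a valid transition matrix:

```python
import numpy as np

def is_transition_matrix(P, tol=1e-12):
    """True iff P is square, all entries are non-negative and every row sums to 1."""
    P = np.asarray(P, dtype=float)
    return (
        P.ndim == 2
        and P.shape[0] == P.shape[1]
        and bool(np.all(P >= 0))
        and bool(np.allclose(P.sum(axis=1), 1.0, atol=tol))
    )

print(is_transition_matrix([[2/3, 1/3], [1/3, 2/3]]))  # True
print(is_transition_matrix([[0.5, 0.6], [0.2, 0.8]]))  # False: first row sums to 1.1
```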
7 / 22
Transition matrix
X The matrices
P1 = [ 1  0  0  0 ]        P2 = [ 0  1  0  0 ]
     [ 0  1  0  0 ]             [ 0  0  1  0 ]
     [ 0  0  0  1 ]             [ 0  0  0  1 ]
     [ 0  0  1  0 ]             [ 1  0  0  0 ]
define deterministic (cellular automaton) Markov chains on a four-element set.
X The matrices
P1 = [  0   1/2   0   1/2 ]        P2 = [  0   2/3   0   1/3 ]
     [ 1/2   0   1/2   0  ]             [ 1/3   0   2/3   0  ]
     [  0   1/2   0   1/2 ]             [  0   1/3   0   2/3 ]
     [ 1/2   0   1/2   0  ]             [ 2/3   0   1/3   0  ]
define a symmetric and an asymmetric random walk on a four-element cyclic set.
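Both random-walk matrices can also be generated from the rule "move one step around the cycle, with probability p in one direction and 1 − p in the other". A sketch under that assumption (the function name and the orientation of the cycle are my own choices):

```python
import numpy as np

def cyclic_walk_matrix(k, p):
    """Transition matrix of a random walk on a k-element cycle:
    from state i, go to (i+1) mod k with prob. p and to (i-1) mod k with prob. 1-p."""
    P = np.zeros((k, k))
    for i in range(k):
        P[i, (i + 1) % k] += p
        P[i, (i - 1) % k] += 1 - p
    return P

P_sym  = cyclic_walk_matrix(4, 1/2)   # symmetric random walk
P_asym = cyclic_walk_matrix(4, 2/3)   # asymmetric random walk
print(P_sym, P_asym, sep="\n")
print(P_sym.sum(axis=1), P_asym.sum(axis=1))  # every row sums to 1
```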
8 / 22
Transition matrix
X Let S = {s_1, s_2, s_3, s_4} = {A, B, C, D} and consider a Markov chain given by the following diagram.
[Same state diagram on {A, B, C, D} as on the earlier slide.]
Then,

P = [ 1/2  1/2   0    0  ]
    [ 1/2  1/2   0    0  ]
    [ 1/4  1/4  1/4  1/4 ]
    [  0    0    0    1  ].
9 / 22
Transition matrix
X Let S = {s1 , s2 } = {0, 1} and consider a Markov chain given by the following
diagram
[State diagram on {0, 1}: each state loops to itself with probability 2/3 and moves to the other state with probability 1/3.]
Then,
P = [ 2/3  1/3 ]
    [ 1/3  2/3 ].
X Let {X_n} be a sequence of independent rolls of a fair die and let
{Y_n = max{X_1, X_2, . . . , X_n}}. Then,


P = [ 1/6  1/6  1/6  1/6  1/6  1/6 ]
    [  0   2/6  1/6  1/6  1/6  1/6 ]
    [  0    0   3/6  1/6  1/6  1/6 ]
    [  0    0    0   4/6  1/6  1/6 ]
    [  0    0    0    0   5/6  1/6 ]
    [  0    0    0    0    0    1  ].
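This matrix can be rebuilt from the description of Y_n: if the current maximum is i, a roll of at most i keeps the maximum at i (probability i/6), and a roll j > i moves it to j (probability 1/6 each). A sketch of that construction (the helper is my own):

```python
import numpy as np

def max_of_rolls_matrix(faces=6):
    """Transition matrix of Y_n = max(X_1, ..., X_n) for independent fair die rolls X_i."""
    P = np.zeros((faces, faces))
    for i in range(1, faces + 1):           # current maximum is i
        P[i - 1, i - 1] = i / faces         # a roll <= i leaves the maximum unchanged
        for j in range(i + 1, faces + 1):   # a roll j > i moves the maximum to j
            P[i - 1, j - 1] = 1 / faces
    return P

print(max_of_rolls_matrix())   # reproduces the 6 x 6 matrix above
```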
10 / 22
Distribution vector
One can represent a probability distribution of a random variable X taking values in a
finite state space S = {s1 , s2 , . . . , sk } by the distribution vector ~u given by
~u = (u_1, u_2, . . . , u_k),    where u_i = P(X = s_i) for i = 1, 2, . . . , k.
We choose ~u to be a row-vector.
Let {X_n} be a Markov chain on S = {s_1, s_2, . . . , s_k}, and let
~u^(n) = (u_1^(n), u_2^(n), . . . , u_k^(n))
be the distribution vector of X_n. Then,
~u^(n) = ~u^(n−1) · P = ~u^(n−2) · P^2 = · · · = ~u^(0) · P^n,
where · denotes matrix multiplication. We call ~u^(0) the initial distribution.
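In code, propagating a distribution vector is just a repeated row-vector/matrix product. A minimal sketch (the function name is my own):

```python
import numpy as np

def distribution_after(u0, P, n):
    """Return ~u^(n) = ~u^(0) · P^n for a row distribution vector u0."""
    u = np.asarray(u0, dtype=float)
    P = np.asarray(P, dtype=float)
    for _ in range(n):
        u = u @ P            # one step: ~u^(m) = ~u^(m-1) · P
    return u

P = np.array([[2/3, 1/3],
              [1/3, 2/3]])
print(distribution_after([1.0, 0.0], P, 2))   # approx. (5/9, 4/9); see the example two slides below
```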
11 / 22
Distribution vector
Proof. Note that the events {X_{n−1} = s_1}, {X_{n−1} = s_2}, . . . , {X_{n−1} = s_k} are pairwise disjoint and their union is the full sample space. We can hence use the total probability formula for the event {X_n = s_j}. Fix a state s_j. We can write
P(X_n = s_j) = ∑_{i=1}^{k} P(X_n = s_j | X_{n−1} = s_i) P(X_{n−1} = s_i)
            = ∑_{i=1}^{k} p_{i,j} P(X_{n−1} = s_i)
            = ∑_{i=1}^{k} p_{i,j} u_i^(n−1)
            = ~u^(n−1) · c_j    (dot product of ~u^(n−1) and c_j),
where c_j is the j-th column vector of P. This agrees with the definition of matrix multiplication.
We say that the chain {X_n} starts in state s_i if the initial distribution satisfies u_i^(0) = 1.
12 / 22
Distribution vector
The i, j entry of the matrix P^n is
p_{i,j}^(n) = P(X_n = s_j | X_0 = s_i).
X Consider a Markov chain {X_n} given by the matrix
P = [ 2/3  1/3 ]
    [ 1/3  2/3 ]
and started at s_1. What is the distribution vector of X_2, X_4, X_8 and X_16? We have ~u^(0) = (1, 0), and
~u^(2) = ~u^(0) · P^2 = (5/9, 4/9),
~u^(4) = ~u^(0) · P^4 = (41/81, 40/81),
~u^(8) = ~u^(0) · P^8 = (3281/6561, 3280/6561),
~u^(16) = ~u^(0) · P^16 = (21523361/43046721, 21523360/43046721).
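These fractions can be reproduced exactly with rational arithmetic; a short sketch using Python's fractions module:

```python
from fractions import Fraction as F

P = [[F(2, 3), F(1, 3)],
     [F(1, 3), F(2, 3)]]

def step(u, P):
    """One step ~u -> ~u · P with exact fractions."""
    return [sum(u[i] * P[i][j] for i in range(len(u))) for j in range(len(P[0]))]

u = [F(1), F(0)]                       # the chain starts at s1
for n in range(1, 17):
    u = step(u, P)
    if n in (2, 4, 8, 16):
        print(n, [str(x) for x in u])  # e.g. 2 ['5/9', '4/9']
```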
13 / 22
Absorbing Markov chains
A state si ∈ S of a stationary Markov chain is called
• absorbing if pi,i = 1,
• transient if pi,i < 1.
A Markov chain is called absorbing if from any transient state, one can reach an
absorbing state.
14 / 22
Absorbing Markov chains
X Let a Markov chain be given by the following transition matrix


P = [  1    0    0    0    0  ]
    [ 1/2   0   1/2   0    0  ]
    [  0   1/2   0   1/2   0  ]
    [  0    0   1/2   0   1/2 ]
    [  0    0    0    0    1  ].
This is an absorbing Markov chain. This chain can be interpreted as a mouse
jumping on a table of length 1m and making jumps of 0.5m to the left or to the
right with equal probability. The absorbing state is when the mouse falls off the
table (either on the left-hand side or the right-hand side).
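The behaviour of this chain is easy to check by simulation. A rough Monte Carlo sketch (my own helper; the middle of the table is state index 2) that estimates the expected number of jumps before the mouse falls off:

```python
import numpy as np

P = np.array([
    [1,   0,   0,   0,   0  ],
    [0.5, 0,   0.5, 0,   0  ],
    [0,   0.5, 0,   0.5, 0  ],
    [0,   0,   0.5, 0,   0.5],
    [0,   0,   0,   0,   1  ],
])

def time_to_absorption(P, start, rng):
    """Run the chain from `start` until it reaches a state with p_ii = 1."""
    state, steps = start, 0
    while P[state, state] != 1:
        state = rng.choice(len(P), p=P[state])
        steps += 1
    return steps

rng = np.random.default_rng(0)
samples = [time_to_absorption(P, start=2, rng=rng) for _ in range(10_000)]
print(np.mean(samples))   # should be close to the exact expected time from the middle state
```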
15 / 22
Absorbing Markov chains
X Consider a Markov chain given by the following diagram:
[Same state diagram on {A, B, C, D} as on the earlier slides.]
This Markov chain is not absorbing since the only absorbing state D cannot be
reached from the transient states A and B.
16 / 22
Absorbing Markov chains
The probability that an absorbing Markov chain will eventually end up in one of its
absorbing states is 1.
Proof. From each nonabsorbing state s_j it is possible to reach an absorbing state. Let m_j be the minimum number of steps required to reach an absorbing state, starting from s_j. Let p_j be the probability that, starting from s_j, the process will not reach an absorbing state in m_j steps. Then p_j < 1. Let m be the largest of the m_j and let p be the largest of the p_j. The probability of not being absorbed in m steps is less than or equal to p, in 2m steps less than or equal to p^2, etc. Since p < 1, these probabilities tend to 0.
For the interested: this argument is very similar to the one for the infinite monkey
theorem which states that a monkey hitting keys at random on a typewriter keyboard
for an infinite amount of time will with probability 1 type a given text, such as the
complete works of William Shakespeare.
17 / 22
Canonical form of the transition matrix
Let {Xn } be an absorbing Markov chain with l transient states and r absorbing states.
Let {s1 , s2 , . . . , sl+r } be a numbering of the state space where all the absorbing states
come at the end. Let P be the transition matrix for this particular numbering. Then P
is in the canonical form:
P = [ Q  R ]
    [ 0  I ],
where
• I is an r × r identity matrix,
• 0 is an r × l zero matrix,
• Q is an l × l matrix with transition probabilities between transient states,
• R is an l × r matrix with transition probabilities from transient to absorbing
states.
Note that
P^n = [ Q^n  R(n) ]
      [  0    I   ]
for some matrix R(n).
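Reordering a transition matrix into canonical form can be done mechanically once we know which states are absorbing. A sketch (my own helper, detecting absorbing states via p_{i,i} = 1):

```python
import numpy as np

def canonical_form(P):
    """Reorder the states so that absorbing ones come last; return Q, R and the new order."""
    P = np.asarray(P, dtype=float)
    absorbing = [i for i in range(len(P)) if P[i, i] == 1]
    transient = [i for i in range(len(P)) if P[i, i] != 1]
    order = transient + absorbing
    P_canon = P[np.ix_(order, order)]      # rows and columns permuted together
    l = len(transient)
    Q = P_canon[:l, :l]                    # transient -> transient block
    R = P_canon[:l, l:]                    # transient -> absorbing block
    return Q, R, order
```

For the mouse chain above this yields a 3 × 3 block Q and a 3 × 2 block R.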
18 / 22
Time to absorption
Let {Xn } be an absorbing Markov chain and let Q be as before. The fundamental
matrix is defined to be
N = (I − Q)^(−1).
The entries n_{i,j} of N satisfy
n_{i,j} = the expected number of visits of {X_n} to the state s_j when started from state s_i.
Note that N = I + Q + Q^2 + · · ·.
Proof. Fix i and let {X_n} be a Markov chain started at s_i. Let Y_n^j = 1 if X_n = s_j, and Y_n^j = 0 otherwise. Then,
Y^j = Y_0^j + Y_1^j + . . .
is the total number of visits of {X_n} to s_j. Hence,
E[Y^j] = E[Y_0^j] + E[Y_1^j] + . . . = I_{i,j} + P(X_1 = s_j | X_0 = s_i) + P(X_2 = s_j | X_0 = s_i) + . . .
       = I_{i,j} + p_{i,j} + p_{i,j}^(2) + . . . = I_{i,j} + q_{i,j} + q_{i,j}^(2) + . . . = n_{i,j},
where q_{i,j}^(n) is the i, j entry of the matrix Q^n (for transient s_i and s_j we have p_{i,j}^(n) = q_{i,j}^(n), since a path between transient states never visits an absorbing state).
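Numerically, N is just a matrix inverse, and the identity N = I + Q + Q^2 + · · · can be checked by truncating the series. A sketch using the Q block of the mouse chain above:

```python
import numpy as np

# Q of the mouse chain: transitions between the three transient positions on the table
Q = np.array([[0,   0.5, 0  ],
              [0.5, 0,   0.5],
              [0,   0.5, 0  ]])

N = np.linalg.inv(np.eye(3) - Q)    # fundamental matrix
print(N)                            # entry (i, j): expected visits to s_j when started from s_i

# The truncated series I + Q + Q^2 + ... + Q^m converges to N.
S, Qk = np.eye(3), np.eye(3)
for _ in range(200):
    Qk = Qk @ Q
    S += Qk
print(np.allclose(S, N))            # True
```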
19 / 22
Time to absorption
Let
ti = the expected time before absorption when started from si ,
and let ~t = (t1 , t2 , . . . , tl ) be a column vector. Then,
~t = N · ~c,
where ~c is a column vector of 1’s.
Proof. Fix i and let {X_n} be a Markov chain started at s_i. Let Z^j be the total number of visits of {X_n} to s_j. Let T_i be the time until absorption for {X_n}. We have T_i = ∑_j Z^j, and hence
t_i = E[T_i] = ∑_j E[Z^j] = ∑_j n_{i,j} = (N · ~c)_i.
20 / 22
Probability of absorption
Let si be a transient state. We define
b_{i,j} = P(absorption in s_j | X_0 = s_i),
and put B = (b_{i,j})_{1≤i≤l, 1≤j≤r}. Then,
B = N · R.
Proof. We represent (Q^n)_{i,k} as a sum over all possible trajectories of {X_n} of n steps from a transient state s_i to a transient state s_k. All trajectories of n + 1 steps from s_i ending at an absorbing state s_j have to have their first n steps between transient states, and their last step from a transient state to the absorbing state s_j. Hence, the sum over all such trajectories is
∑_k (Q^n)_{i,k} R_{k,j} = (Q^n R)_{i,j}.
If we sum over all possible numbers of steps, we get
B_{i,j} = [(I + Q + Q^2 + . . .)R]_{i,j} = (NR)_{i,j}.
21 / 22
Exercise: Compute the vector ~t and the matrix B for the mouse on a table.
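A possible numerical check for the exercise (it reuses the Q and R blocks of the mouse chain; the same values can of course be obtained by hand):

```python
import numpy as np

# Transient states: the three positions on the table; absorbing: fell off left / right.
Q = np.array([[0,   0.5, 0  ],
              [0.5, 0,   0.5],
              [0,   0.5, 0  ]])
R = np.array([[0.5, 0  ],
              [0,   0  ],
              [0,   0.5]])

N = np.linalg.inv(np.eye(3) - Q)   # fundamental matrix
t = N @ np.ones(3)                 # ~t = N · ~c: expected times to absorption
B = N @ R                          # B = N · R: absorption probabilities

print(t)   # expected numbers of jumps before falling off, one entry per starting position
print(B)   # e.g. first row: from the leftmost position the mouse falls off left with prob. 3/4
```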
22 / 22