Markov Chains
Sources:
Grinstead & Snell: Introduction to Probability
Hwei Hsu: Probability, Random Variables, & Random Processes
Motivation
● One of the simplest forms of stochastic dynamics
● Allows modelling of stochastic temporal dependencies
● Deterministic dynamics as a special case
● Applications in many areas
http://en.wikipedia.org/wiki/Snakes_and_Ladders
[Image: Milton Bradley "Chutes and Ladders" game board, c. 1952]
Definition
● System in different states $X_n$; state set: $E = \{0, 1, 2, \ldots\}$ can be finite or infinite
● "time" is discrete: $n = 0, 1, 2, \ldots$; there are probabilistic transitions between states.
● Markov property:
  $P(X_{n+1} = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_n = i) = P(X_{n+1} = j \mid X_n = i)$
● $P(X_{n+1} = j \mid X_n = i)$ is the one-step transition probability
Homogeneous Markov Chains
● If the transition probability $P(X_{n+1} = j \mid X_n = i)$ does not depend on n, then the Markov Chain possesses stationary transition probabilities and the Markov Chain is called homogeneous.
● In the following, we will only consider homogeneous Markov Chains.
Representation as Graph
● The states of the Markov Chain are the nodes
● The transitions correspond to directed edges between the nodes
● The edges are weighted with the transition probability
● The weights of outgoing edges have to sum to one
● Example: Weather in the Land of Oz (see blackboard)
Generalization to higher order
● Higher order Markov processes, e.g. 2nd and 3rd order:
  $P(X_{n+1} = j \mid X_n = i_n, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0)$
  $= P(X_{n+1} = j \mid X_n = i_n, X_{n-1} = i_{n-1})$   (2nd order)
  or
  $= P(X_{n+1} = j \mid X_n = i_n, X_{n-1} = i_{n-1}, X_{n-2} = i_{n-2})$   (3rd order)
● In order to predict the future, a limited memory of the past is sufficient.
Transition Probability Matrix
● Define $p_{ij} = P(X_{n+1} = j \mid X_n = i)$
● and $P = [p_{ij}]$
● Conditions on the elements:
  $p_{ij} \geq 0 \quad \forall i, j$
  $\sum_{j=0}^{\infty} p_{ij} = 1 \quad \forall i$
● Matrices that fulfill these conditions are called Markov matrix or stochastic matrix.
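The two conditions above are easy to check numerically. Here is a minimal sketch in Python with NumPy; the function name `is_stochastic`, the tolerances, and the example matrix are my own choices, not part of the lecture:

```python
import numpy as np

def is_stochastic(P, tol=1e-12):
    """Check the two conditions for a (row-)stochastic matrix."""
    P = np.asarray(P, dtype=float)
    nonnegative = np.all(P >= -tol)                    # p_ij >= 0 for all i, j
    rows_sum_to_one = np.allclose(P.sum(axis=1), 1.0)  # sum_j p_ij = 1 for all i
    return bool(nonnegative and rows_sum_to_one)

# A small two-state example (illustrative values only)
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
print(is_stochastic(P))   # True
```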
Example: Land of Oz

According to Kemeny, Snell, and Thompson, the Land of Oz is blessed by many things, but not by good weather. They never have two nice days in a row. If they have a nice day, they are just as likely to have snow as rain the next day. If they have snow or rain, they have an even chance of having the same the next day. If there is change from snow or rain, only half of the time is this a change to a nice day. With this information we form a Markov chain as follows. We take as states the kinds of weather R, N, and S. From the above information we determine the transition probabilities. These are most conveniently represented in a square array as

          R    N    S
     R   1/2  1/4  1/4
P =  N   1/2   0   1/2
     S   1/4  1/4  1/2

The entries in the first row of the matrix P represent the probabilities for the various kinds of weather following a rainy day. Similarly, the entries in the second and third rows represent the probabilities for the various kinds of weather following nice and snowy days, respectively. Such a square array is called the matrix of transition probabilities, or the transition matrix.

Question: What is the probability that it is snowy two days from now when it is rainy today?
Answer: see blackboard
Higher order transition probabilities using matrix powers
● Define transition matrix powers:
  $P^2 = P\,P, \qquad p_{ij}^{(2)} = \sum_k p_{ik}\, p_{kj}$   (2-step transition probabilities)
● Why does this sum converge even if the state space E is infinite? (see blackboard)
● Generally:
  $P^{n+1} = P^n P, \qquad p_{ij}^{(n+1)} = \sum_k p_{ik}\, p_{kj}^{(n)}$   (n-step transition probabilities)
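As a concrete check of the two-step formula, here is a short NumPy sketch (my own illustration, not the lecture's code) that squares the Land of Oz matrix and reads off the rainy-to-snowy two-step probability asked about on the previous slide:

```python
import numpy as np

# Land of Oz transition matrix, state order: R(ain), N(ice), S(now)
P = np.array([[1/2, 1/4, 1/4],
              [1/2, 0.0, 1/2],
              [1/4, 1/4, 1/2]])

P2 = P @ P                      # two-step transition probabilities p_ij^(2)
R, N, S = 0, 1, 2
print(P2[R, S])                 # P(snowy in two days | rainy today) = 0.375
# Same number via the explicit sum over intermediate states k:
print(np.isclose(P2[R, S], sum(P[R, k] * P[k, S] for k in range(3))))  # True
```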
Theorem 11.1  Let P be the transition matrix of a Markov chain. The ijth entry p_ij^(n) of the matrix P^n gives the probability that the Markov chain, starting in state s_i, will be in state s_j after n steps.

Proof. The proof of this theorem is left as an exercise (Exercise 17). □
Land of Oz example:

Example 11.2 (Example 11.1 continued)  Consider again the weather in the Land of Oz. We know that the powers of the transition matrix give us interesting information about the process as it evolves. We shall be particularly interested in the state of the chain after a large number of steps. The program MatrixPowers computes the powers of P. We have run the program MatrixPowers for the Land of Oz example to compute the successive powers of P from 1 to 6. The results are shown in Table 11.1.

            Rain  Nice  Snow
     Rain   .500  .250  .250
P  = Nice   .500  .000  .500
     Snow   .250  .250  .500

            Rain  Nice  Snow
     Rain   .438  .188  .375
P^2 = Nice  .375  .250  .375
     Snow   .375  .188  .438

            Rain  Nice  Snow
     Rain   .406  .203  .391
P^3 = Nice  .406  .188  .406
     Snow   .391  .203  .406

            Rain  Nice  Snow
     Rain   .402  .199  .398
P^4 = Nice  .398  .203  .398
     Snow   .398  .199  .402

            Rain  Nice  Snow
     Rain   .400  .200  .399
P^5 = Nice  .400  .199  .400
     Snow   .399  .200  .400

            Rain  Nice  Snow
     Rain   .400  .200  .400
P^6 = Nice  .400  .200  .400
     Snow   .400  .200  .400

Table 11.1: Powers of the Land of Oz transition matrix.

We note that after six days our weather predictions are, to three-decimal-place accuracy, independent of today's weather. The probabilities for the three types of weather, R, N, and S, are .4, .2, and .4 no matter where the chain started. This is an example of the type of Markov chain called a regular Markov chain. For this type of chain, it is true that long-range predictions are independent of the starting state. Not all chains are regular, but this is an important class of chains.
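The program MatrixPowers from Grinstead & Snell is not reproduced here; a rough NumPy stand-in (my own sketch) that prints P^1 through P^6 and shows the rows converging towards (.4, .2, .4) could look like:

```python
import numpy as np

P = np.array([[0.50, 0.25, 0.25],   # Rain
              [0.50, 0.00, 0.50],   # Nice
              [0.25, 0.25, 0.50]])  # Snow

Pn = np.eye(3)
for n in range(1, 7):
    Pn = Pn @ P                     # accumulate P^n
    print(f"P^{n}:\n{np.round(Pn, 3)}\n")
# By n = 6 every row is approximately (.400, .200, .400).
```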
Aside: Chapman-Kolmogorov Equation
● With these definitions we can show:
  $p_{ij}^{(n)} = P(X_n = j \mid X_0 = i) = P(X_{m+n} = j \mid X_m = i)$
● Furthermore:
  $P^{(n+m)} = P^{(n)} P^{(m)}, \qquad p_{ij}^{(n+m)} = \sum_k p_{ik}^{(n)}\, p_{kj}^{(m)}$
● Chapman-Kolmogorov equation: the probability to transition from i to j in n+m steps can be expressed as the sum of the probabilities of the paths going via each intermediate state k after n steps.
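A quick numerical sanity check of the Chapman-Kolmogorov identity on the Land of Oz matrix (again a sketch of my own, assuming NumPy):

```python
import numpy as np

P = np.array([[0.50, 0.25, 0.25],
              [0.50, 0.00, 0.50],
              [0.25, 0.25, 0.50]])

n, m = 2, 3
lhs = np.linalg.matrix_power(P, n + m)                              # p_ij^(n+m)
rhs = np.linalg.matrix_power(P, n) @ np.linalg.matrix_power(P, m)   # sum_k p_ik^(n) p_kj^(m)
print(np.allclose(lhs, rhs))   # True: P^(n+m) = P^(n) P^(m)
```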
Theorem 11.2  Let P be the transition matrix of a Markov chain, and let u be the probability vector which represents the starting distribution. Then the probability that the chain is in state s_i after n steps is the ith entry in the vector

  $u^{(n)} = u P^n$ .

Proof. The proof of this theorem is left as an exercise (Exercise 18). □

We note that if we want to examine the behavior of the chain under the assumption that it starts in a certain state s_i, we simply choose u to be the probability vector with ith entry equal to 1 and all other entries equal to 0.

Example: consider starting in states R, N, S with probability 1/3 each:

Example 11.3  In the Land of Oz example (Example 11.1) let the initial probability vector u equal (1/3, 1/3, 1/3). Then we can calculate the distribution of the states after three days using Theorem 11.2 and our previous calculation of P^3. We obtain

                                        ( .406  .203  .391 )
  u^(3) = u P^3 = ( 1/3, 1/3, 1/3 )     ( .406  .188  .406 )  =  ( .401, .198, .401 ) .
                                        ( .391  .203  .406 )

□
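Example 11.3 can be reproduced in a couple of lines; a sketch assuming NumPy and the state order (R, N, S):

```python
import numpy as np

P = np.array([[0.50, 0.25, 0.25],
              [0.50, 0.00, 0.50],
              [0.25, 0.25, 0.50]])

u = np.array([1/3, 1/3, 1/3])          # starting distribution over R, N, S
u3 = u @ np.linalg.matrix_power(P, 3)  # u^(3) = u P^3  (Theorem 11.2)
print(np.round(u3, 3))                 # approximately [0.401 0.198 0.401]
```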
Classification of states
● Accessible states: j is accessible from i if for some n ≥ 0, $p_{ij}^{(n)} > 0$, and we write i → j
● Two states i and j accessible to each other are said to communicate, and we write i ↔ j
● If all states communicate with each other, the Markov chain is said to be irreducible (ergodic).
● A state j is said to be an absorbing state if $p_{jj} = 1$. Once j is reached, it is never left.
Absorbing Markov Chains
● A Markov chain is called absorbing if it has at least one absorbing state and if that state can be reached from every other state (not necessarily in one step).
● An absorbing Markov chain is obviously not irreducible. (why?)
● Can you construct a Markov Chain that has no absorbing state but is not irreducible? (see blackboard)
Stationary Distribution
● Consider a Markov chain with transition probability matrix P. A probability vector $\pi$ is called a stationary distribution of the Markov chain if:
  $\pi P = \pi$
● I.e. it is a left eigenvector of P. Note that a multiple of this vector is also an eigenvector but not necessarily a probability vector.
● Note: if this is the initial state distribution, then all later state distributions will be identical to it.
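One way to compute a stationary distribution numerically is as a left eigenvector of P for eigenvalue 1, rescaled to sum to one. This is a sketch of my own (NumPy assumed), not the lecture's method:

```python
import numpy as np

P = np.array([[0.50, 0.25, 0.25],
              [0.50, 0.00, 0.50],
              [0.25, 0.25, 0.50]])

# Left eigenvectors of P are right eigenvectors of P^T.
vals, vecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(vals - 1.0))      # eigenvalue closest to 1
pi = np.real(vecs[:, i])
pi = pi / pi.sum()                     # rescale so it is a probability vector
print(np.round(pi, 3))                 # [0.4 0.2 0.4] for the Land of Oz
print(np.allclose(pi @ P, pi))         # pi P = pi
```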
Regular Markov Chain
● A Markov Chain is regular if there is a finite positive integer m such that after m time-steps, every state has a non-zero chance of being occupied, no matter what the initial state.
● Sufficient condition: all elements in P are greater than zero.
● Give an example of an irreducible (ergodic) Markov Chain that is not regular. (see blackboard)
● Give an example of a regular Markov Chain that has an entry which is zero. (see blackboard)
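A direct, if brute-force, way to test regularity for a finite chain is to look for some power P^m with all entries strictly positive. The sketch below is my own (NumPy assumed; the cutoff max_power is an arbitrary practical bound); the second test matrix is a two-state chain that deterministically alternates between its states, a standard example of an irreducible but non-regular chain:

```python
import numpy as np

def is_regular(P, max_power=100):
    """Return True if some power P^m (m <= max_power) has only positive entries."""
    P = np.asarray(P, dtype=float)
    Pm = P.copy()
    for m in range(1, max_power + 1):
        if np.all(Pm > 0):
            return True
        Pm = Pm @ P
    return False

P_oz = np.array([[0.5, 0.25, 0.25], [0.5, 0.0, 0.5], [0.25, 0.25, 0.5]])
P_flip = np.array([[0.0, 1.0], [1.0, 0.0]])   # alternates between its two states
print(is_regular(P_oz))    # True: P^2 is already all positive despite the zero entry in P
print(is_regular(P_flip))  # False: every power still contains zeros
```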
A non-regular example: let

P = (  1    0  )
    ( 1/2  1/2 )

be the transition matrix of a Markov chain. Then all powers of P will have a 0 in the upper right-hand corner, so the chain is not regular.

For the Land of Oz, the sixth power of the transition matrix P is, to three decimal places,

          R   N   S
      R  .4  .2  .4
P^6 = N  .4  .2  .4
      S  .4  .2  .4

We shall now discuss two important theorems relating to regular chains.
Limiting Distribution for regular Markov Chain

Thus, to this degree of accuracy, the probability of rain six days after a rainy day is the same as the probability of rain six days after a nice day, or six days after a snowy day. Theorem 11.7 predicts that, for large n, the rows of P^n approach a common vector. It is interesting that this occurs so soon in our example.

Theorem 11.7  Let P be the transition matrix for a regular chain. Then, as n → ∞, the powers P^n approach a limiting matrix W with all rows the same vector w. The vector w is a strictly positive probability vector (i.e., the components are all positive and they sum to one).

Theorem 11.8  Let P be a regular transition matrix, let

  W = lim_{n→∞} P^n ,

let w be the common row of W, and let c be the column vector all of whose components are 1. Then
(a) wP = w, and any row vector v such that vP = v is a constant multiple of w.
(b) Pc = c, and any column vector x such that Px = x is a multiple of c.

Proof, e.g., in Grinstead & Snell. The basic idea of the first proof given there: we want to show that the powers of a regular transition matrix tend to a matrix with all rows the same. This is the same as showing that P^n converges to a matrix with constant columns. Now the jth column of P^n is P^n y, where y is a column vector with 1 in the jth entry and 0 in the other entries. Thus we need only prove that for any column vector y, P^n y approaches a constant vector as n tends to infinity. Since each row of P is a probability vector, Py replaces y by averages of its components.
We will now give several different methods for calculating the fixed row vector w for a regular Markov chain.

Example 11.19  By Theorem 11.7 we can find the limiting vector w for the Land of Oz from the fact that

  w1 + w2 + w3 = 1

and

                    ( 1/2  1/4  1/4 )
  ( w1  w2  w3 )    ( 1/2   0   1/2 )  =  ( w1  w2  w3 ) .
                    ( 1/4  1/4  1/2 )

These relations lead to the following four equations in three unknowns:

  w1 + w2 + w3                  = 1,
  (1/2)w1 + (1/2)w2 + (1/4)w3   = w1,
  (1/4)w1 + (1/4)w3             = w2,
  (1/4)w1 + (1/2)w2 + (1/2)w3   = w3.

Our theorem guarantees that these equations have a unique solution. If the equations are solved, we obtain the solution

  w = ( .4  .2  .4 ) ,

in agreement with that predicted from P^6, given in Example 11.2.
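The four equations of Example 11.19 can also be solved numerically. A small sketch of my own (NumPy assumed) that stacks wP = w with the normalization constraint and solves by least squares:

```python
import numpy as np

P = np.array([[0.50, 0.25, 0.25],
              [0.50, 0.00, 0.50],
              [0.25, 0.25, 0.50]])

# w P = w  <=>  w (P - I) = 0; add sum(w) = 1 as a fourth equation.
A = np.vstack([(P - np.eye(3)).T, np.ones(3)])   # 4 equations, 3 unknowns
b = np.array([0.0, 0.0, 0.0, 1.0])
w, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(w, 3))                            # [0.4 0.2 0.4]
```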
Simulation
We illustrate Theorem 11.12 by writing a program to simulate the behavior of a
Markov chain. SimulateChain is such a program.
Example 11.21 In the Land of Oz, there are 525 days in a year. We have simulated
the weather for one year in the Land of Oz, using the program SimulateChain.
The results are shown in Table 11.2.
SSRNRNSSSSSSNRSNSSRNSRNSSSNSRRRNSSSNRRSSSSNRSSNSRRRRRRNSSS
SSRRRSNSNRRRRSRSRNSNSRRNRRNRSSNSRNRNSSRRSRNSSSNRSRRSSNRSNR
RNSSSSNSSNSRSRRNSSNSSRNSSRRNRRRSRNRRRNSSSNRNSRNSNRNRSSSRSS
NRSSSNSSSSSSNSSSNSNSRRNRNRRRRSRRRSSSSNRRSSSSRSRRRNRRRSSSSR
RNRRRSRSSRRRRSSRNRRRRRRNSSRNRSSSNRNSNRRRRNRRRNRSNRRNSRRSNR
RRRSSSRNRRRNSNSSSSSRRRRSRNRSSRRRRSSSRRRNRNRRRSRSRNSNSSRRRR
RNSNRNSNRRNRRRRRRSSSNRSSRSNRSSSNSNRNSNSSSNRRSRRRNRRRRNRNRS
SSNSRSNRNRRSNRRNSRSSSRNSRRSSNSRRRNRRSNRRNSSSSSNRNSSSSSSSNR
NSRRRNSSRRRNSSSNRRSRNSSRRNRRNRSNRRRRRRRRRNSNRRRRRNSRRSSSSN
SNS
State   Times   Fraction
R       217     .413
N       109     .208
S       199     .379

Table 11.2: Weather in the Land of Oz.
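The program SimulateChain is likewise not reproduced here; a minimal stand-in (my own sketch, NumPy assumed) that samples 525 days of Oz weather and tabulates the visit fractions might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
states = ["R", "N", "S"]
P = np.array([[0.50, 0.25, 0.25],
              [0.50, 0.00, 0.50],
              [0.25, 0.25, 0.50]])

s = 0                                    # start in state R (an arbitrary choice)
visits = np.zeros(3, dtype=int)
for _ in range(525):                     # 525 days in a year in the Land of Oz
    s = rng.choice(3, p=P[s])            # sample the next state from row s of P
    visits[s] += 1

for i, name in enumerate(states):
    print(name, visits[i], round(visits[i] / 525, 3))
# The fractions should be close to the limiting vector (.4, .2, .4).
```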
Other "Markov" things
● Hidden Markov Models (HMMs) (Pattern Recognition)
● Markov Decision Processes (MDPs) (Machine Learning)
● Partially observable Markov Decision Processes (POMDPs) (Machine Learning)
● Graphical Models (Machine Learning)
● Markov Random Fields (Computer Vision)
● Markov Chain Monte Carlo (MCMC) (Statistical Estimation & Inference)
Markov Decision Process
● Now consider an agent acting in an environment.
● There are states as before, but now also actions that influence what the next state will be.
● There is also a reward function that represents numerical rewards or punishments.
● How should the agent behave to maximize the reward signal? -> field of reinforcement learning
The Agent-Environment Interface

[Diagram: the Agent receives state s_t and reward r_t and emits action a_t; the Environment responds with reward r_{t+1} and next state s_{t+1}.]

Agent and environment interact at discrete time steps: t = 0, 1, 2, ...
  Agent observes state at step t:   $s_t \in S$
  produces action at step t:        $a_t \in A(s_t)$
  gets resulting reward:            $r_{t+1} \in \mathbb{R}$
  and resulting next state:         $s_{t+1}$

  ... s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, ...

R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction
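The interaction protocol above can be written as a short loop. The following sketch is my own illustration only: the ToyEnv class, its reset/step interface, and the random policy are invented for the example and do not come from Sutton & Barto.

```python
import random

class ToyEnv:
    """Tiny stand-in environment with two states, just to make the loop runnable."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        # action 1 tends to flip the state, action 0 tends to keep it
        flip = random.random() < (0.8 if a == 1 else 0.2)
        self.s = 1 - self.s if flip else self.s
        r = 1.0 if self.s == 1 else 0.0          # reward r_{t+1} depends on the new state
        return self.s, r, False                  # (s_{t+1}, r_{t+1}, episode not done)

env, total = ToyEnv(), 0.0
s = env.reset()                                  # agent observes s_0
for t in range(10):
    a = random.choice([0, 1])                    # agent produces action a_t in A(s_t)
    s, r, done = env.step(a)                     # environment returns r_{t+1}, s_{t+1}
    total += r                                   # a learning agent would update estimates here
print("return over 10 steps:", total)
```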
Graph representation: Recycling Robot MDP

S = {high, low}
A(high) = {search, wait}
A(low) = {search, wait, recharge}
R^search = expected no. of cans while searching
R^wait = expected no. of cans while waiting

[Transition graph: from high, "search" stays in high with probability α (reward R^search) and moves to low with probability 1−α (reward R^search); "wait" stays in high with probability 1 (reward R^wait). From low, "search" stays in low with probability β (reward R^search) and moves to high with probability 1−β (reward −3); "wait" stays in low with probability 1 (reward R^wait); "recharge" moves to high with probability 1 (reward 0).]

R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction
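The robot's dynamics can be collected in a small table p(s', r | s, a). This is a sketch of my own; alpha, beta, r_search, and r_wait are placeholder values standing in for the symbolic parameters α, β, R^search, and R^wait of the diagram.

```python
# Placeholder parameter values (the example leaves them symbolic).
alpha, beta = 0.9, 0.6
r_search, r_wait = 2.0, 1.0

# (state, action) -> list of (probability, next_state, reward)
dynamics = {
    ("high", "search"):   [(alpha, "high", r_search), (1 - alpha, "low", r_search)],
    ("high", "wait"):     [(1.0, "high", r_wait)],
    ("low",  "search"):   [(beta, "low", r_search), (1 - beta, "high", -3.0)],
    ("low",  "wait"):     [(1.0, "low", r_wait)],
    ("low",  "recharge"): [(1.0, "high", 0.0)],
}

# Sanity check: for every (s, a) the transition probabilities sum to one.
assert all(abs(sum(p for p, _, _ in outs) - 1.0) < 1e-12 for outs in dynamics.values())
```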
Solution Methods
Diverse approaches:
Dynamic Programming, Monte Carlo Methods,
Temporal Difference Learning, Direct Policy
Search, ...
Big research area at the intersection of: Artificial
Intelligence, Control Theory, Psychology,
Neuroscience