Markov Chains

Sources:
● Grinstead & Snell: Introduction to Probability
● Hwei Hsu: Probability, Random Variables, & Random Processes

Motivation
● One of the simplest forms of stochastic dynamics
● Allows us to model stochastic temporal dependencies
● Deterministic dynamics as a special case
● Applications in many areas
● Example: Milton Bradley "Chutes and Ladders" game board, c. 1952
  http://en.wikipedia.org/wiki/Snakes_and_Ladders

Definition
● System in different states X_n; state set E = {0, 1, 2, ...} can be finite or infinite
● "Time" is discrete: n = 0, 1, 2, ...; there are probabilistic transitions between states.
● Markov property:
  P(X_{n+1} = j | X_0 = i_0, X_1 = i_1, ..., X_n = i) = P(X_{n+1} = j | X_n = i)
● P(X_{n+1} = j | X_n = i) is the one-step transition probability.

Homogeneous Markov Chains
● If the transition probability P(X_{n+1} = j | X_n = i) does not depend on n, then the Markov chain possesses stationary transition probabilities and is called homogeneous.
● In the following, we will only consider homogeneous Markov chains.

Representation as Graph
● The states of the Markov chain are the nodes.
● The transitions correspond to directed edges between the nodes.
● The edges are weighted with the transition probabilities.
● The weights of the outgoing edges of each node have to sum to one.
● Example: weather in the Land of Oz (see blackboard)

Generalization to higher order
● Higher-order Markov processes, e.g. 2nd and 3rd order:
  P(X_{n+1} = j | X_n = i_n, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0)
  = P(X_{n+1} = j | X_n = i_n, X_{n-1} = i_{n-1})                          (2nd order)
  = P(X_{n+1} = j | X_n = i_n, X_{n-1} = i_{n-1}, X_{n-2} = i_{n-2})       (3rd order)
● In order to predict the future, a limited memory of the past is sufficient.
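To make the definitions above concrete, here is a minimal Python sketch (assuming numpy; the 3-state chain and its probabilities are a made-up illustration, not from the lecture). It stores the one-step transition probabilities row by row, checks that the outgoing weights of each node sum to one, and samples a trajectory using only the current state, as the Markov property demands:

    import numpy as np

    # Illustrative 3-state homogeneous Markov chain (states 0, 1, 2).
    # The numbers below are made up for demonstration purposes.
    P = np.array([
        [0.8, 0.1, 0.1],   # outgoing edge weights of state 0
        [0.3, 0.4, 0.3],   # outgoing edge weights of state 1
        [0.2, 0.2, 0.6],   # outgoing edge weights of state 2
    ])

    # Each row must be a probability distribution over the successor states.
    assert np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0)

    def sample_trajectory(P, x0, n_steps, seed=0):
        """Sample X_0, ..., X_n; the next state depends only on the current one."""
        rng = np.random.default_rng(seed)
        x, traj = x0, [x0]
        for _ in range(n_steps):
            x = rng.choice(len(P), p=P[x])   # Markov property: condition only on X_n
            traj.append(x)
        return traj

    print(sample_trajectory(P, x0=0, n_steps=10))

The array of rows used here is exactly the transition probability matrix introduced on the next slide.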
Transition Probability Matrix
● Define p_ij = P(X_{n+1} = j | X_n = i)
● and P = [p_ij]
● Conditions on the elements:
  p_ij >= 0 for all i, j
  sum_{j=0}^∞ p_ij = 1 for all i
● Matrices that fulfill these conditions are called Markov matrices or stochastic matrices.

Example: Land of Oz (Grinstead & Snell, Example 11.1)
According to Kemeny, Snell, and Thompson, the Land of Oz is blessed by many things, but not by good weather. They never have two nice days in a row. If they have a nice day, they are just as likely to have snow as rain the next day. If they have snow or rain, they have an even chance of having the same the next day. If there is change from snow or rain, only half of the time is this a change to a nice day. With this information we form a Markov chain whose states are the kinds of weather R, N, and S. The transition probabilities are most conveniently represented as a square array:

          R    N    S
    R   1/2  1/4  1/4
P = N   1/2   0   1/2
    S   1/4  1/4  1/2

The entries in the first row of the matrix P represent the probabilities for the various kinds of weather following a rainy day. Similarly, the entries in the second and third rows represent the probabilities for the various kinds of weather following nice and snowy days, respectively. Such a square array is called the matrix of transition probabilities, or the transition matrix.

Question: What is the probability that it is snowy two days from now when it is rainy today?
Answer: see blackboard (and the code sketch below).

Higher order transition probabilities using matrix powers
● Define transition matrix powers: P^2 = P P
● 2-step transition probabilities: p_ij^(2) = sum_k p_ik p_kj
● Why does this sum converge even if the state space E is infinite? (see blackboard)
● Generally: P^{n+1} = P^n P, i.e. p_ij^(n+1) = sum_k p_ik^(n) p_kj
● n-step transition probabilities

Theorem 11.1 (Grinstead & Snell): Let P be the transition matrix of a Markov chain. The ij-th entry p_ij^(n) of the matrix P^n gives the probability that the Markov chain, starting in state s_i, will be in state s_j after n steps.
Proof: left as an exercise (Exercise 17).
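Theorem 11.1 makes the question above easy to answer numerically. A small sketch (assuming numpy) computes P^2 for the Land of Oz chain and reads off the Rain-to-Snow entry:

    import numpy as np

    # Land of Oz transition matrix, rows/columns ordered Rain, Nice, Snow.
    P = np.array([
        [1/2, 1/4, 1/4],   # Rain
        [1/2, 0.0, 1/2],   # Nice
        [1/4, 1/4, 1/2],   # Snow
    ])

    P2 = P @ P                       # two-step transition probabilities (Theorem 11.1, n = 2)
    states = ["Rain", "Nice", "Snow"]
    print(P2[states.index("Rain"), states.index("Snow")])   # 0.375

The two-step probability of snow given rain today is 0.375, in agreement with the Rain row of P^2 in Table 11.1 below.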
Example 11.2 (Example 11.1 continued): Consider again the weather in the Land of Oz. We know that the powers of the transition matrix give us interesting information about the process as it evolves. We shall be particularly interested in the state of the chain after a large number of steps. The program MatrixPowers computes the powers of P.

We have run the program MatrixPowers for the Land of Oz example to compute the successive powers of P from 1 to 6. The results are shown in Table 11.1 (rows and columns ordered Rain, Nice, Snow):

P^1 = [ .500 .250 .250 ]    P^2 = [ .438 .188 .375 ]    P^3 = [ .406 .203 .391 ]
      [ .500 .000 .500 ]          [ .375 .250 .375 ]          [ .406 .188 .406 ]
      [ .250 .250 .500 ]          [ .375 .188 .438 ]          [ .391 .203 .406 ]

P^4 = [ .402 .199 .398 ]    P^5 = [ .400 .200 .399 ]    P^6 = [ .400 .200 .400 ]
      [ .398 .203 .398 ]          [ .400 .199 .400 ]          [ .400 .200 .400 ]
      [ .398 .199 .402 ]          [ .399 .200 .400 ]          [ .400 .200 .400 ]

Table 11.1: Powers of the Land of Oz transition matrix.

We note that after six days our weather predictions are, to three-decimal-place accuracy, independent of today's weather. The probabilities for the three types of weather, R, N, and S, are .4, .2, and .4 no matter where the chain started. This is an example of a type of Markov chain called a regular Markov chain. For this type of chain, it is true that long-range predictions are independent of the starting state. Not all chains are regular, but this is an important class of chains.

Aside: Chapman-Kolmogorov Equation
● With these definitions we can show:
  p_ij^(n) = P(X_n = j | X_0 = i) = P(X_{m+n} = j | X_m = i)
● Furthermore:
  P^{n+m} = P^n P^m,  i.e.  p_ij^(n+m) = sum_k p_ik^(n) p_kj^(m)
● Chapman-Kolmogorov equation: the probability to transition from i to j in n+m steps can be expressed as the sum of the probabilities of the paths going via each intermediate state k after n steps.
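A quick numerical check of the Chapman-Kolmogorov equation for the Land of Oz chain (a sketch assuming numpy; the choice n = 2, m = 3 is arbitrary):

    import numpy as np

    P = np.array([[1/2, 1/4, 1/4],
                  [1/2, 0.0, 1/2],
                  [1/4, 1/4, 1/2]])   # Land of Oz, rows/columns: Rain, Nice, Snow

    n, m = 2, 3
    Pn  = np.linalg.matrix_power(P, n)
    Pm  = np.linalg.matrix_power(P, m)
    Pnm = np.linalg.matrix_power(P, n + m)

    # p_ij^(n+m) = sum_k p_ik^(n) * p_kj^(m): the paths from i to j are split
    # according to the intermediate state k reached after n steps.
    assert np.allclose(Pnm, Pn @ Pm)
    print(np.round(Pnm, 3))   # P^5, already close to the limiting rows (.4, .2, .4)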
Theorem 11.2 (Grinstead & Snell): Let P be the transition matrix of a Markov chain, and let u be the probability vector which represents the starting distribution. Then the probability that the chain is in state s_i after n steps is the ith entry in the vector
    u^(n) = u P^n.
Proof: left as an exercise (Exercise 18).

We note that if we want to examine the behavior of the chain under the assumption that it starts in a certain state s_i, we simply choose u to be the probability vector with ith entry equal to 1 and all other entries equal to 0.

Example: consider starting in states R, N, S with probability 1/3 each.
Example 11.3: In the Land of Oz example (Example 11.1) let the initial probability vector u equal (1/3, 1/3, 1/3). Then we can calculate the distribution of the states after three days using Theorem 11.2 and our previous calculation of P^3. We obtain

    u^(3) = u P^3 = ( 1/3, 1/3, 1/3 ) [ .406 .203 .391 ]
                                      [ .406 .188 .406 ]  =  ( .401, .198, .401 ).
                                      [ .391 .203 .406 ]

Classification of states
● Accessible states: j is accessible from i if for some n >= 0, p_ij^(n) > 0, and we write i → j.
● Two states i and j that are accessible to each other are said to communicate, and we write i <-> j.
● If all states communicate with each other, the Markov chain is said to be irreducible (ergodic).
● A state j is said to be an absorbing state if p_jj = 1. Once j is reached, it is never left.

Absorbing Markov Chains
● A Markov chain is called absorbing if it has at least one absorbing state and if that state can be reached from every other state (not necessarily in one step).
● An absorbing Markov chain is obviously not irreducible. (why?)
● Can you construct a Markov chain that has no absorbing state but is not irreducible? (see blackboard)

Stationary Distribution
● Consider a Markov chain with transition probability matrix P. A probability vector π is called a stationary distribution of the Markov chain if:
    π P = π
● I.e. it is a left eigenvector of P. Note that a multiple of this vector is also an eigenvector but not necessarily a probability vector. (A numerical sketch follows after the next slide.)
● Note: if this is the initial state distribution, then all later state distributions will be identical to it.

Regular Markov Chain
● A Markov chain is regular if there is a finite positive integer m such that after m time steps, every state has a non-zero chance of being occupied, no matter what the initial state.
● Sufficient condition: all elements in P are greater than zero.
● Give an example of an irreducible (ergodic) Markov chain that is not regular. (see blackboard)
● Give an example of a regular Markov chain that has an entry which is zero. (see blackboard)
● Counterexample from Grinstead & Snell: let
      P = [  1    0  ]
          [ 1/2  1/2 ]
  be the transition matrix of a Markov chain. Then all powers of P will have a 0 in the upper right-hand corner, so this chain is not regular.
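As announced on the stationary distribution slide, here is a numerical sketch (assuming numpy) that finds π as a left eigenvector of P for eigenvalue 1 and rescales it into a probability vector:

    import numpy as np

    P = np.array([[1/2, 1/4, 1/4],
                  [1/2, 0.0, 1/2],
                  [1/4, 1/4, 1/2]])   # Land of Oz

    # A stationary distribution satisfies pi P = pi, i.e. pi is a left eigenvector
    # of P with eigenvalue 1; left eigenvectors of P are eigenvectors of P^T.
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
    pi = np.real(eigvecs[:, k])
    pi = pi / pi.sum()                     # rescale: eigenvectors are only defined up to a factor
    print(np.round(pi, 3))                 # [0.4 0.2 0.4]

For the regular Land of Oz chain this reproduces w = (.4, .2, .4), which is also the limiting distribution discussed next.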
Limiting Distribution for regular Markov Chains

The sixth power of the Land of Oz transition matrix P is, to three decimal places,

            R   N   S
        R  .4  .2  .4
P^6  =  N  .4  .2  .4
        S  .4  .2  .4

Thus, to this degree of accuracy, the probability of rain six days after a rainy day is the same as the probability of rain six days after a nice day, or six days after a snowy day. Theorem 11.7 predicts that, for large n, the rows of P^n approach a common vector. It is interesting that this occurs so soon in our example. We shall now discuss two important theorems relating to regular chains.

Theorem 11.7: Let P be the transition matrix for a regular chain. Then, as n → ∞, the powers P^n approach a limiting matrix W with all rows the same vector w. The vector w is a strictly positive probability vector (i.e., its components are all positive and they sum to one).

Theorem 11.8: Let P be a regular transition matrix, let
    W = lim_{n→∞} P^n,
let w be the common row of W, and let c be the column vector all of whose components are 1. Then
(a) wP = w, and any row vector v such that vP = v is a constant multiple of w.
(b) Pc = c, and any column vector x such that Px = x is a multiple of c.

Basic idea of the proof: we want to show that the powers of a regular transition matrix tend to a matrix with all rows the same, i.e. that P^n converges to a matrix with constant columns. The jth column of P^n is P^n y, where y is a column vector with 1 in the jth entry and 0 in the other entries, so it suffices to prove that for any column vector y, P^n y approaches a constant vector as n tends to infinity. Since each row of P is a probability vector, P y replaces y by averages of its components. Full proof, e.g., in Grinstead & Snell.

Example 11.19: By Theorem 11.7 we can find the limiting vector w for the Land of Oz from the fact that w1 + w2 + w3 = 1 and

                     [ 1/2  1/4  1/4 ]
    ( w1  w2  w3 )   [ 1/2   0   1/2 ]  =  ( w1  w2  w3 ).
                     [ 1/4  1/4  1/2 ]

These relations lead to the following four equations in three unknowns:

    w1 + w2 + w3 = 1,
    (1/2)w1 + (1/2)w2 + (1/4)w3 = w1,
    (1/4)w1 + (1/4)w3 = w2,
    (1/4)w1 + (1/2)w2 + (1/2)w3 = w3.

Our theorem guarantees that these equations have a unique solution. If the equations are solved, we obtain the solution w = ( .4  .2  .4 ), in agreement with that predicted from P^6, given in Example 11.2.

Simulation
We illustrate Theorem 11.12 by writing a program to simulate the behavior of a Markov chain. SimulateChain is such a program.

Example 11.21: In the Land of Oz, there are 525 days in a year. We have simulated the weather for one year in the Land of Oz, using the program SimulateChain. The results are shown in Table 11.2.

SSRNRNSSSSSSNRSNSSRNSRNSSSNSRRRNSSSNRRSSSSNRSSNSRRRRRRNSSS
SSRRRSNSNRRRRSRSRNSNSRRNRRNRSSNSRNRNSSRRSRNSSSNRSRRSSNRSNR
RNSSSSNSSNSRSRRNSSNSSRNSSRRNRRRSRNRRRNSSSNRNSRNSNRNRSSSRSS
NRSSSNSSSSSSNSSSNSNSRRNRNRRRRSRRRSSSSNRRSSSSRSRRRNRRRSSSSR
RNRRRSRSSRRRRSSRNRRRRRRNSSRNRSSSNRNSNRRRRNRRRNRSNRRNSRRSNR
RRRSSSRNRRRNSNSSSSSRRRRSRNRSSRRRRSSSRRRNRNRRRSRSRNSNSSRRRR
RNSNRNSNRRNRRRRRRSSSNRSSRSNRSSSNSNRNSNSSSNRRSRRRNRRRRNRNRS
SSNSRSNRNRRSNRRNSRSSSRNSRRSSNSRRRNRRSNRRNSSSSSNRNSSSSSSSNR
NSRRRNSSRRRNSSSNRRSRNSSRRNRRNRSNRRRRRRRRRNSNRRRRRNSRRSSSSN
SNS

    State   Times   Fraction
    R       217     .413
    N       109     .208
    S       199     .379

Table 11.2: Weather in the Land of Oz.
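A sketch in the spirit of the SimulateChain program (assuming numpy; the random seed, and hence the exact counts, are arbitrary and will not reproduce Table 11.2 exactly):

    import numpy as np

    P = np.array([[1/2, 1/4, 1/4],
                  [1/2, 0.0, 1/2],
                  [1/4, 1/4, 1/2]])
    states = ["R", "N", "S"]

    rng = np.random.default_rng(11)
    x = 0                        # start in state R; any start works for a regular chain
    counts = np.zeros(3, dtype=int)
    for _ in range(525):         # one year in the Land of Oz
        x = rng.choice(3, p=P[x])
        counts[x] += 1

    for s, c in zip(states, counts):
        print(s, c, round(c / 525, 3))   # fractions should lie near w = (.4, .2, .4)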
Other "Markov" things
● Hidden Markov Models (HMMs) (Pattern Recognition)
● Markov Decision Processes (MDPs) (Machine Learning)
● Partially Observable Markov Decision Processes (POMDPs) (Machine Learning)
● Graphical Models (Machine Learning)
● Markov Random Fields (Computer Vision)
● Markov Chain Monte Carlo (MCMC) (Statistical Estimation & Inference)

Markov Decision Process
● Now consider an agent acting in an environment.
● There are states as before, but now also actions that influence what the next state will be.
● There is also a reward function that represents numerical rewards or punishments.
● How should the agent behave to maximize the reward signal? -> field of reinforcement learning

The Agent-Environment Interface
(Figure: agent and environment connected in a loop; the agent emits action a_t, the environment returns reward r_{t+1} and state s_{t+1}.)
● Agent and environment interact at discrete time steps: t = 0, 1, 2, ...
● The agent observes the state at step t: s_t ∈ S
● and produces an action at step t: a_t ∈ A(s_t)
● It then gets the resulting reward r_{t+1} and the resulting next state s_{t+1}:
  ... s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, ...
(R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction)

Graph representation: Recycling Robot MDP
● S = {high, low}
● A(high) = {search, wait}, A(low) = {search, wait, recharge}
● R^search = expected no. of cans while searching, R^wait = expected no. of cans while waiting, R^search > R^wait
● (Transition graph, after Sutton & Barto: from high, search keeps the battery high with probability α and drops it to low with probability 1-α, with reward R^search either way; from low, search keeps it low with probability β with reward R^search, and with probability 1-β the battery runs flat, the robot is rescued with reward -3 and reset to high; wait leaves the state unchanged with reward R^wait; recharge takes low to high with reward 0.)

Solution Methods
● Diverse approaches: Dynamic Programming, Monte Carlo Methods, Temporal Difference Learning, Direct Policy Search, ...
● Big research area at the intersection of: Artificial Intelligence, Control Theory, Psychology, Neuroscience
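As a small illustration of the dynamic-programming family listed above, here is a sketch of value iteration for the recycling-robot MDP. The transition structure follows the slide; the concrete numbers for α, β, the rewards and the discount factor γ are illustrative assumptions, not values from the lecture:

    # Value iteration for the recycling-robot MDP sketched above.
    # alpha, beta, r_search, r_wait and gamma are assumed for illustration only.
    alpha, beta, gamma = 0.9, 0.6, 0.9
    r_search, r_wait = 2.0, 1.0            # assumed, with r_search > r_wait

    # transitions[state][action] = list of (probability, next_state, reward)
    transitions = {
        "high": {
            "search":   [(alpha, "high", r_search), (1 - alpha, "low", r_search)],
            "wait":     [(1.0, "high", r_wait)],
        },
        "low": {
            "search":   [(beta, "low", r_search), (1 - beta, "high", -3.0)],  # battery flat: rescue penalty
            "wait":     [(1.0, "low", r_wait)],
            "recharge": [(1.0, "high", 0.0)],
        },
    }

    V = {s: 0.0 for s in transitions}
    for _ in range(1000):                  # V(s) <- max_a sum_s' p(s'|s,a) * (r + gamma * V(s'))
        V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                    for outcomes in transitions[s].values())
             for s in transitions}

    policy = {s: max(transitions[s],
                     key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a]))
              for s in transitions}
    print(V, policy)

Value iteration repeatedly applies the update V(s) ← max_a Σ_{s'} p(s'|s,a)[r + γ V(s')] until the values stop changing; the policy that is greedy with respect to the converged values is then optimal for the assumed numbers.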