MATH 56A SPRING 2008 STOCHASTIC PROCESSES
FINITE MARKOV CHAINS

1.3. Invariant probability distribution.

Definition 1.4. A probability distribution is a function $\pi : S \to [0,1]$ from the set of states $S$ to the closed unit interval $[0,1]$ so that
\[ \sum_{i \in S} \pi(i) = 1. \]
When the set of states is $S = \{1, 2, \dots, s\}$, the condition is:
\[ \sum_{i=1}^{s} \pi(i) = 1. \]

Definition 1.5. A probability distribution $\pi$ is called invariant if $\pi P = \pi$, i.e., $\pi$ is a left eigenvector for $P$ with eigenvalue 1.

1.3.1. Probability distribution of $X_n$. Each $X_n$ has a probability distribution. I used the following example to illustrate this: a two-state chain in which state 1 moves to state 2 with probability $1/3$ and state 2 moves to state 1 with probability $1/4$.

The numbers $1/3$ and $1/4$ are transition probabilities. They say nothing about $X_0$. But we need to start in a random state $X_0$. This is because we need to understand how the transition from one random state to another works, so that we can go from $X_n$ to $X_{n+1}$. $X_0$ will be equal to either 1 or 2 with probability:
\[ \alpha_1 = P(X_0 = 1), \qquad \alpha_2 = P(X_0 = 2). \]
These two numbers are between 0 and 1 (inclusive) and add up to 1: $\alpha_1 + \alpha_2 = 1$. So, $\alpha = (\alpha_1, \alpha_2)$ is a probability distribution. It is the probability distribution of $X_0$ and is called the initial (probability) distribution.

Once the distribution of $X_0$ is given, the probability distribution of every $X_n$ is determined by the transition matrix:

Theorem 1.6. The probability distribution of $X_n$ is the vector $\alpha P^n$.

So, in the example,
\[ P(X_2 = 2) = \sum_{i=1}^{2} \sum_{j=1}^{2} \underbrace{P(X_0 = i)}_{\alpha_i} \underbrace{P(X_1 = j \mid X_0 = i)}_{p(i,j)} \underbrace{P(X_2 = 2 \mid X_1 = j)}_{p(j,2)} = \sum_{i,j} \alpha_i\, p(i,j)\, p(j,2) = (\alpha P^2)_2. \]
This is the sum of the probabilities of all possible ways that you can end up at state 2 at time 2.

To prove this in general, I used the following probability formula:

Lemma 1.7. Suppose that our sample space is a disjoint union $\Omega = \coprod B_i$ of events $B_i$. Then
\[ P(A) = \sum_i P(B_i)\, P(A \mid B_i). \tag{1.1} \]
I drew a picture to illustrate this basic concept, which you should already know.

Proof of Theorem 1.6. By induction on $n$.
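Theorem 1.6 is easy to check numerically for the two-state chain in the notes ($p(1,2) = 1/3$, $p(2,1) = 1/4$). A minimal sketch in pure Python, with the initial distribution $\alpha = (0.5, 0.5)$ chosen arbitrarily for illustration: the double sum over all paths $i \to j \to 2$ agrees with the second coordinate of $\alpha P^2$.

```python
# Numerical check of Theorem 1.6 for the two-state example chain.
# alpha = (0.5, 0.5) is an arbitrary initial distribution chosen here.

def vec_mat(v, M):
    """Multiply a row vector v by a matrix M (lists of lists)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

P = [[2/3, 1/3], [1/4, 3/4]]   # rows sum to 1: p(1,2)=1/3, p(2,1)=1/4
alpha = [0.5, 0.5]

# (alpha P^2)_2 : push alpha through P twice, take the second coordinate.
dist2 = vec_mat(vec_mat(alpha, P), P)

# The double sum over all paths i -> j -> 2:
double_sum = sum(alpha[i] * P[i][j] * P[j][1] for i in range(2) for j in range(2))

assert abs(dist2[1] - double_sum) < 1e-12
print(dist2[1])   # P(X_2 = 2) for this choice of alpha
```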
If $n = 0$ then $P^n = I$ is the identity matrix. So, $\alpha P^n = \alpha P^0 = \alpha I = \alpha$. This is the distribution of $X_n = X_0$ by definition. So, the theorem holds for $n = 0$.

Suppose the theorem holds for $n$. Then, by Equation (1.1) with $B_i = \{X_n = i\}$,
\[ P(X_{n+1} = 1) = P(X_n = 1)\, \underbrace{P(X_{n+1} = 1 \mid X_n = 1)}_{p(1,1)} + P(X_n = 2)\, \underbrace{P(X_{n+1} = 1 \mid X_n = 2)}_{p(2,1)} \]
\[ = (\alpha P^n)_1\, p(1,1) + (\alpha P^n)_2\, p(2,1) = [(\alpha P^n) P]_1 = (\alpha P^{n+1})_1. \]
And similarly, $P(X_{n+1} = 2) = (\alpha P^{n+1})_2$. So, the theorem holds for $n + 1$, and therefore for all $n \ge 0$. $\square$

Corollary 1.8. If the initial distribution $\alpha = \pi$ is invariant, then $X_n$ has probability distribution $\pi$ for all $n$.

Proof. The distribution of $X_n$ is
\[ \alpha P^n = \pi P^n = \underbrace{\pi P}_{\pi} P \cdots P = \cdots = \pi, \]
since every time you multiply by $P$ you get back $\pi$. $\square$

1.3.2. Perron–Frobenius Theorem. I stated this very important theorem without proof. However, the proof is outlined in Exercise 1.20 in the book.

Theorem 1.9 (Perron–Frobenius). Suppose that $A$ is a square matrix all of whose entries are positive real numbers. Then, $A$ has a left eigenvector $\pi$, all of whose coordinates are positive real numbers, i.e., $\pi A = \lambda \pi$. Furthermore,
(a) $\pi$ is unique up to a scalar multiple. (If $\alpha$ is another left eigenvector of $A$ with positive real entries then $\alpha = C\pi$ for some scalar $C$.)
(b) $\lambda_1 = \lambda$ is a positive real number.
(c) The eigenvalue $\lambda_1$ is larger in absolute value than any other eigenvalue of $A$: $|\lambda_2|, |\lambda_3|, \dots < \lambda_1$. (So, $\lambda_1$ is called the maximal eigenvalue of $A$.)
(d) \[ \lim_{n \to \infty} \frac{1}{\lambda_1^n}\, A^n = \begin{pmatrix} \pi \\ \pi \\ \vdots \\ \pi \end{pmatrix}, \]
assuming that $\pi$ is a probability distribution, i.e., $\sum_i \pi_i = 1$.

I didn't prove this. However, I tried to explain the last statement. When we raise $A$ to the power $n$, it tends to look like multiplication by $\lambda_1^n$. So, we should divide by $\lambda_1^n$. If we know that the rows of the matrix $\frac{1}{\lambda_1^n} A^n$ are all the same row $\alpha$, what is it? On the one hand,
\[ \pi\, \frac{1}{\lambda_1^n}\, A^n = (\pi_1, \pi_2, \dots) \begin{pmatrix} \alpha \\ \alpha \\ \vdots \end{pmatrix} = \Big( \sum_i \pi_i \Big) \alpha. \]
But $\pi\, \frac{1}{\lambda_1}\, A = \pi$, so
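Part (d) can be seen directly for the two-state example: a stochastic matrix has $\lambda_1 = 1$, so $P^n$ itself converges to a matrix whose rows are all the invariant distribution. A quick pure-Python check, using the transition matrix and the invariant distribution $\pi = (3/7, 4/7)$ that appear below:

```python
# Sketch of Perron-Frobenius part (d) for the stochastic matrix of the
# two-state example: P^n converges to a matrix with every row equal to pi.

def mat_mul(A, B):
    """Multiply two matrices given as lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[2/3, 1/3], [1/4, 3/4]]
Pn = P
for _ in range(50):          # compute P^51; the second eigenvalue is 5/12,
    Pn = mat_mul(Pn, P)      # so convergence is very fast

pi = (3/7, 4/7)              # the invariant distribution found below
for row in Pn:
    assert all(abs(row[j] - pi[j]) < 1e-9 for j in range(2))
print(Pn[0])
```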
\[ \pi\, \frac{1}{\lambda_1^n}\, A^n = \pi = \Big( \sum_i \pi_i \Big) \alpha, \qquad \text{which gives} \qquad \alpha = \frac{\pi}{\sum_i \pi_i} = \pi. \]

This theorem applies to Markov chains, but with some conditions. First, I stated without proof the fact:

Theorem 1.10. The maximal eigenvalue of $P$ is 1. More precisely, all eigenvalues of $P$ have $|\lambda| \le 1$.

Proof. Suppose that $P$ has an eigenvalue $\lambda$ with absolute value greater than 1. Then, there is an eigenvector $x$ so that $xP = \lambda x$. Then $xP^n = \lambda^n x$ diverges as $n$ goes to infinity. But this is not possible, since the entries of the matrix $P^n$ are all between 0 and 1: $P^n$ is the probability transition matrix from $X_0$ to $X_n$ by Theorem 1.6. $\square$

I'll explain this proof later. What I did explain in class is that 1 is always an eigenvalue of $P$. This follows from the fact that the rows of $P$ add up to 1:
\[ \sum_j p(i,j) = 1. \]
This implies that the column vector with all entries 1 is a right eigenvector of $P$ with eigenvalue 1:
\[ P \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}. \]
For example, if
\[ P = \begin{pmatrix} 2/3 & 1/3 \\ 1/4 & 3/4 \end{pmatrix} \]
then
\[ P \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2/3 & 1/3 \\ 1/4 & 3/4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]
The invariant distribution $\pi$ is a left eigenvector with eigenvalue 1. The unique invariant distribution is
\[ \pi = \left( \frac{3}{7}, \frac{4}{7} \right). \]
This means:
\[ \pi P = \left( \frac{3}{7}, \frac{4}{7} \right) \begin{pmatrix} 2/3 & 1/3 \\ 1/4 & 3/4 \end{pmatrix} = \left( \frac{3}{7}, \frac{4}{7} \right) = \pi. \]
You can find $\pi$ using linear algebra or a computer. You can also use intuition. In the two-state Markov chain above (with transition probabilities $1/3$ and $1/4$), we can use the law of large numbers, which says that if a large number of people move randomly, then the proportion who move will be approximately equal to the probability. So, if there are a large number of people in states 1 and 2, then one third of those at 1 will move to 2 and one fourth of those at 2 will move to 1. If you want the distribution to be stable, the numbers should be in a $3 : 4$ ratio: if there are 3 guys at state 1 and $1/3$ of them move, then one guy moves to 2; if there are 4 guys at state 2 and $1/4$ of them move, then one guy moves from 2 to 1, and the distribution is unchanged.
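Both eigenvector claims above are a one-line computation each. A small pure-Python check, using the same matrix $P$ and $\pi = (3/7, 4/7)$:

```python
# Check both eigenvector claims for P = [[2/3, 1/3], [1/4, 3/4]]:
# the all-ones column vector is a right eigenvector with eigenvalue 1,
# and pi = (3/7, 4/7) is a left eigenvector (the invariant distribution).

P = [[2/3, 1/3], [1/4, 3/4]]

# Right eigenvector: each row of P sums to 1, so P (1,1)^T = (1,1)^T.
ones = [sum(row) for row in P]
assert all(abs(x - 1) < 1e-12 for x in ones)

# Left eigenvector: pi P = pi.
pi = [3/7, 4/7]
piP = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
assert all(abs(piP[j] - pi[j]) < 1e-12 for j in range(2))
print(piP)
```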
To make it a probability distribution, the vector $(3, 4)$ needs to be divided by 7.

Theorem 1.11. If the Markov chain is aperiodic and irreducible, then it satisfies the conclusions of the Perron–Frobenius theorem.

Proof. These conditions imply that $A = P^n$ has all positive entries for some finite $n$. Then, the Perron–Frobenius eigenvector for $A$ is the invariant distribution for $P$. $\square$

The Perron–Frobenius theorem tells us that the distribution of $X_n$ will reach an equilibrium (the invariant distribution) for large $n$ (assuming the chain is aperiodic). The next question is: how long does it take?

1.4. Transient classes. I asked the question: how long does it take to escape from a transient class? I started with a really simple example: a two-state chain in which state 1 moves to state 2 with probability $p$, and state 2 is absorbing. This is a Markov chain with one transient class $\{1\}$ and one absorbing class $\{2\}$. The question is: how long can you stay in the transient class? I was glad to see that students know basic probability:
\[ P(X_n = 1 \mid X_0 = 1) = (1 - p)^n. \]
What happens when $n$ goes to infinity?
\[ \lim_{n \to \infty} (1 - p)^n = 0 \quad \text{if } p > 0. \]
Proof. And you guys helped me with this proof: let $L = \lim_{n \to \infty} (1 - p)^n$. Then
\[ \ln L = \lim_{n \to \infty} \underbrace{n}_{\to \infty}\, \underbrace{\ln(1 - p)}_{< 0} = -\infty, \]
since $1 - p < 1$. So, $L = 0$. $\square$

So, the probability of remaining indefinitely in state 1 is zero. In other words, you will eventually escape the transient class with probability one (at least in this example). For future reference I recorded this conclusion as follows.

Theorem 1.12. If the probability of success is $p > 0$ and you try infinitely many times, then you will eventually succeed with probability one.

But how long does it take? Let $T :=$ the smallest $n$ so that $X_n = 2$. Then
\[ P(T = n \mid X_0 = 1) = p(1 - p)^{n-1}. \]
For example, if $T = 3$ then we have:
\[ X_0 = 1 \xrightarrow{\,1-p\,} X_1 = 1 \xrightarrow{\,1-p\,} X_2 = 1 \xrightarrow{\,p\,} X_3 = 2, \]
so $P(T = 3 \mid X_0 = 1) = p(1 - p)^2$. The numbers $p(1 - p)^{n-1}$ add up to 1 and give what is called the geometric distribution on the positive integers.
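That the probabilities $p(1-p)^{n-1}$ sum to 1 can be checked numerically as well. A quick sketch with $p = 1/5$ (the value used in the example below), truncating the infinite sum where the tail is negligible:

```python
# The escape time T from the transient state is geometric:
# P(T = n) = p (1-p)^(n-1) for n = 1, 2, 3, ...
# Check that these probabilities sum to 1 (up to truncation) for p = 1/5.

p = 1/5
probs = [p * (1 - p) ** (n - 1) for n in range(1, 200)]
total = sum(probs)

# The untallied tail is (4/5)^199, which is astronomically small.
assert abs(total - 1) < 1e-12
print(total)
```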
From this formula we can calculate the conditional expected value of $T$:
\[ E(T \mid X_0 = 1) = \sum_{n=0}^{\infty} n\, p(1 - p)^{n-1}, \]
and, yes, the $n = 0$ term is zero. This is easy to calculate using a little calculus. First we start with the geometric series
\[ g(x) := \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + x^3 + \cdots = \frac{1}{1 - x}. \]
Then differentiate:
\[ g'(x) = \sum_{n=0}^{\infty} n x^{n-1} = \frac{1}{(1 - x)^2}. \]
Applying this formula to the expected value problem with $x = 1 - p$, we get:
\[ E(T \mid X_0 = 1) = p \sum_n n (1 - p)^{n-1} = p \cdot \frac{1}{(1 - (1 - p))^2} = p \cdot \frac{1}{p^2} = \frac{1}{p}. \]
This was actually intuitively obvious from the beginning. For example, when $p = 1/5$, you expect it to happen in 5 trials. So, $E(T) = 5 = 1/p$.
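The calculus above can be sanity-checked by summing the series directly. A minimal sketch with $p = 1/5$, as in the example: the truncated sum of $n\,p(1-p)^{n-1}$ should be very close to $1/p = 5$.

```python
# Numerical check that E(T) = sum_n n p (1-p)^(n-1) = 1/p, using p = 1/5.
# Truncating at n = 500 leaves a tail proportional to (4/5)^499, i.e. ~0.

p = 1/5
expected = sum(n * p * (1 - p) ** (n - 1) for n in range(1, 500))

assert abs(expected - 1 / p) < 1e-9
print(expected)   # approximately 5
```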