Markov Chains

INDER K. RANA
Department of Mathematics, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
email: [email protected]

Abstract. These notes were originally prepared for a College Teachers' Refresher Course at the University of Mumbai. The current revised version is for the participants of the Summer School on "Probability Theory" at the Kerala School of Mathematics, 2010.

Contents

Prologue. Basic Probability Theory
  0.1. Probability space
  0.2. Conditional probability
Chapter 1. Basics
  1.1. Introduction
  1.2. Random walks
  1.3. Queuing chains
  1.4. Ehrenfest chain
  1.5. Some consequences of the Markov property
  Review Exercises
Chapter 2. Calculation of higher order probabilities
  2.1. Distribution of $X_n$ and other joint distributions
  2.2. Kolmogorov-Chapman equation
  Exercises
Chapter 3. Classification of states
  3.1. Closed subsets and irreducible subsets
  3.2. Periodic and aperiodic chains
  3.3. Visiting a state: transient and recurrent states
  3.4. Absorption probability
  3.5. More on recurrence/transience
Chapter 4. Stationary distribution for a Markov chain
  4.1. Introduction
  4.2. Stopping times and strong Markov property
  4.3. Existence and uniqueness
  4.4. Asymptotic behavior
Appendix. Diagonalization of matrices
References
Index

Prologue. Basic Probability Theory

0.1. Probability space

A mathematical model for analyzing statistical experiments is given by a probability space. A probability space is a triple $(\Omega, \mathcal{S}, P)$ where:

- $\Omega$ is a set representing the set of all possible outcomes of the experiment.
- $\mathcal{S}$ is a $\sigma$-algebra of subsets of $\Omega$. Subsets of $\Omega$ are called events of the experiment; the elements of $\mathcal{S}$ represent the collection of events of interest in that experiment.
- For every $E \in \mathcal{S}$, the nonnegative number $P(E)$ is the probability that the event $E$ occurs. The map $E \mapsto P(E)$, called a probability, is $P : \mathcal{S} \to [0,1]$ with the following properties:
  (i) $P(\emptyset) = 0$ and $P(\Omega) = 1$.
  (ii) $P$ is countably additive, i.e., for every sequence $A_1, A_2, \ldots, A_n, \ldots$ in $\mathcal{S}$ which is pairwise disjoint ($A_i \cap A_j = \emptyset$ for $i \neq j$),
  $$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i).$$

0.2. Conditional probability

Let $(\Omega, \mathcal{S}, P)$ be a probability space. If $B$ is an event with $P(B) > 0$, then for every $A \in \mathcal{S}$, the conditional probability of $A$ given $B$, denoted by $P(A|B)$, is defined by
$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$
Intuitively, $P_B(A) := P(A|B)$ expresses how likely the event $A$ is to occur, given the knowledge that $B$ has occurred. Some properties of conditional probability are:

(i) For every pairwise disjoint sequence $A_1, A_2, \ldots, A_n, \ldots$ in $\mathcal{S}$,
$$P_B\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P_B(A_i).$$

(ii) Chain rule: $P(A \cap B) = P(A|B)\,P(B)$. In general, for $A_1, A_2, \ldots, A_n \in \mathcal{S}$,
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1 \mid A_2 \cap \cdots \cap A_n)\, P(A_2 \mid A_3 \cap \cdots \cap A_n) \cdots P(A_{n-1} \mid A_n)\, P(A_n),$$
and for $B \in \mathcal{S}$,
$$P(A_1 \cap A_2 \cap \cdots \cap A_n \mid B) = P(A_1 \mid A_2 \cap \cdots \cap A_n \cap B)\, P(A_2 \mid A_3 \cap \cdots \cap A_n \cap B) \cdots P(A_n \mid B).$$

(iii) Bayes' formula: If $A_1, A_2, \ldots, A_n, \ldots$ in $\mathcal{S}$ are pairwise disjoint and $\Omega = \bigcup_{n=1}^{\infty} A_n$, then for $B \in \mathcal{S}$,
$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{\infty} P(B \mid A_j)\, P(A_j)}.$$

(iv) Let $A_1, A_2, \ldots, A_n, \ldots$ in $\mathcal{S}$ be pairwise disjoint and such that $P(A \mid A_i) = P(A \mid A_j) := p$ for every $i, j$. Then
$$P\Big(A \,\Big|\, \bigcup_{i=1}^{\infty} A_i\Big) = p.$$

(v) If $A_1, A_2, \ldots, A_n, \ldots$ in $\mathcal{S}$ are pairwise disjoint and $\Omega = \bigcup_{n=1}^{\infty} A_n$, then for $B, C \in \mathcal{S}$,
$$P(C \mid B) = \sum_{i=1}^{\infty} P(A_i \mid B)\, P(C \mid A_i \cap B).$$

Chapter 1. Basics
1.1. Introduction

The aim of these lectures is to analyze the following situation. Consider an experiment/system under observation, and let $s_1, s_2, \ldots, s_n, \ldots$ be the possible states in which the system can be. Suppose the system is observed at every unit of time $n = 0, 1, 2, \ldots$, and let $X_n$ denote the observation at time $n \geq 0$. Thus each $X_n$ can take any of the values $s_1, s_2, \ldots, s_n, \ldots$. We further assume that the observations $X_n$ are not 'deterministic', i.e., $X_n$ takes the value $s_i$ with some probability. In other words, each $X_n$ is a random variable on some probability space $(\Omega, \mathcal{A}, P)$. In case the observations $X_0, X_1, \ldots$ are independent, we know how to compute the probabilities of various events. The situation we are going to look at is slightly more general. Let us look at some examples.

1.1.1 Example: Consider observing the working of a particular machine in a factory. On any day, either the machine will be broken or it will be working. So our system can be in one of two states: 'broken', represented by 0, or 'working', represented by 1. Let $X_n$ be the observation about the machine on the $n$th day. Clearly, there is no reason to assume that $X_n$ will be independent of $X_{n-1}, \ldots, X_0$.

1.1.2 Example: Consider a gambler making bets in a gambling house. He starts with some amount, say $A$ rupees, and makes a series of one-rupee bets against the house. Let $X_n$, $n \geq 0$, denote the gambler's capital at time $n$, say after $n$ bets. Then the states of the system, the possible values each $X_n$ can take, are $0, 1, 2, \ldots$. Clearly, the value of $X_n$ depends upon the value of $X_{n-1}$.

1.1.3 Example: Consider a bill collection office where people come to pay their bills. People arrive at the paying counter at various time points and are served eventually. Suppose we measure time in minutes, count the persons arriving during one minute as arriving at that minute, and assume that at most one person can be/will be served in a minute. Let $\xi_n$ denote the number of persons that arrive at the $n$th minute. Let $X_0$ denote the number of persons waiting initially (i.e., when the office opened), and for $n \geq 1$, let $X_n$ denote the number of customers at the $n$th minute. Thus, for all $n \geq 0$,
$$X_{n+1} = \begin{cases} \xi_{n+1} & \text{if } X_n = 0, \\ X_n + \xi_{n+1} - 1 & \text{if } X_n \geq 1, \end{cases}$$
because one person will be served in that minute. The states of the system are $0, 1, 2, \ldots$, and clearly $X_{n+1}$ depends upon $X_n$.

Thus, we are going to look at a sequence of random variables $\{X_n\}_{n \geq 0}$ defined on a probability space $(\Omega, \mathcal{A}, P)$, such that each $X_n$ can take at most countably many values. As mentioned in the beginning, if the $X_n$'s are independent, then one knows how to analyze the system. If the $X_n$'s are not independent, what kind of relation can they have? For example, consider the system of Example 1.1.1, observing the working of a machine each day. Clearly, the observation that the machine is 'in order' or 'not in order' on a particular day depends only on whether it was working or out of order on the previous day. Or, in Example 1.1.2, the gambler's capital on the $n$th day depends only on his capital on the $(n-1)$th day. This motivates the following assumption about our system.

1.1.4 Definition: Let $\{X_n\}_{n \geq 0}$ be a sequence of random variables taking values in a set $S$, called the state space, which is at most countable. We say that $\{X_n\}_{n \geq 0}$ has the Markov property if for every $n \geq 0$ and $i_0, i_1, \ldots,$
$i_{n+1} \in S$,
$$P\{X_{n+1} = i_{n+1} \mid X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\} = P\{X_{n+1} = i_{n+1} \mid X_n = i_n\}.$$
That is, the observation/outcome at the $(n+1)$th stage of the experiment depends only on the outcome at the immediately preceding stage. Thus, if $n \geq 0$ and $i, j \in S$, then the numbers
$$P(i, j, n) := P\{X_{n+1} = j \mid X_n = i\}$$
are going to be important for the system. This is the probability that the system will be in state $j$ at stage $n+1$ given that it was in state $i$ at stage $n$. Note that saying that a sequence $\{X_n\}_{n \geq 0}$ has the Markov property means that given $X_{n-1}$, the random variable $X_n$ is conditionally independent of $X_{n-2}, \ldots, X_1, X_0$. It means that the distribution of the next step of the sequence depends only upon where the system is now, and not on where it has been in the past.

1.1.5 Definition: Let $\{X_n\}_{n \geq 0}$ be a Markov system with state space $S$.
(i) For $n \geq 0$ and $i, j \in S$, the number $P(i, j, n)$ is called the one-step transition probability for the system at stage $n$ to go from state $i$ to state $j$ at the next stage.
(ii) The system is said to have the stationary property, or the homogeneous property, if $P(i, j, n)$ is independent of $n$, i.e., $P(i, j, n+1) = P(i, j, n)$ for every $i, j \in S$ and $n \geq 0$. That is, the probability that the system will be in state $j$ at stage $n+1$, given that it is in state $i$ at stage $n$, does not depend on $n$: the probability of the system going from state $i$ to $j$ does not depend upon the time at which this happens.
(iii) A Markov system $\{X_n\}_{n \geq 0}$ is called a Markov chain if it is stationary. In this case we write $p(i, j) := P(i, j, n)$.

1.1.6 Definition: Given a Markov chain $\{X_n\}_{n \geq 0}$, the vector $\Pi_0(i) := P\{X_0 = i\}$, $i \in S$, is called the initial distribution vector, or the distribution of $X_0$.

1.1.7 Graphical representation: A pictorial way to represent a Markov chain is by its transition graph. It consists of nodes representing the states of the chain and arrows between the nodes representing the transition probabilities. The transition graph of the Markov chain in Example 1.1.1 is determined by the transition probabilities
$$p(0,0) = 1-p, \quad p(0,1) = p, \quad p(1,0) = q, \quad p(1,1) = 1-q.$$

1.1.8 Theorem: Let $\{X_n\}_{n \geq 0}$ be a Markov chain with state space $S$, transition probabilities $p(i,j)$, and initial distribution vector $\Pi_0$. Then the following hold:
(i) $0 \leq p(i,j) \leq 1$ and $0 \leq \Pi_0(i) \leq 1$ for all $i, j \in S$.
(ii) For every $i$, $\displaystyle\sum_{j \in S} p(i,j) = 1$.
(iii) $\displaystyle\sum_{i \in S} \Pi_0(i) = 1$.

1.1.9 Definition: The matrix $P = [p(i,j)]_{i,j \in S}$ is called the transition matrix of the Markov chain. It has the property that each entry is a nonnegative number between 0 and 1, and the sum of each row is 1.

Let us look at some examples.

1.1.10 Example: Consider Example 1.1.1, observing the working of a machine. Here $S = \{0, 1\}$. Let
$$P\{X_{n+1} = 1 \mid X_n = 0\} := p(0,1) = p, \qquad P\{X_{n+1} = 0 \mid X_n = 1\} := p(1,0) = q.$$
Then $P\{X_{n+1} = 0 \mid X_n = 0\} = 1-p$ and $P\{X_{n+1} = 1 \mid X_n = 1\} = 1-q$. Thus, the transition matrix is
$$P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}.$$
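The data $(\Pi_0, P)$ of a finite Markov chain is easy to handle on a computer. The following is a minimal sketch (in Python with NumPy; my addition, not part of the original notes) that encodes the two-state machine chain of Example 1.1.10, checks the properties of Theorem 1.1.8, and simulates one path; the parameter values $p = 0.3$, $q = 0.6$ and the uniform initial distribution are arbitrary choices for illustration.

```python
import numpy as np

p, q = 0.3, 0.6                      # illustrative values of p(0,1) and p(1,0)
P = np.array([[1 - p, p],
              [q, 1 - q]])           # transition matrix of Example 1.1.10
Pi0 = np.array([0.5, 0.5])           # an (arbitrary) initial distribution

# Theorem 1.1.8: entries in [0, 1], each row of P sums to 1, Pi0 sums to 1.
assert np.all((P >= 0) & (P <= 1)) and np.allclose(P.sum(axis=1), 1.0)
assert np.isclose(Pi0.sum(), 1.0)

# Simulate a path X_0, X_1, ..., X_N: draw X_0 from Pi0, then each
# X_{n+1} from row X_n of P (the Markov property in action).
rng = np.random.default_rng(0)
x = rng.choice(2, p=Pi0)
path = [x]
for _ in range(20):
    x = rng.choice(2, p=P[x])
    path.append(x)
print(path)
```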
Another way of describing a Markov chain is given by the following theorem.

1.1.11 Theorem: A sequence of random variables $\{X_n\}_{n \geq 0}$ is a Markov chain with initial vector $\Pi_0$ and transition matrix $P$ if and only if for every $n \geq 1$ and $i_0, i_1, \ldots, i_n \in S$,
$$P\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\} = \Pi_0(i_0)\, p(i_0, i_1)\, p(i_1, i_2) \cdots p(i_{n-1}, i_n). \tag{1.1}$$

Proof: First suppose that $\{X_n\}_{n \geq 0}$ is a Markov chain with initial vector $\Pi_0$ and transition matrix $P$. Then, using the chain rule for conditional probability and the Markov property,
$$\begin{aligned}
P\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\}
&= P\{X_0 = i_0\}\, P\{X_1 = i_1 \mid X_0 = i_0\} \cdots P\{X_n = i_n \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}\} \\
&= \Pi_0(i_0)\, p(i_0, i_1)\, p(i_1, i_2) \cdots p(i_{n-1}, i_n).
\end{aligned}$$
Conversely, if equation (1.1) holds, then summing both sides over $i_n \in S$,
$$P\{X_0 = i_0, \ldots, X_{n-1} = i_{n-1}\} = \sum_{i_n \in S} \Pi_0(i_0)\, p(i_0, i_1) \cdots p(i_{n-1}, i_n) = \Pi_0(i_0)\, p(i_0, i_1) \cdots p(i_{n-2}, i_{n-1}).$$
Proceeding similarly, we have for every $k = 0, 1, \ldots, n$ and $i_0, \ldots, i_k \in S$,
$$P\{X_0 = i_0, X_1 = i_1, \ldots, X_k = i_k\} = \Pi_0(i_0)\, p(i_0, i_1) \cdots p(i_{k-1}, i_k).$$
In particular, for $k = 0$ we have $P\{X_0 = i_0\} = \Pi_0(i_0)$, and
$$P\{X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots, X_n = i_n\}
= \frac{P\{X_0 = i_0, \ldots, X_n = i_n, X_{n+1} = i_{n+1}\}}{P\{X_0 = i_0, \ldots, X_n = i_n\}}
= \frac{\Pi_0(i_0)\, p(i_0, i_1) \cdots p(i_n, i_{n+1})}{\Pi_0(i_0)\, p(i_0, i_1) \cdots p(i_{n-1}, i_n)}
= p(i_n, i_{n+1}).$$
Hence $\{X_n\}_{n \geq 0}$ is a Markov chain with initial vector $\Pi_0$ and transition probabilities $p(i,j)$, $i, j \in S$.

1.2. Random walks

1.2.1 Example (Unrestricted random walk on the line): Consider a particle on the line which, at each step, moves one unit to the right with probability $p$ or one unit to the left with probability $1-p$. This is called the unrestricted random walk on the line. Let $X_n$ denote the position of the particle at time $n$. Then $S = \{0, \pm 1, \pm 2, \ldots\}$, and the chain has transition probabilities
$$p(i, j) = \begin{cases} p & \text{if } j = i+1, \\ 1-p & \text{if } j = i-1, \\ 0 & \text{otherwise}; \end{cases}$$
the transition matrix is the doubly infinite matrix, with rows and columns indexed by the integers, having $p$ on the superdiagonal, $1-p$ on the subdiagonal, and 0 elsewhere.

1.2.2 Random walk on the line with absorbing barriers: We can also consider the random walk on the line with state space $S = \{0, 1, 2, 3, \ldots, r\}$ and the condition that the walk ends if the particle reaches 0 or $r$. The states 0 and $r$ are called absorbing states: a particle that reaches such a state is absorbed in it and cannot leave. The transition matrix for this walk is
$$P = \begin{pmatrix}
1 & 0 & 0 & \cdots & & 0 \\
1-p & 0 & p & 0 & \cdots & 0 \\
0 & 1-p & 0 & p & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & & 1-p & 0 & p \\
0 & \cdots & & 0 & 0 & 1
\end{pmatrix}.$$
A typical illustration of this situation is two players A and B gambling with total capital $r$ rupees, with $X_n$ the capital of A at the $n$th stage. The game ends when A loses all the money (stage 0 for A) or B loses all the money (stage $r$ for A).

1.2.3 Random walk on the line with reflecting barriers: Another variation of the previous example is the situation where two friends are gambling with a view to playing longer. So they put the condition that every time a player loses his last rupee, the opponent returns it to him. Let $X_n$ denote the capital of player A at the $n$th stage. If the total money both players have is $r+1$ rupees, then the state space for the system is $S = \{1, 2, 3, \ldots, r\}$. To find the transition matrix, note that in the first row,
$$p(1,1) = P\{X_{n+1} = 1 \mid X_n = 1\} = P\{\text{A has his last rupee and loses; it is returned}\} = 1-p,$$
$$p(1,2) = P\{\text{A's capital becomes 2} \mid \text{it is 1 now}\} = P\{\text{A wins}\} = p,$$
$$p(1,j) = 0 \quad \text{for } j \geq 3.$$
For the $i$th row, $1 < i < r$, and $1 \leq j \leq r$,
$$p(i, j) = P\{X_{n+1} = j \mid X_n = i\} = \begin{cases} p & \text{if } j = i+1, \\ 1-p & \text{if } j = i-1, \\ 0 & \text{otherwise}, \end{cases}$$
and similarly in the last row $p(r, r) = p$ (B loses his last rupee and it is returned) and $p(r, r-1) = 1-p$. Thus, the transition matrix is given by
$$P = \begin{pmatrix}
1-p & p & 0 & \cdots & & 0 \\
1-p & 0 & p & 0 & \cdots & 0 \\
0 & 1-p & 0 & p & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & & 1-p & 0 & p \\
0 & \cdots & & 0 & 1-p & p
\end{pmatrix}.$$

1.2.4 Birth and death chain: Let $X_n$ denote the population of a living system at time $n \geq 0$. The state space for the system $\{X_n\}_{n \geq 0}$ is $\{0, 1, 2, \ldots\}$. We assume that at any given stage $n$, if $X_n = x$, then the population increases to $x+1$ with probability $p_x$, decreases to $x-1$ with probability $q_x$, or remains the same with probability $r_x$. Then
$$p(x, y) = \begin{cases} p_x & \text{if } y = x+1, \\ q_x & \text{if } y = x-1, \\ r_x & \text{if } y = x, \\ 0 & \text{otherwise}. \end{cases}$$
Clearly, this is a Markov chain, called the birth and death chain; it is a random walk with state-dependent transition probabilities.

1.3. Queuing chains

Consider a counter where customers are served, one at every unit of time. Let $X_0$ be the number of customers in the queue when the counter opens, and let $\xi_n$ be the number of customers who arrive during the $n$th unit of time. Then $X_{n+1}$, the number of customers waiting to be served at the beginning of the $(n+1)$th time unit, is
$$X_{n+1} = \begin{cases} \xi_{n+1} & \text{if } X_n = 0, \\ X_n + \xi_{n+1} - 1 & \text{if } X_n \geq 1. \end{cases}$$
The state space for the system $\{X_n\}_{n \geq 0}$ is $S = \{0, 1, 2, \ldots\}$. If $\{\xi_n\}_{n \geq 1}$ are independent random variables taking only nonnegative integer values, then $\{X_n\}_{n \geq 0}$ is a Markov chain. In case $\{\xi_n\}_{n \geq 1}$ are also identically distributed with probability function $f$, we can calculate the transition probabilities: for $x, y \in S$,
$$p(x, y) = P\{X_{n+1} = y \mid X_n = x\} = \begin{cases} P\{\xi_{n+1} = y\} = f(y) & \text{if } x = 0, \\ P\{\xi_{n+1} = y - x + 1\} = f(y - x + 1) & \text{if } x \geq 1, \end{cases}$$
where $f(k) = 0$ for $k < 0$.
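As an aside (my addition, not part of the original notes), the formula above is easy to put to work numerically. The sketch below builds a truncated transition matrix of the queuing chain for an assumed arrival distribution $f$ — here a Poisson probability function with mean 0.8, an arbitrary choice. Truncation to the states $0, \ldots, M$ is only an approximation, so rows near the truncation edge need not sum exactly to 1.

```python
import numpy as np
from math import exp, factorial

def queuing_matrix(f, M):
    """Transition matrix p(x, y) of the queuing chain on the
    truncated state space {0, 1, ..., M}:
       p(0, y) = f(y),  p(x, y) = f(y - x + 1) for x >= 1."""
    P = np.zeros((M + 1, M + 1))
    for y in range(M + 1):
        P[0, y] = f(y)
    for x in range(1, M + 1):
        for y in range(M + 1):
            k = y - x + 1
            P[x, y] = f(k) if k >= 0 else 0.0
    return P

# Illustrative arrival distribution: Poisson with mean 0.8 (an assumption).
lam = 0.8
f = lambda k: exp(-lam) * lam**k / factorial(k)

P = queuing_matrix(f, M=30)
print(P[:4, :4].round(3))
print(P.sum(axis=1)[:5])   # ~1 for rows away from the truncation edge
```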
1.4. Ehrenfest chain

Consider two isolated containers, labeled body A and body B, containing two different fluids. Let the total number of molecules of the two fluids, distributed between the containers A and B, be $d$, labeled $\{1, 2, \ldots, d\}$. The observation made is the number of molecules in A. To start with, A has some number of molecules and B has the rest. At each stage, a number $1 \leq r \leq d$ is chosen at random and the molecule labeled $r$ is removed from the body in which it is and placed in the other body. This gives the observation at the next stage, and so on. Let $X_n$ denote the number of molecules in A at stage $n$; the state space is $S = \{0, 1, 2, \ldots, d\}$.

Let us find the transition probabilities $p(i, j)$, $0 \leq i, j \leq d$, of the system. When $i = 0$, A has no molecules at stage $n$, so the chosen molecule is necessarily in B and moves to A; hence $j$ can only be 1 at stage $n+1$:
$$p(0, j) = \begin{cases} 1 & \text{if } j = 1, \\ 0 & \text{otherwise}. \end{cases}$$
When $i = d$, A has all the molecules, so the chosen molecule is necessarily in A and moves to B:
$$p(d, j) = \begin{cases} 1 & \text{if } j = d-1, \\ 0 & \text{otherwise}. \end{cases}$$
For fixed $i$, $0 < i < d$, let us look at $p(i, j)$ for $0 \leq j \leq d$: $p(i, j)$ is the probability that A will have $j$ molecules given that it had $i$. If A has $i$ molecules, the only possibilities for $j$ are $i-1$ and $i+1$, because the number of molecules in A changes by exactly one at each stage; thus $p(i, j) = 0$ if $j \neq i+1, i-1$. If $j = i+1$, i.e., A is to have $i+1$ molecules, then the chosen molecule must be one of the $d-i$ molecules in B, which happens with probability $\frac{d-i}{d}$. Thus
$$p(i, i+1) = \frac{d-i}{d} = 1 - \frac{i}{d}, \qquad p(i, i-1) = \frac{i}{d}.$$
Thus, the transition matrix for this Markov chain is
$$P = \begin{pmatrix}
0 & 1 & 0 & \cdots & & 0 \\
1/d & 0 & 1 - 1/d & 0 & \cdots & 0 \\
0 & 2/d & 0 & 1 - 2/d & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & & (d-1)/d & 0 & 1/d \\
0 & \cdots & & 0 & 1 & 0
\end{pmatrix}.$$
This model is called the Ehrenfest diffusion model. (See the sketch after §1.5 for a numerical realization.)

1.5. Some consequences of the Markov property

Let $\{X_n\}_{n \geq 0}$ be a Markov chain with state space $S$ and transition probabilities $p(i,j)$, $i, j \in S$.

1.5.1 Proposition: Let $S_0, S_1, \ldots, S_{n-2}$ be subsets of $S$. Then for any $n \geq 1$,
$$P\{X_n = j \mid X_{n-1} = i, X_{n-2} \in S_{n-2}, \ldots, X_0 \in S_0\} = p(i, j).$$

Proof: The required property holds when the sets $S_k = \{i_k\}$ are singletons, $i_k \in S$, by the Markov property:
$$P\{X_n = j \mid X_{n-1} = i, X_{n-2} = i_{n-2}, \ldots, X_0 = i_0\} = P\{X_n = j \mid X_{n-1} = i\}.$$
Since any subset of $S$ is a countable disjoint union of singletons, the required property follows from property (iv) of conditional probability in the Prologue.

1.5.2 Example: Let us compute $P\{X_3 = j \mid X_1 = i, X_0 = k\}$, $i, j, k \in S$. Using Proposition 1.5.1 and the Markov property, we have
$$\begin{aligned}
P\{X_3 = j \mid X_1 = i, X_0 = k\}
&= \sum_{r \in S} P\{X_3 = j \mid X_2 = r, X_1 = i, X_0 = k\}\, P\{X_2 = r \mid X_1 = i, X_0 = k\} \\
&= \sum_{r \in S} P\{X_3 = j \mid X_2 = r, X_1 = i\}\, P\{X_2 = r \mid X_1 = i\} \\
&= P\{X_3 = j \mid X_1 = i\}.
\end{aligned}$$

In fact, the above example extends to the following:

1.5.3 Theorem: For $n > n_s > n_{s-1} > \cdots > n_1 \geq 0$,
$$P\{X_n = j \mid X_{n_s} = i, X_{n_{s-1}} = i_{s-1}, \ldots, X_{n_1} = i_1\} = P\{X_n = j \mid X_{n_s} = i\}.$$
Thus, for a Markov chain, the probability at time $n$, given the past at times $n_s > n_{s-1} > \cdots > n_1$, depends only on the most recent past, i.e., on $n_s$.

Thus, to every Markov chain we can associate a vector (the distribution of the initial stage) and a stochastic matrix whose entries give the probabilities of moving from one state to another at the next stage. Here is the converse:

1.5.4 Theorem: Given a stochastic matrix $P$ and a probability vector $\Pi_0$, there exists a Markov chain $\{X_n\}_{n \geq 0}$ with $\Pi_0$ as initial distribution and $P$ as transition probability matrix.

The interested reader may refer to Theorem 8.1 of Billingsley [4].

1.5.5 Exercise: Show that $P\{X_0 = i_0 \mid X_1 = i_1, \ldots, X_n = i_n\} = P\{X_0 = i_0 \mid X_1 = i_1\}$.
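Before moving to the review exercises, here is the small sketch promised in §1.4 (my addition): it assembles the Ehrenfest transition matrix for a given $d$ and confirms it is stochastic, as Theorem 1.1.8 requires.

```python
import numpy as np

def ehrenfest_matrix(d):
    """Ehrenfest chain on S = {0, 1, ..., d}:
       p(i, i+1) = (d - i)/d and p(i, i-1) = i/d."""
    P = np.zeros((d + 1, d + 1))
    for i in range(d + 1):
        if i < d:
            P[i, i + 1] = (d - i) / d   # a molecule moves from B to A
        if i > 0:
            P[i, i - 1] = i / d         # a molecule moves from A to B
    return P

P = ehrenfest_matrix(5)
print(P.round(2))
print(np.allclose(P.sum(axis=1), 1.0))  # True: each row sums to 1
```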
Review Exercises

(1.1) Mark the following statements as True/False:
(i) A Markov system can be in several states at one time.
(ii) The $(1,3)$ entry of the transition matrix is the probability of going from state 1 to state 3 in two steps.
(iii) The $(6,5)$ entry of the transition matrix is the probability of going from state 6 to state 5 in one step.
(iv) The entries in each row of the transition matrix add to zero.
(v) Let $\{X_n\}_{n \geq 0}$ be a sequence of independent identically distributed discrete random variables. Then it is a Markov chain.
(vi) If the state space is $S = \{s_1, s_2, \ldots, s_n\}$, then its transition matrix has order $n$.

(1.2) Let $\{\xi_n\}_{n \geq 0}$ be a sequence of independent identically distributed discrete random variables. Define
$$X_n = \begin{cases} \xi_0 & \text{if } n = 0, \\ \xi_1 + \xi_2 + \cdots + \xi_n & \text{for } n \geq 1. \end{cases}$$
Show that $\{X_n\}_{n \geq 0}$ is a Markov chain. Sketch its transition graph and compute the transition probabilities.

(1.3) Consider a person moving on a $4 \times 4$ grid. He can move only to the intersection points to the right or below, each with probability 1/2. He starts his walk from the top left corner, and $X_n$, $n \geq 1$, denotes his position after $n$ steps. Show that $\{X_n\}_{n \geq 0}$ is a Markov chain. Sketch its transition graph and compute the transition probability matrix. Also find the initial distribution vector.

(1.4) Web surfing: Consider a person surfing the Internet; each time he encounters a web page, he selects one of its hyperlinks uniformly at random. Let $X_n$ denote the page the person is at after $n$ selections (clicks). What do you think is the state space? Find the transition probability matrix.

(1.5) Let $\{X_n\}_{n \geq 0}$ be a Markov chain with state space, initial probability distribution and transition matrix given by
$$S = \{1, 2, 3\}, \qquad \Pi_0 = (1/3, 1/3, 1/3), \qquad P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}.$$
Define
$$Y_n = \begin{cases} 0 & \text{if } X_n = 1, \\ 1 & \text{otherwise}. \end{cases}$$
Show that $\{Y_n\}_{n \geq 0}$ is not a Markov chain. Thus, a function of a Markov chain need not be a Markov chain.

(1.6) Let $\{X_n\}_{n \geq 0}$ be a Markov chain with transition matrix $P$. Define $Y_n = X_{2n}$ for every $n \geq 0$. Show that $\{Y_n\}_{n \geq 0}$ is a Markov chain with transition matrix $P^2$. What happens if $Y_n$ is defined as $Y_n = X_{nk}$ for every $n \geq 0$?

Chapter 2. Calculation of higher order probabilities

2.1. Distribution of $X_n$ and other joint distributions

Consider a Markov chain $\{X_n\}_{n \geq 0}$ with initial vector $\Pi_0$ and transition probability matrix $P = [p(i,j)]_{i,j \in S}$. We want to find the probability that after $n$ steps the system will be in a given state $j \in S$. For a matrix $A$, its $n$-fold product with itself will be denoted by $A^n$.

2.1.1 Theorem:
(i) The joint distribution of $X_0, X_1, X_2, \ldots, X_n$ is given by
$$P\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\} = \Pi_0(i_0)\, p(i_0, i_1)\, p(i_1, i_2) \cdots p(i_{n-1}, i_n).$$
(ii) The distribution of $X_n$, i.e., $P\{X_n = j\}$, is given by the $j$th component of the vector $\Pi_0 P^n$.
(iii) For every $n, m \geq 0$,
$$P\{X_n = j \mid X_0 = i\} = P\{X_{n+m} = j \mid X_m = i\} = p^n(i, j),$$
where $p^n(i, j)$ is the $(i,j)$th entry of the matrix $P^n$.

Proof: (i) Using the chain rule for conditional probability and the Markov property,
$$\begin{aligned}
P\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\}
&= P\{X_n = i_n \mid X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\} \cdots P\{X_1 = i_1 \mid X_0 = i_0\}\, P\{X_0 = i_0\} \\
&= P\{X_n = i_n \mid X_{n-1} = i_{n-1}\} \cdots P\{X_1 = i_1 \mid X_0 = i_0\}\, P\{X_0 = i_0\} \\
&= \Pi_0(i_0)\, p(i_0, i_1) \cdots p(i_{n-1}, i_n).
\end{aligned}$$

(ii) Summing (i) over all the intermediate states,
$$P\{X_n = j\} = \sum_{i_0 \in S} \sum_{i_1 \in S} \cdots \sum_{i_{n-1} \in S} P\{X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}, X_n = j\} = \sum_{i_0 \in S} \cdots \sum_{i_{n-1} \in S} \Pi_0(i_0)\, p(i_0, i_1) \cdots p(i_{n-1}, j),$$
which is the $j$th component of the vector $\Pi_0 P^n$.

(iii) Once again, using the Markov property and the chain rule for conditional probability,
$$\begin{aligned}
P\{X_{n+m} = j, X_m = i\}
&= \sum_{i_{m+1} \in S} \cdots \sum_{i_{m+n-1} \in S} P\{X_m = i, X_{m+1} = i_{m+1}, \ldots, X_{m+n-1} = i_{m+n-1}, X_{m+n} = j\} \\
&= \sum_{i_{m+1} \in S} \cdots \sum_{i_{m+n-1} \in S} P\{X_m = i\}\, p(i, i_{m+1}) \cdots p(i_{m+n-1}, j) \\
&= P\{X_m = i\}\, p^n(i, j).
\end{aligned}$$
Dividing by $P\{X_m = i\}$,
$$P\{X_n = j \mid X_0 = i\} = P\{X_{n+m} = j \mid X_m = i\} = p^n(i, j).$$
2.1.2 Definition: Let $\{X_n\}_{n \geq 0}$ be a Markov chain with initial vector $\Pi_0$ and transition probability matrix $P = [p(i,j)]$, $i, j \in S$.
(i) For $n \geq 1$ and $j \in S$, $p_n(j) = P\{X_n = j\}$ is called the distribution of $X_n$.
(ii) For $n \geq 1$, the numbers $p^n(i, j)$ are called the $n$th stage transition probabilities.

The above theorem gives the probability of the system being in a given state at the $n$th stage, and the probability that the system moves in $n$ stages from a state $i$ to a state $j$. These can be computed if we know the initial distribution and the powers of the transition matrix. Thus, it is important to compute the matrix $P^n$, $P$ being the transition matrix. For large $n$, this is difficult to compute directly. Let us look at some examples.

2.1.3 Exercise: Show that the joint distribution of $X_m, X_{m+1}, \ldots, X_{m+n}$ is given by
$$P\{X_m = i_m, X_{m+1} = i_{m+1}, \ldots, X_{m+n} = i_{m+n}\} = P\{X_m = i_m\}\, p(i_m, i_{m+1}) \cdots p(i_{m+n-1}, i_{m+n}).$$
Also write down the joint distribution of any finite collection $X_{n_1}, X_{n_2}, \ldots, X_{n_r}$, for $n_1 < n_2 < \cdots < n_r$.

2.1.4 Example: Consider a Markov chain $\{X_n\}_{n \geq 0}$ in the special situation where all the $X_n$'s are independent. Let us compute $P^n$, where $P$ is the transition probability matrix. Because the $X_n$'s are independent,
$$p(i, j) = P\{X_{n+1} = j \mid X_n = i\} = P\{X_{n+1} = j\} \quad \text{for all } i, j \text{ and all } n.$$
Thus, each row of $P$ is identical. By Theorem 2.1.1(iii), for all $i$,
$$p^n(i, j) = P\{X_{n+m} = j \mid X_m = i\} = P\{X_n = j \mid X_0 = i\} = P\{X_n = j\} = p(i, j).$$
Therefore $p^n(i, j) = p(i, j)$ for all $i, j$, i.e., $P^n = P$.

2.1.5 Example: Let us consider the Markov chain with two states $S = \{0, 1\}$, transition matrix
$$P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix},$$
and initial distribution $(\Pi_0(0), \Pi_0(1))$. The knowledge of $P$ and $\Pi_0$ helps us answer various questions. For example, to compute the distribution of $X_n$, using the formula $P(A|B)\,P(B) = P(A \cap B)$, we have for every $n \geq 0$,
$$\begin{aligned}
P\{X_{n+1} = 0\} &= P\{X_{n+1} = 0, X_n = 0\} + P\{X_{n+1} = 0, X_n = 1\} \\
&= P\{X_{n+1} = 0 \mid X_n = 0\}\, P\{X_n = 0\} + P\{X_{n+1} = 0 \mid X_n = 1\}\, P\{X_n = 1\} \\
&= (1-p)\, P\{X_n = 0\} + q\, P\{X_n = 1\} \\
&= (1-p)\, P\{X_n = 0\} + q\,(1 - P\{X_n = 0\}) \\
&= (1 - p - q)\, P\{X_n = 0\} + q.
\end{aligned}$$
Thus, for $n = 1, 2, \ldots$,
$$P\{X_1 = 0\} = (1-p-q)\,\Pi_0(0) + q,$$
$$P\{X_2 = 0\} = (1-p-q)\,P\{X_1 = 0\} + q = (1-p-q)^2\,\Pi_0(0) + q\,(1-p-q) + q,$$
and, continuing in the same way,
$$P\{X_n = 0\} = (1-p-q)^n\,\Pi_0(0) + q \sum_{j=0}^{n-1} (1-p-q)^j.$$
Assume now $p + q > 0$ and sum the geometric series. Using $P\{X_0 = 0\} = 1$, i.e., $\Pi_0(0) = 1$,
$$p^n(0,0) = P\{X_n = 0 \mid X_0 = 0\} = \frac{q}{p+q} + (1-p-q)^n \left(1 - \frac{q}{p+q}\right) = \frac{q}{p+q} + \frac{p}{p+q}\,(1-p-q)^n,$$
$$p^n(0,1) = P\{X_n = 1 \mid X_0 = 0\} = \frac{p}{p+q} - \frac{p}{p+q}\,(1-p-q)^n.$$
Similarly, with $\Pi_0(0) = 0$,
$$p^n(1,0) = \frac{q}{p+q} - \frac{q}{p+q}\,(1-p-q)^n, \qquad p^n(1,1) = \frac{p}{p+q} + \frac{q}{p+q}\,(1-p-q)^n.$$
Therefore,
$$P^n = \frac{1}{p+q} \begin{pmatrix} q & p \\ q & p \end{pmatrix} + \frac{(1-p-q)^n}{p+q} \begin{pmatrix} p & -p \\ -q & q \end{pmatrix}.$$

2.1.6 Exercise: Consider the Markov chain of Example 1.1.10.
(i) If $p = q = 0$, what can be said about the machine?
(ii) If $p, q > 0$, show that
$$P\{X_n = 0\} = \frac{q}{p+q} + (1-p-q)^n \left(\Pi_0(0) - \frac{q}{p+q}\right) \quad \text{and} \quad P\{X_n = 1\} = \frac{p}{p+q} + (1-p-q)^n \left(\Pi_0(1) - \frac{p}{p+q}\right).$$
(iii) Find conditions on $\Pi_0(0)$ and $\Pi_0(1)$ such that the distribution of $X_n$ is independent of $n$.
(iv) Compute $P\{X_0 = 0, X_1 = 1, X_2 = 0\}$.
(v) Can one compute the joint distribution of $X_{n+2}, X_{n+1}, X_n$?
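As a quick sanity check (my addition, not from the notes), the closed form for $P^n$ in Example 2.1.5 can be compared against a direct matrix power; the values $p = 0.3$, $q = 0.6$, $n = 7$ are arbitrary.

```python
import numpy as np

p, q, n = 0.3, 0.6, 7
P = np.array([[1 - p, p],
              [q, 1 - q]])

# Closed form of Example 2.1.5:
# P^n = 1/(p+q) [[q, p], [q, p]] + (1-p-q)^n/(p+q) [[p, -p], [-q, q]]
A = np.array([[q, p], [q, p]]) / (p + q)
B = np.array([[p, -p], [-q, q]]) / (p + q)
closed = A + (1 - p - q) ** n * B

assert np.allclose(closed, np.linalg.matrix_power(P, n))
print(closed)
```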
2.1.7 Note (the case where $P$ is diagonalizable): As we observed earlier, it is not easy to compute $P^n$ for a matrix $P$, even when it is finite. However, when $P$ is diagonalizable (see the Appendix for more details), it is easy: suppose there exists an invertible matrix $U$ such that $P = U D U^{-1}$, where $D$ is a diagonal matrix. Then $P^n = U D^n U^{-1}$, and $D^n$ is easy to compute. In this case we can compute the entries of $P^n$. Let the state space have $M$ elements and let $P$ be diagonalizable, the diagonal entries of $D$ being $\lambda_1, \lambda_2, \ldots, \lambda_M$; these are the eigenvalues of $P$. To find $p^n(i,j)$:
(i) Compute the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_M$ of $P$ by solving the characteristic equation.
(ii) If all the eigenvalues are distinct, then for all $n$, $p^n(i,j)$ has the form
$$p^n(i,j) = a_1 \lambda_1^n + \cdots + a_M \lambda_M^n,$$
for some constants $a_1, \ldots, a_M$ depending on $i$ and $j$. These can be found by solving a system of linear equations.

2.1.8 Example: Let the transition matrix of a Markov chain be
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \end{pmatrix},$$
and let us try to find a general formula for $p^n(1,1)$. We first compute the eigenvalues of $P$ by solving
$$\det(P - \lambda I) = \begin{vmatrix} -\lambda & 1 & 0 \\ 0 & 1/2 - \lambda & 1/2 \\ 1/2 & 0 & 1/2 - \lambda \end{vmatrix} = 0.$$
This gives the (complex) eigenvalues $1, \pm i/2$. Thus, for some invertible matrix $U$,
$$P = U \begin{pmatrix} 1 & 0 & 0 \\ 0 & i/2 & 0 \\ 0 & 0 & -i/2 \end{pmatrix} U^{-1}, \qquad \text{and hence} \qquad P^n = U \begin{pmatrix} 1 & 0 & 0 \\ 0 & (i/2)^n & 0 \\ 0 & 0 & (-i/2)^n \end{pmatrix} U^{-1}.$$
In fact, $U$ can be written explicitly in terms of the eigenvectors. Alternatively, the above equation implies that for scalars $a, b, c$,
$$p^n(1,1) = a + b\,(i/2)^n + c\,(-i/2)^n.$$
Since $(\pm i/2)^n = (1/2)^n\big(\cos(n\pi/2) \pm i \sin(n\pi/2)\big)$, comparing real and imaginary parts we may write, for real constants $a, b, c$ and all $n \geq 0$,
$$p^n(1,1) = a + b\,(1/2)^n \cos(n\pi/2) + c\,(1/2)^n \sin(n\pi/2).$$
In particular, for $n = 0, 1, 2$ we have
$$1 = p^0(1,1) = a + b, \qquad 0 = p^1(1,1) = a + \tfrac{1}{2}c, \qquad 0 = p^2(1,1) = a - \tfrac{1}{4}b.$$
The solution of this system is $a = 1/5$, $b = 4/5$, $c = -2/5$, and hence
$$p^n(1,1) = \frac{1}{5} + \left(\frac{1}{2}\right)^n \left(\frac{4}{5} \cos\frac{n\pi}{2} - \frac{2}{5} \sin\frac{n\pi}{2}\right).$$
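Example 2.1.8 is easy to check numerically. The following sketch (my addition) computes $p^n(1,1)$ both from the eigendecomposition and from the closed formula; note that in the code states are indexed from 0, so entry [0, 0] corresponds to $p^n(1,1)$.

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

w, U = np.linalg.eig(P)          # eigenvalues 1, i/2, -i/2 (in some order)
Uinv = np.linalg.inv(U)

for n in range(8):
    # P^n via the diagonalization P^n = U D^n U^{-1}
    Pn = (U @ np.diag(w**n) @ Uinv).real
    # closed formula of Example 2.1.8 for p^n(1,1)
    formula = 0.2 + 0.5**n * (0.8 * np.cos(n * np.pi / 2)
                              - 0.4 * np.sin(n * np.pi / 2))
    assert np.isclose(Pn[0, 0], formula)
print("closed formula matches U D^n U^{-1} for n = 0..7")
```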
2.2. Kolmogorov-Chapman equation

We saw that given a Markov chain $\{X_n\}_{n \geq 0}$ with state space $S$, initial distribution $\Pi_0$ and transition matrix $P$, we can calculate the distribution of $X_n$ and other joint distributions. Thus, if we write $\Pi_n$ for the distribution of $X_n$, i.e., $\Pi_n(j) = P\{X_n = j\}$, then
$$\Pi_n(j) = \sum_{k \in S} \Pi_0(k)\, p^n(k, j),$$
or symbolically, $\Pi_n = \Pi_0 P^n$. The joint distribution of $X_m, X_{m+1}, \ldots, X_{m+n}$ can then be written as
$$P\{X_{m+t} = i_t,\ 0 \leq t \leq n\} = \Pi_m(i_0)\, p(i_0, i_1) \cdots p(i_{n-1}, i_n).$$
The entries of $P^n$ are called the $n$th step transition probabilities. Thus, the knowledge about the Markov chain is contained in $\Pi_0$ and the matrix $P$. As noted earlier, $P$ is a (possibly infinite) matrix such that the sum of each row is 1, i.e., a stochastic matrix. For consistency, we define $P^0 = I$, the identity matrix. The following is easy to show:

2.2.1 Theorem: For $n, m \geq 0$ and $i, j \in S$,
$$p^{n+m}(i, j) = \sum_{r \in S} p^n(i, r)\, p^m(r, j).$$
In matrix form this is just $P^{n+m} = P^n P^m$. This is called the Kolmogorov-Chapman equation.

Proof: Using property (v) of conditional probability,
$$\begin{aligned}
p^{n+m}(i, j) &= P\{X_{n+m} = j \mid X_0 = i\} \\
&= \sum_{r \in S} P\{X_n = r \mid X_0 = i\}\, P\{X_{n+m} = j \mid X_n = r, X_0 = i\} \\
&= \sum_{r \in S} p^n(i, r)\, P\{X_{n+m} = j \mid X_n = r\} \\
&= \sum_{r \in S} p^n(i, r)\, p^m(r, j).
\end{aligned}$$
The last equality follows from the fact that $P\{X_{n+m} = j \mid X_n = r, X_0 = i\} = P\{X_{n+m} = j \mid X_n = r\} = p^m(r, j)$, as observed in Theorem 1.5.3.

2.2.2 Example: Consider the unrestricted random walk on the line, as in Example 1.2.1, with probability $p$ of moving one step to the right and $q = 1-p$ of moving one step to the left. The walk can return to its starting point only in an even number of steps, so
$$p^{2n+1}(0, 0) = 0.$$
To return to 0 in $2n$ steps, there must be exactly $n$ moves to the right and $n$ moves to the left, so
$$p^{2n}(0, 0) = \binom{2n}{n} p^n (1-p)^n = \binom{2n}{n} (pq)^n.$$
In fact, the same holds for every diagonal entry; the other entries are more difficult to compute. Note that
$$\sum_{n=0}^{\infty} p^n(0, 0) = \sum_{n=0}^{\infty} \binom{2n}{n} (pq)^n.$$
Using Stirling's approximation, $n! \sim \sqrt{2\pi}\, n^{n+1/2} e^{-n}$, we have
$$\sum_{n=0}^{\infty} p^{2n}(0, 0) \sim \sum_{n=0}^{\infty} \frac{(4pq)^n}{\sqrt{n\pi}},$$
which is convergent if $4pq < 1$ and divergent if $4pq = 1$, i.e., iff $p = q = 1/2$. Thus, in the terminology to be introduced in §3.3, the state 0 is transient if $p \neq q$ and recurrent if $p = q = 1/2$.

2.2.3 Example: Consider the Markov chain of Exercise (1.3), with state space $S = \{1, 2, 3, 4\}$, initial distribution $(1, 0, 0, 0)$, and transition matrix
$$P = \begin{pmatrix} 0 & 1/2 & 0 & 1/2 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 1/2 & 0 & 1/2 & 0 \end{pmatrix}.$$
Then
$$P^2 = \begin{pmatrix} 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \end{pmatrix}, \qquad \Pi_0 P^2 = (1, 0, 0, 0)\, P^2 = (1/2, 0, 1/2, 0).$$
Thus, if we want the probability that the walker is in state 3 after two steps, it is $\Pi_2(3) = (\Pi_0 P^2)(3) = 1/2$.

Exercises

(2.1) Consider the Markov chain of Example 2.2.3. Show that
$$\Pi_n = \begin{cases} (0, 1/2, 0, 1/2) & \text{for } n = 1, 3, 5, \ldots, \\ (1/2, 0, 1/2, 0) & \text{for } n = 2, 4, 6, \ldots \end{cases}$$

(2.2) Let $\{X_n\}_{n \geq 0}$ be a Markov chain with state space, initial probability distribution and transition matrix given by
$$S = \{1, 2\}, \qquad \Pi_0 = (1, 0), \qquad P = \begin{pmatrix} 3/4 & 1/4 \\ 1/4 & 3/4 \end{pmatrix}.$$
Show that
$$\Pi_n = \left( \tfrac{1}{2}(1 + 2^{-n}),\ \tfrac{1}{2}(1 - 2^{-n}) \right) \quad \text{for every } n.$$

(2.3) Consider the two-state Markov chain $\{X_n\}_{n \geq 0}$ with $\Pi_0 = (1, 0)$ and transition matrix
$$P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}.$$
Using the facts that $P$ is stochastic and the relation $P^{n+1} = P^n P$, deduce that
$$p^{n+1}(1,1) = p^n(1,1)(1-p) + p^n(1,2)\,q \qquad \text{and} \qquad p^n(1,1) + p^n(1,2) = 1,$$
and hence, for all $n \geq 0$,
$$p^{n+1}(1,1) = p^n(1,1)(1 - p - q) + q.$$
Show that this has the unique solution
$$p^n(1,1) = \begin{cases} \dfrac{q}{p+q} + \dfrac{p}{p+q}\,(1-p-q)^n & \text{if } p + q > 0, \\ 1 & \text{if } p + q = 0. \end{cases}$$

Chapter 3. Classification of states

Let $\{X_n\}_{n \geq 0}$ be a Markov chain with state space $S$, initial distribution $\Pi_0$ and transition probability matrix $P$. We will also denote the $(i,j)$th entry $p^n(i,j)$ of $P^n$ by $p^n_{ij}$. We start by looking at the possibility of moving from one state to another.

3.1. Closed subsets and irreducible subsets

3.1.1 Definition:
(i) We say a state $j$ is reachable from a state $i$ (or $i$ leads to $j$, or $j$ is approachable from $i$) if there exists some $n \geq 0$ such that $p^n_{ij} > 0$. We denote this by $i \to j$. In other words, $i$ leads to $j$ in some number of steps with positive probability.
(ii) A subset $C$ of the state space is said to be closed if no state in $C$ leads to a state outside $C$. Thus, $C$ is closed means that for every $i \in C$ and $j \notin C$, $p^n_{ij} = 0$ for all $n \geq 0$. This means that once the chain enters the set $C$, it never leaves it.
(iii) A state $j$ is called an absorbing state if the singleton set $\{j\}$ is a closed set.

3.1.2 Proposition:
(i) If $i \to j$ and $j \to k$, then $i \to k$.
(ii) A state $j$ is reachable from a state $i$ iff $p_{ii_1}\, p_{i_1 i_2} \cdots p_{i_{n-1} j} > 0$ for some $i_1, i_2, \ldots, i_{n-1} \in S$.
(iii) $C \subseteq S$ is closed iff $p_{ij} = 0$ for all $i \in C$, $j \notin C$.
(iv) The state space $S$ is closed, and for $i \in S$, the set $\{i\}$ is closed iff $p_{ii} = 1$.

Proof: (i) Follows from the fact that, for suitable $n, m \geq 0$,
$$p^{n+m}_{ik} = \sum_{r \in S} p^n_{ir}\, p^m_{rk} \geq p^n_{ij}\, p^m_{jk} > 0.$$
(ii) Follows from the equality
$$p^n_{ij} = \sum_{i_1, \ldots, i_{n-1}} p_{ii_1}\, p_{i_1 i_2} \cdots p_{i_{n-1} j}.$$
(iii) Clearly, $p^n_{ij} = 0$ for all $n$ implies $p_{ij} = 0$. Conversely, suppose $p_{ij} = 0$ for all $i \in C$, $j \notin C$. Then, for all $r \in C$ and $k \notin C$, splitting the sum according to whether $l \in C$ (where $p_{lk} = 0$) or $l \notin C$ (where $p_{rl} = 0$),
$$p^2_{rk} = \sum_{l \in S} p_{rl}\, p_{lk} = 0.$$
Proceeding similarly, $p^n_{rk} = 0$ for all $n \geq 1$.
(iv) is obvious.

3.1.3 Definition: A subset $C$ of $S$ is called irreducible if any two states in $C$ lead to one another.

Let us look at some examples.

3.1.4 Example: Consider a Markov chain with states $0, 1, \ldots, 5$ and transition matrix
$$P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
1/4 & 1/2 & 1/4 & 0 & 0 & 0 \\
0 & 1/5 & 2/5 & 1/5 & 0 & 1/5 \\
0 & 0 & 0 & 1/6 & 1/3 & 1/2 \\
0 & 0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 1/4 & 0 & 3/4
\end{pmatrix}.$$
We first look at which state leads to which state. Whenever $i \to j$, we put a $*$ at the $(i,j)$th entry of a "reachability matrix". Note that $p_{ij} > 0$ gives a $*$ at the $(i,j)$th entry, but $p_{ij} = 0$ need not give a 0 there: for example, $p_{13} = 0$, but $1 \to 2 \to 3$, so the $(1,3)$th entry is also a $*$. For the above matrix we have
$$\begin{pmatrix}
* & 0 & 0 & 0 & 0 & 0 \\
* & * & * & * & * & * \\
* & * & * & * & * & * \\
0 & 0 & 0 & * & * & * \\
0 & 0 & 0 & * & * & * \\
0 & 0 & 0 & * & * & *
\end{pmatrix}$$
(e.g., $1 \to 2 \to 3 \to 4$, $2 \to 1 \to 0$, $3 \to 4$, $3 \to 5$, $4 \to 3 \to 5$, $5 \to 3 \to 4$).

Clearly, a single state $i$ is a closed set iff $p_{ii} = 1$; in our case $\{0\}$ is closed. The set $S$ is closed by definition, for there is no state outside $S$; thus $\{0, 1, 2, 3, 4, 5\}$ is closed. A look at the reachability matrix tells us that the set $\{3, 4, 5\}$ is closed, because none of 3, 4, 5 leads to 0, 1 or 2. On the other hand, $\{1\}$, for example, is not closed because $1 \to 2$. In fact, there are no other closed sets. The set $\{3, 4, 5\}$ is also irreducible.

3.1.5 Note (importance of closed irreducible sets): Why should one bother about closed subsets of the state space? To find the answer, let us look at the above example again. Take a proper closed set, say $C = \{3, 4, 5\}$. If we remove the rows and columns corresponding to the states 0, 1 and 2 from the transition matrix, we get the submatrix
$$P_C = \begin{pmatrix} 1/6 & 1/3 & 1/2 \\ 1/2 & 0 & 1/2 \\ 1/4 & 0 & 3/4 \end{pmatrix},$$
which has the property that the sum of each row is 1. In fact, if we take $P^2$ and delete the rows and columns not in $C$, writing the result as $(P^2)_C$, then it is easy to check that it is nothing but $(P_C)^2$. For note that for $i \in C$, $p^2_{ij} = 0$ if $j \notin C$; therefore,
$$1 = \sum_{j \in S} p^2_{ij} = \sum_{j \in C} p^2_{ij},$$
so $(P_C)^2$ is a stochastic matrix. Also, for $i, j \in C$, the $(i,j)$th entry of $(P_C)^2$ is
$$\sum_{r \in C} p_{ir}\, p_{rj} = \sum_{r \in S} p_{ir}\, p_{rj} = p^2_{ij},$$
because $C$ is closed, so $p_{ir} = 0$ for $r \notin C$. In general, $(P^n)_C = (P_C)^n$. Hence, one can consider the chain with state space $C$ and analyze it separately. This reduces the number of states.

3.1.6 Definition: Two states $i$ and $j$ are said to communicate if each is reachable from the other, i.e.,
$$p^n_{ij} > 0 \ \text{ and } \ p^m_{ji} > 0 \quad \text{for some } m, n \geq 0.$$
In this case we write $i \leftrightarrow j$.

3.1.7 Proposition:
(i) For $i, j \in S$, say $i \sim j$ iff $i \leftrightarrow j$. Then $\sim$ is an equivalence relation on $S$.
(ii) Each equivalence class, called a communicating class, has no proper closed subset.

Proof: (i) That $i \leftrightarrow i$ follows from the fact that $P^0 = I$, and hence $p^0_{ii} = 1$. The relation is obviously symmetric, and transitivity follows from Proposition 3.1.2(i).
(ii) Let $C$ be an equivalence class and let $A$ be a proper subset of $C$. Choose $j \in C \setminus A$ and $i \in A$. Then $i \leftrightarrow j$, so $j \notin A$ is reachable from $i \in A$. Hence $A$ is not closed.

3.1.8 Note: A communicating class need not be closed: it may be possible to start in one communicating class and enter another with positive probability. For example, consider a Markov chain with transition matrix
$$P = \begin{pmatrix}
1/2 & 0 & 1/2 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
1/3 & 1/3 & 0 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 1/2 & 1/2 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}.$$
The communicating classes are $\{1, 2, 3\}$, $\{4\}$ and $\{5, 6\}$. Clearly $3 \to 4$, but $4 \not\to 3$. Only $\{5, 6\}$ is a closed subset.
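The reachability ("star") matrix of Example 3.1.4 is mechanical to compute: $i \to j$ iff the $(i,j)$ entry of $I + P + P^2 + \cdots$ is positive, and on a finite state space a boolean transitive closure suffices. A minimal sketch (my addition, not from the notes):

```python
import numpy as np

P = np.array([
    [1,   0,   0,   0,   0,   0  ],
    [1/4, 1/2, 1/4, 0,   0,   0  ],
    [0,   1/5, 2/5, 1/5, 0,   1/5],
    [0,   0,   0,   1/6, 1/3, 1/2],
    [0,   0,   0,   1/2, 0,   1/2],
    [0,   0,   0,   1/4, 0,   3/4],
])

def reachability(P):
    """Boolean matrix R with R[i, j] True iff i -> j (in >= 0 steps)."""
    R = (P > 0) | np.eye(len(P), dtype=bool)
    for _ in range(len(P)):              # transitive closure by squaring
        R = R | ((R.astype(int) @ R.astype(int)) > 0)
    return R

R = reachability(P)
print(R.astype(int))                     # reproduces the star matrix of 3.1.4

# A set C is closed iff no state of C reaches a state outside C.
C = [3, 4, 5]
outside = [s for s in range(len(P)) if s not in C]
print(not R[np.ix_(C, outside)].any())   # True: {3, 4, 5} is closed
```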
3.1.9 Example: Consider a Markov chain with five states $\{1, 2, 3, 4, 5\}$ and transition matrix
$$P = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 & 0 \\
1/4 & 3/4 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}.$$
States 1 and 2 communicate with each other and with no other state. Similarly, states 3, 4, 5 communicate among themselves only. Thus, the state space divides into two closed irreducible sets $\{1, 2\}$ and $\{3, 4, 5\}$. For all practical purposes, analyzing the given Markov chain is the same as analyzing two smaller chains with smaller state spaces and transition matrices
$$P_1 = \begin{pmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{pmatrix}, \qquad P_2 = \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \end{pmatrix}.$$

3.1.10 Theorem: A closed set $C \subseteq S$ has no proper nonempty closed subset iff every state in $C$ communicates with every other state in it, i.e., iff $C$ is irreducible.

Proof: Suppose $C$ has no proper nonempty closed subset. For $j \in C$, define
$$C_j = \{i \in C \mid p^n_{ij} = 0 \ \forall\, n \geq 0\},$$
the set of states in $C$ from which $j$ is not reachable. We claim that $C_j$ is a closed set. To see this, let $k \notin C_j$, so that $p^m_{kj} > 0$ for some $m$. If $i$ is such that $p_{ik} > 0$, then
$$p^{m+1}_{ij} = \sum_{l \in S} p_{il}\, p^m_{lj} \geq p_{ik}\, p^m_{kj} > 0,$$
which is not possible if $i \in C_j$. Thus $p_{ik} = 0$ for every $i \in C_j$ and $k \notin C_j$, implying that $C_j$ is closed. Since $p^0_{jj} = 1$, we have $j \notin C_j$, so $C_j$ is a proper subset of $C$; as $C$ has no proper nonempty closed subset, $C_j = \emptyset$. Hence every state of $C$ leads to $j$, and since $j \in C$ was arbitrary, any two states in $C$ communicate with each other.

Conversely, let $i \leftrightarrow j$ for all $i, j \in C$, and let $A \subseteq C$ be a nonempty closed set. Then for $j \in A$ and $i \in C$, since $j \to i$ and $A$ is closed, we have $i \in A$; hence $A = C$, i.e., $C$ has no proper nonempty closed subset.

In view of Note 3.1.5, one would like to partition the state space into irreducible subsets.

Exercises

(3.1) Let the transition matrix of a Markov chain be given by
$$P = \begin{pmatrix}
1/2 & 0 & 0 & 1/2 & 0 \\
1/2 & 1/3 & 0 & 1/6 & 0 \\
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix}.$$
Draw the transition graph and find all the disjoint closed subsets of the state space $S = \{1, 2, 3, 4, 5\}$.

(3.2) Consider the Markov chain of Example 1.2.2, the random walk with absorbing barriers. Show that the state space splits into three irreducible sets. Is it possible to go from one set to another?

(3.3) For the queuing Markov chain of Section 1.3, write the transition matrix, and if $f(k) > 0$ for every $k$, deduce that $S$ itself is irreducible.

(3.4) Let a Markov chain have transition matrix
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$
Show that it is an irreducible chain.

3.2. Periodic and aperiodic chains

Throughout this section, $\{X_n\}_{n \geq 0}$ will be a Markov chain with state space $S$, initial probability $\Pi_0$ and transition matrix $P$.

3.2.1 Definition: A state $j$ is said to have period $d$ if $p^n_{jj} > 0$ implies that $d$ divides $n$, and $d$ is the largest such integer. In other words, the period of $j$ is the greatest common divisor of the numbers $\{n \geq 1 \mid p^n_{jj} > 0\}$. That a state $j$ has period $d$ means that $p^n_{jj} = 0$ unless $n = md$ for some $m \geq 1$, and $d$ is the greatest positive integer with this property. Thus, the chain may come back to $j$ only at times $md$ — but it may also never come back to the state $j$.

3.2.2 Example: Consider a Markov chain with states $1, 2, 3, 4$ and transition matrix
$$P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
Now $p_{jj} = 0$ for all $j$, so the period of each state is $> 1$. In fact, each state has period 2, since $p^2_{jj} > 0$ while returns are possible only in an even number of steps. But $\{3, 4\}$ forms a closed set, and once the chain enters $\{3, 4\}$ (say from state 2), it never comes out to return to state 2.

3.2.3 Definition: A state $j$ is called aperiodic if $j$ has period 1. The chain is called an aperiodic chain if every state in the chain has period 1. In an aperiodic chain, if $i \leftrightarrow j$, then $p^n_{ij} > 0$ for all sufficiently large $n$, i.e., it is possible for the chain to come back to any state at all sufficiently large times.
3.2.4 Example: Consider a Markov chain whose transition graph is such that, starting in state 1, the state can be revisited at stages $4, 6, 8, 10, \ldots$. Thus the state 1 has period 2.

3.2.5 Example (Birth and death chain): Consider a Markov chain on $S = \{0, 1, 2, \ldots\}$. Starting at $i$, the chain can stay at $i$ or move to $i-1$ or $i+1$, with probabilities
$$p(i, j) = \begin{cases} q_i & \text{if } j = i-1, \\ r_i & \text{if } j = i, \\ p_i & \text{if } j = i+1, \\ 0 & \text{otherwise}. \end{cases}$$
Saying that it is an irreducible chain is the same as saying that $p_i > 0$ for all $i \geq 0$ and $q_i > 0$ for all $i > 0$. It will be aperiodic if some $r_i > 0$; see Exercise (3.5) below. If $r_i = 0$ for all $i$, then the chain can return to $i$ only after an even number of steps, so the period of each state can only be a multiple of 2; since $p^2_{00} = p_0 q_1 > 0$, every state then has period 2.

3.2.6 Theorem: If two states communicate with each other, then they have the same period.

Proof: Let $d_i$ be the period of $i$ and $d_j$ the period of $j$. It is enough to show that $d_i$ divides every $r$ with $p^r_{jj} > 0$. Since $i \leftrightarrow j$, there exist $n, m$ such that $p^m_{ij} > 0$ and $p^n_{ji} > 0$. By the Kolmogorov-Chapman equations, for every $r \geq 0$ with $p^r_{jj} > 0$,
$$p^{m+r+n}_{ii} \geq p^m_{ij}\, p^r_{jj}\, p^n_{ji} > 0.$$
This implies that $d_i$ divides $m + r + n$ for every such $r$. In particular, taking $r = 0$ (as $p^0_{jj} = 1 > 0$), $d_i$ divides $m + n$, and hence $d_i$ divides $r = (m + r + n) - (m + n)$. Since $d_i$ divides every return time of $j$, it divides their greatest common divisor, so $d_i \leq d_j$. Similarly, $d_j \leq d_i$.

Exercises

(3.5) Show that if a Markov chain is irreducible and $p_{ii} > 0$ for some state $i$, then it is aperiodic.

(3.6) Show that the queuing chain of Section 1.3 is aperiodic.

3.3. Visiting a state: transient and recurrent states

Let $i, j \in S$ be fixed. Let us consider the probability of the event that for some $n \geq 1$ the system visits the state $j$, given that it starts in the state $i$. Let
$$f^n_{ij} := P\{X_n = j,\ X_k \neq j \text{ for } 1 \leq k \leq n-1 \mid X_0 = i\}, \qquad n \geq 1,$$
i.e., $f^n_{ij}$ is the probability that the first visit to state $j$, starting at $i$, occurs at the $n$th step. We are interested in computing
$$f_{ij} := \sum_{n=1}^{\infty} f^n_{ij}$$
in terms of the transition probabilities. It is the probability of an eventual visit to state $j$ starting from state $i$, i.e., of a visit in some finite time. We define $f^0_{ii} = 0$ for all $i$; note that $f^1_{ii} = p_{ii}$.

3.3.1 Proposition:
(i) $f^1_{ij} = p_{ij}$.
(ii) $\displaystyle f^{n+1}_{ij} = \sum_{r \neq j} p_{ir}\, f^n_{rj}$.
(iii) $\displaystyle p^n_{ij} = \sum_{k=1}^{n} f^k_{ij}\, p^{n-k}_{jj}$.
(iv) $\displaystyle p^n_{ii} = \sum_{k=1}^{n} f^k_{ii}\, p^{n-k}_{ii}$.
(v) $P\{\text{the system visits state } j \text{ at least 2 times} \mid X_0 = i\} = f_{ij}\, f_{jj}$. More generally,
$$P\{\text{the system visits state } j \text{ at least } m \text{ times} \mid X_0 = i\} = f_{ij}\, f_{jj}^{\,m-1}.$$

Proof: (i) Obvious.
(ii) Conditioning on the first step,
$$f^{n+1}_{ij} = \sum_{r \neq j} P\{\text{from } i \text{ to } r \text{ in one step}\}\, P\{\text{first visit to } j \text{ from } r \text{ in } n \text{ steps}\} = \sum_{r \neq j} p_{ir}\, f^n_{rj}.$$
(iii) Decomposing according to the time of the first visit to $j$,
$$p^n_{ij} = \sum_{m=1}^{n} P\{\text{first visit to } j \text{ at the } m\text{th step} \mid X_0 = i\}\, P\{X_n = j \mid X_m = j\} = \sum_{m=1}^{n} f^m_{ij}\, p^{n-m}_{jj}.$$
(iv) Follows from (iii).
(v) Decomposing according to the time of the first visit to $j$ and of the first return to $j$ after that,
$$P\{\text{at least 2 visits to } j \mid X_0 = i\} = \sum_{k} \sum_{n} f^k_{ij}\, f^n_{jj} = \left( \sum_k f^k_{ij} \right)\left( \sum_n f^n_{jj} \right) = f_{ij}\, f_{jj}.$$
The general case follows similarly.

3.3.2 Definition:
(i) A state $i$ is called recurrent if $f_{ii} = 1$, i.e., with probability 1 the system comes back to $i$.
(ii) A state $i$ is called transient if $f_{ii} < 1$. In this case, the probability that the system starting at $i$ never comes back to $i$, namely $1 - f_{ii}$, is positive.
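The quantities $f^n_{ij}$ are easy to compute numerically straight from their definition: sum over the paths from $i$ that avoid $j$ for $n-1$ steps and then enter $j$. A small sketch (my addition), using the "taboo" submatrix $T$ of $P$ with the row and column of $j$ deleted, so that $f^1_{ij} = p_{ij}$ and, for $n \geq 2$, $f^n_{ij}$ is the product (row $i$ of $P$ restricted to states $\neq j$) $\cdot\, T^{\,n-2} \cdot$ (column $j$ of $P$ restricted to states $\neq j$):

```python
import numpy as np

def first_passage(P, i, j, N):
    """f[n] = P(first visit to j at step n | X_0 = i), n = 1..N."""
    keep = [k for k in range(len(P)) if k != j]   # taboo state space
    T = P[np.ix_(keep, keep)]     # moves that avoid j
    u = P[i, keep]                # first step, avoiding j
    r = P[keep, j]                # final step into j
    f = np.zeros(N + 1)
    f[1] = P[i, j]
    for n in range(2, N + 1):
        f[n] = u @ r
        u = u @ T
    return f

# Two-state machine chain: f_00 should sum to 1 (state 0 recurrent).
p, q = 0.3, 0.6
P = np.array([[1 - p, p], [q, 1 - q]])
f = first_passage(P, 0, 0, 200)
print(f[1:4])          # f^1_00 = 1-p, f^2_00 = p q, f^3_00 = p(1-q)q, ...
print(f.sum())         # ~1.0, so state 0 is recurrent
```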
3.3.3 Theorem:
(i) The following statements are equivalent for a state $j$:
(a) The state $j$ is transient.
(b) $P\{\text{the system visits } j \text{ infinitely often} \mid X_0 = j\} = 0$.
(c) $\sum_n p^n_{jj} < \infty$.
(ii) The following statements are equivalent for a state $j$:
(a) The state $j$ is recurrent.
(b) $P\{\text{the system visits } j \text{ infinitely often} \mid X_0 = j\} = 1$.
(c) $\sum_n p^n_{jj} = \infty$.

Proof: (i) Using Proposition 3.3.1(v), we have
$$P\{\text{the system visits } j \text{ infinitely often} \mid X_0 = j\} = \lim_{m \to \infty} P\{\text{at least } m \text{ visits to } j \mid X_0 = j\} = \lim_{m \to \infty} f_{jj}^{\,m}.$$
Hence this probability is 0 iff $f_{jj} < 1$, which shows that (a) holds iff (b) holds.

Next, suppose (c) holds, i.e., $\sum_n p^n_{jj} < \infty$. Then, by the Borel-Cantelli lemma, with probability 1 only finitely many of the events $\{X_n = j\}$ occur, so (b) holds.

Conversely, let (a) hold, i.e., $f_{jj} < 1$. We show that (c) holds. Using Proposition 3.3.1(iv), we have
$$\sum_{t=1}^{n} p^t_{jj} = \sum_{t=1}^{n} \sum_{s=1}^{t} f^s_{jj}\, p^{t-s}_{jj} = \sum_{s=1}^{n} f^s_{jj} \sum_{t=s}^{n} p^{t-s}_{jj} \leq \sum_{s=1}^{n} f^s_{jj} \left( 1 + \sum_{t=1}^{n} p^t_{jj} \right) \leq f_{jj} + f_{jj} \sum_{t=1}^{n} p^t_{jj}.$$
Thus $(1 - f_{jj}) \sum_{t=1}^{n} p^t_{jj} \leq f_{jj}$, so for every $n \geq 1$,
$$\sum_{t=1}^{n} p^t_{jj} \leq \frac{f_{jj}}{1 - f_{jj}},$$
implying (c), as $f_{jj} < 1$. This completely proves (i). The proof of (ii) follows from (i).

3.3.4 Example: Consider the unrestricted random walk on the integers, with probability $p$ of moving right and $q = 1-p$ of moving left. It is clearly an irreducible chain. Starting at 0, the walk can come back to 0 only in an even number of steps; thus $p^{2n+1}_{00} = 0$. To come back to 0 in $2n$ steps, it must take $n$ steps to the left and $n$ to the right. Thus,
$$p^{2n}_{00} = \binom{2n}{n} p^n q^n.$$
Therefore,
$$\sum_{n=0}^{\infty} p^n_{00} = \sum_{n=0}^{\infty} p^{2n}_{00} = \sum_{n=0}^{\infty} \binom{2n}{n} p^n q^n.$$
To decide whether the state 0 is transient or not, one has to decide whether this series converges. Note that
$$\binom{2n}{n} = \frac{(2n)!}{n!\, n!},$$
and by Stirling's formula, $n! \sim \sqrt{2\pi}\, n^{n+1/2} e^{-n}$, we have
$$\binom{2n}{n} \sim \frac{\sqrt{2\pi}\,(2n)^{2n+1/2}\, e^{-2n}}{2\pi\, n^{2n+1}\, e^{-2n}} = \frac{2^{2n}}{\sqrt{n\pi}}.$$
Hence
$$p^{2n}_{00} \sim \frac{(4pq)^n}{\sqrt{n\pi}}.$$
Now $pq = p(1-p) \leq 1/4$, with equality iff $p = q = 1/2$. Thus, for $\theta = 4pq$,
$$\sum_{n=0}^{\infty} p^{2n}_{00} \sim \sum_{n=1}^{\infty} \frac{\theta^n}{\sqrt{n\pi}}, \qquad \theta < 1 \text{ if } p \neq 1/2, \quad \theta = 1 \text{ if } p = q = 1/2.$$
One knows that for $\theta < 1$, $\sum_n \theta^n/\sqrt{n} < +\infty$, while the series is divergent for $\theta = 1$. Thus, 0 is a recurrent state iff $p = q = 1/2$; in fact, the same holds for every state $j$. If $p \neq q$, then 0 is transient — intuitively, the particle drifts to $-\infty$ or $+\infty$ — and so is every state.

3.3.5 Theorem: Let $i \to j$ and let $i$ be recurrent. Then:
(i) $f_{ji} = 1$, $j \to i$, and $j$ is recurrent.
(ii) $f_{ij} = 1$.

Proof: (i) Since $i \to j$, there exists $n \geq 1$ such that $p^n_{ij} > 0$. Let $n_0$ be the smallest such positive integer, so that $p^{n_0}_{ij} > 0$ and $p^m_{ij} = 0$ for $1 \leq m < n_0$. Since $p^{n_0}_{ij} > 0$, there exist states $i_1, i_2, \ldots, i_{n_0 - 1}$, none equal to $i$ or $j$ (an intermediate visit to $i$ or $j$ would contradict the minimality of $n_0$), such that
$$P\{X_{n_0} = j, X_{n_0 - 1} = i_{n_0 - 1}, \ldots, X_1 = i_1 \mid X_0 = i\} > 0. \tag{3.1}$$
Suppose $f_{ji} < 1$. Then $1 - f_{ji} > 0$, i.e.,
$$P\{\text{the system starts at } j \text{ and never visits } i\} > 0. \tag{3.2}$$
Therefore,
$$\begin{aligned}
\alpha &:= P\{X_1 = i_1, \ldots, X_{n_0 - 1} = i_{n_0 - 1}, X_{n_0} = j, X_n \neq i \text{ for } n > n_0 \mid X_0 = i\} \\
&= P\{X_n \neq i \text{ for } n \geq n_0 + 1 \mid X_{n_0} = j, X_{n_0 - 1} = i_{n_0 - 1}, \ldots, X_0 = i\} \times P\{X_{n_0} = j, X_{n_0 - 1} = i_{n_0 - 1}, \ldots, X_1 = i_1 \mid X_0 = i\} \\
&= P\{X_n \neq i \text{ for } n \geq n_0 + 1 \mid X_{n_0} = j\} \times P\{X_{n_0} = j, \ldots, X_1 = i_1 \mid X_0 = i\} > 0,
\end{aligned}$$
using equations (3.1) and (3.2). Since the path $i_1, \ldots, i_{n_0-1}$ avoids $i$, it follows that
$$P\{X_n \neq i \text{ for every } n \geq 1 \mid X_0 = i\} \geq \alpha > 0,$$
i.e., with positive probability the system starts at $i$ and never comes back to $i$; so $i$ cannot be a recurrent state. Hence, if $i$ is recurrent, then our assumption that $f_{ji} < 1$ is not true.
Thus $i$ recurrent implies $f_{ji} = 1$. But then
$$f_{ji} = \sum_{m \geq 1} f^m_{ji} = 1,$$
and hence $f^m_{ji} > 0$ for some $m$, i.e., with positive probability there is a first visit to $i$ starting from $j$. Hence $p^m_{ji} \geq f^m_{ji} > 0$, i.e., $j \to i$. Thus, we have shown that $i \to j$ and $i$ recurrent imply $f_{ji} = 1$, and hence $j \to i$. Further, for every $n$,
$$p^{m+n+n_0}_{jj} = \sum_{r, k} p^m_{jr}\, p^n_{rk}\, p^{n_0}_{kj} \geq p^m_{ji}\, p^n_{ii}\, p^{n_0}_{ij}.$$
Using this,
$$\sum_{n \geq 1} p^n_{jj} \geq \sum_{n \geq 1} p^{m+n+n_0}_{jj} \geq p^m_{ji}\, p^{n_0}_{ij} \sum_{n \geq 1} p^n_{ii} = +\infty,$$
because $p^m_{ji} > 0$, $p^{n_0}_{ij} > 0$, and $\sum_n p^n_{ii} = +\infty$. Thus $j$ is recurrent, proving (i).

(ii) Apply (i) with the roles of $i$ and $j$ interchanged: since $j \to i$ and $j$ is recurrent, (i) gives $f_{ij} = 1$.

3.3.6 Corollary: If $i \to j$ and $j \to i$, then either both states are transient or both are recurrent.

Proof: If $i$ is recurrent and $i \to j$, then $j$ is recurrent by the above theorem. Suppose $i$ is transient and $j$ is recurrent. Since $j \to i$, the above theorem gives that $i$ is recurrent, which is not possible. Hence $i$ transient implies $j$ transient.

3.3.7 Corollary: Let $C \subseteq S$ be an irreducible set. Then either all states in $C$ are recurrent or all are transient. Further, if $C$ is a communicating class and all its states are recurrent, then $C$ is closed.

Proof: Since all states in $C$ communicate with each other, by Corollary 3.3.6 all states in $C$ are either transient or recurrent. Next, suppose $C$ is a communicating class whose states are recurrent, and suppose $i \to j$ for some $i \in C$, $j \notin C$. Then by the above theorem $j \to i$, and hence $j \in C$, which is not true. Hence $C$ is closed.

Thus we know how to characterize irreducible Markov chains.

3.3.8 Exercise: Show that if a state $j$ is transient, then $\sum_{n=1}^{\infty} p^n_{ij} < \infty$ for all $i$.

3.3.9 Theorem: Let $\{X_n\}_{n \geq 0}$ be an irreducible Markov chain with state space $S$ and transition probability matrix $P$. Then:
(i) either all states are transient, in which case
$$\sum_{n \geq 0} p^n_{ij} < +\infty \ \text{ for all } i, j, \qquad \text{and} \qquad P\{X_n = j \text{ for infinitely many } n \mid X_0 = i\} = 0;$$
(ii) or all states are recurrent, in which case
$$\sum_{n \geq 0} p^n_{ii} = +\infty \ \text{ for all } i.$$

3.3.10 Corollary: If $S$ is finite, then it has at least one recurrent state.

Proof: Suppose all states are transient. Then $\sum_{n \geq 0} p^n_{ij} < +\infty$ for all $i, j$, so $\lim_{n \to \infty} p^n_{ij} = 0$. Hence, as $S$ is finite and $P$ is a stochastic matrix,
$$0 = \lim_{n \to \infty} \sum_{j \in S} p^n_{ij} = 1,$$
a contradiction.

3.3.11 Corollary: In a finite irreducible chain, all states are recurrent.

3.3.12 Example: The two-state Markov chain with transition matrix
$$\begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}$$
is finite and, for $p, q > 0$, irreducible; hence all states are recurrent.

3.3.13 Example: Consider the chain discussed in Example 3.1.4 with transition matrix
$$P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
1/4 & 1/2 & 1/4 & 0 & 0 & 0 \\
0 & 1/5 & 2/5 & 1/5 & 0 & 1/5 \\
0 & 0 & 0 & 1/6 & 1/3 & 1/2 \\
0 & 0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 1/4 & 0 & 3/4
\end{pmatrix}.$$
Let us find its transient and recurrent states.
(i) 0 is an absorbing state, as $p_{00} = 1$, and hence is recurrent.
(ii) As observed earlier, $\{3, 4, 5\}$ is a finite, closed, irreducible set; hence, by Corollary 3.3.11, all its states are recurrent.
(iii) Now, if 2 were a recurrent state, then since $2 \to 0$, Theorem 3.3.5 would give $0 \to 2$, which is not true. Hence 2 is not recurrent and must be transient. Similarly, 1 is transient.
Thus we can write the state space as $S = \{1, 2\} \cup \{0\} \cup \{3, 4, 5\}$, where the first set consists of transient states and the second and third are closed irreducible sets of recurrent states.
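For a finite chain this classification is algorithmic: a communicating class is recurrent iff it is closed (Corollaries 3.3.7 and 3.3.11). The sketch below (my addition) reuses the reachability routine from §3.1 to classify the states of Example 3.3.13.

```python
import numpy as np

def classify(P):
    """Return (recurrent, transient) state sets of a finite chain:
    a communicating class is recurrent iff it is closed."""
    n = len(P)
    R = (P > 0) | np.eye(n, dtype=bool)
    for _ in range(n):                   # transitive closure of ->
        R = R | ((R.astype(int) @ R.astype(int)) > 0)
    recurrent, transient = set(), set()
    for i in range(n):
        cls = {j for j in range(n) if R[i, j] and R[j, i]}
        closed = all(j in cls for j in range(n) if R[i, j])
        (recurrent if closed else transient).update(cls)
    return recurrent, transient

P = np.array([
    [1,   0,   0,   0,   0,   0  ],
    [1/4, 1/2, 1/4, 0,   0,   0  ],
    [0,   1/5, 2/5, 1/5, 0,   1/5],
    [0,   0,   0,   1/6, 1/3, 1/2],
    [0,   0,   0,   1/2, 0,   1/2],
    [0,   0,   0,   1/4, 0,   3/4],
])
rec, tr = classify(P)
print("recurrent:", sorted(rec), " transient:", sorted(tr))
# recurrent: [0, 3, 4, 5]  transient: [1, 2]
```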
3.3.14 Example: Let us find the transient/recurrent states for the chains with transition matrices
$$P = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{pmatrix}, \qquad Q = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \qquad R = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 & 0 \\ 0 & 0 & 1/2 & 1/2 & 0 \\ 0 & 0 & 1/2 & 1/2 & 0 \\ 1/4 & 1/4 & 0 & 0 & 1/2 \end{pmatrix}.$$
The chain with transition matrix $P$ is finite and irreducible, and thus all its states are recurrent. The chain with transition matrix $Q$ is also finite and irreducible, and hence recurrent. For the chain with transition matrix $R$, $\{1, 2\}$ and $\{3, 4\}$ are closed irreducible sets and hence consist of recurrent states. Since $5 \to 1$ but $1 \not\to 5$, the state 5 cannot be recurrent; therefore, 5 is transient. Once again, we have the decomposition $S = \{5\} \cup \{1, 2\} \cup \{3, 4\}$, where the first set consists of a transient state and the second and third are irreducible sets of recurrent states.

We saw in the above examples that the state space $S$ could be written as $S_T \cup C_1 \cup C_2 \cup \ldots$, where $S_T$ consists of all the transient states and $C_1, C_2, \ldots$ are closed irreducible sets consisting of recurrent states. We now show that this is possible in general.

3.3.15 Proposition: For every recurrent state $i$ there exists a subset $C(i) \subseteq S$ such that the following hold:
(i) Each $C(i) \neq \emptyset$ is closed and irreducible.
(ii) Either $C(i_1) \cap C(i_2) = \emptyset$ or $C(i_1) = C(i_2)$.
(iii) $\bigcup_i C(i) = S_R$, the set of all recurrent states.

Proof: For $i \in S_R$, define $C(i) = \{j \in S \mid i \to j\}$. We prove that the sets $C(i)$ have the required properties.
(i) $i \in C(i)$, since $p^0_{ii} = 1$; hence $C(i) \neq \emptyset$. If $j \in C(i)$, then by Theorem 3.3.5, $j$ is recurrent and $j \to i$; hence $i \leftrightarrow j$. Thus any two states in $C(i)$ communicate with each other (via $i$), i.e., $C(i)$ is irreducible. If $k \notin C(i)$, then $i \not\to k$. Also, for $j \in C(i)$, we have $i \leftrightarrow j$, and hence $j \not\to k$, for if $j \to k$ then $i \to k$. Therefore, $C(i)$ is closed.
(ii) If $i \in C(i_1) \cap C(i_2)$, then for $j \in C(i_1)$ we have $j \leftrightarrow i_1 \leftrightarrow i \leftrightarrow i_2$, implying $C(i_1) \subseteq C(i_2)$. Similarly, $C(i_2) \subseteq C(i_1)$.
(iii) is obvious.

3.3.16 Theorem (Decomposition of the state space): $S = S_T \cup S_R$, where $S_T$ consists of all transient states and $S_R$ of all recurrent states, and $S_R = C_1 \cup C_2 \cup \ldots$, a union of disjoint closed irreducible sets $C_i$.

Proof: Clearly $S = S_T \cup S_R$ by definition. The required decomposition of $S_R$ follows from Proposition 3.3.15.

3.3.17 Note: Thus, we can write the state space as $S = S_T \cup C_1 \cup C_2 \cup \ldots$, where $S_T$ consists of transient states and each $C_i$ is irreducible and recurrent. On each $C_i$, the chain can be analyzed as an irreducible chain. If $S_T$ is also irreducible and closed, we can analyze the chain on it separately as well. In general, locating a recurrent state in a chain may not be easy.

3.3.18 Some questions:
(i) If the chain starts in $S_T$, what is the probability that it stays in $S_T$ forever?
(ii) Given $i \in S$, what is the probability that the chain hits a closed irreducible set $C$ of recurrent states and stays in it forever? Clearly, this probability is
$$p_C(i) = \begin{cases} 1 & \text{if } i \in C, \\ 0 & \text{if } i \notin C \text{ but } i \text{ is recurrent}. \end{cases}$$
So the case of interest is: for $i \in S_T$, what is $p_C(i)$?
(iii) Can we have an alternative criterion for a state to be transient or recurrent?

We shall answer some of these in the next section.

3.4. Absorption probability

Let $C$ be an irreducible closed set of recurrent states. For $i \in S$, let
$$p_C(i) = P\{\text{the system hits } C \text{ eventually} \mid X_0 = i\} = P\Big( \bigcup_{n \geq 0} \{X_n \in C\} \,\Big|\, X_0 = i \Big).$$
Note that if $i \in C$, then $p_C(i) = 1$; if $i \notin C$ but $i$ is recurrent, then $p_C(i) = 0$. So the problem is to compute $p_C(i)$ when $i$ is in $S_T$. The answer is given by the following theorem.
3.4.1 Theorem: The numbers $p_C(i)$, $i \in S_T$, satisfy the system of equations
$$p_C(i) = \sum_{j \in C} p_{ij} + \sum_{j \in S_T} p_{ij}\, p_C(j),$$
for to go from $i$ to $C$ we can either go from $i$ to some $j \in C$ in one step, or go from $i$ to some $j \in S_T$ in one step and then from $j$ to $C$.

Thus, to find $p_C(i)$, one has to solve these equations. When $S_T$ is infinite, it is not known in general how to solve them; moreover, solutions need not be unique in that case. When $S_T$ is finite, one can show that a unique solution exists. We give an example to illustrate this.

3.4.2 Example: Let
$$P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
1/4 & 1/2 & 1/4 & 0 & 0 & 0 \\
0 & 1/5 & 2/5 & 1/5 & 0 & 1/5 \\
0 & 0 & 0 & 1/6 & 1/3 & 1/2 \\
0 & 0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 1/4 & 0 & 3/4
\end{pmatrix}.$$
Then, as observed in Example 3.3.13, $C = \{0\}$ is a closed irreducible set and $S_T = \{1, 2\}$. Let us find $p_C(1)$ and $p_C(2)$ for $C = \{0\}$. We have to solve
$$p_C(i) = \sum_{j \in C} p_{ij} + \sum_{j \in S_T} p_{ij}\, p_C(j), \qquad i = 1, 2,$$
i.e.,
$$p_C(1) = p_{10} + p_{11}\, p_C(1) + p_{12}\, p_C(2) = \frac{1}{4} + \frac{1}{2}\, p_C(1) + \frac{1}{4}\, p_C(2), \tag{3.3}$$
$$p_C(2) = p_{20} + p_{21}\, p_C(1) + p_{22}\, p_C(2) = 0 + \frac{1}{5}\, p_C(1) + \frac{2}{5}\, p_C(2). \tag{3.4}$$
One can solve (3.3) and (3.4) to get $p_C(1) = \frac{3}{5}$, $p_C(2) = \frac{1}{5}$.

3.4.3 Definition: A Markov chain is called an absorbing chain if
(i) it has at least one absorbing state; and
(ii) for every state in the chain, the probability of reaching an absorbing state in a finite number of steps is nonzero.

Suppose an absorbing Markov chain has $r$ absorbing states and a set $S_T$ of transient states, so that we can write $S = S_T \cup C_1 \cup C_2 \cup \ldots \cup C_r$, where each $C_i$ is a singleton set corresponding to an absorbing state. Renumbering the states, if need be, so that the absorbing states come first, the transition matrix takes the canonical form
$$P = \begin{pmatrix} I & O \\ R & Q \end{pmatrix},$$
where $R$ is the rectangular submatrix giving the transition probabilities from non-absorbing to absorbing states, $Q$ is the square submatrix giving the transition probabilities from non-absorbing to non-absorbing states, $I$ is an identity matrix, and $O$ is a rectangular matrix of zeros. Note that for every $n$,
$$P^n = \begin{pmatrix} I & O \\ (I + Q + Q^2 + \cdots + Q^{n-1})\,R & Q^n \end{pmatrix}.$$
Thus, if $Q^n = (q^n_{ij})$, then $q^n_{ij}$ represents the probability of going from the non-absorbing state $i$ to the non-absorbing state $j$ in $n$ steps. Since the absorption probabilities satisfy
$$p_{C(i)}(j) = p_{ji} + \sum_{k \in S_T} p_{jk}\, p_{C(i)}(k), \qquad i = 1, \ldots, r, \quad j \in S_T,$$
we have $B = R + QB$, where $B$ is the matrix whose $(j, i)$th entry is $p_{C(i)}(j)$. Thus
$$B = (I - Q)^{-1} R = NR, \qquad \text{where } N := (I - Q)^{-1},$$
if it exists. Hence, to calculate the absorption probabilities, one has to show that $N$ exists and compute $(I - Q)^{-1}$. The matrix $N = (I - Q)^{-1}$ is called the fundamental matrix of the absorbing chain.
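A numerical illustration (mine, not the notes'): for the chain of Example 3.4.2, lump the two closed recurrent classes $\{0\}$ and $\{3, 4, 5\}$ into absorbing targets, so that $S_T = \{1, 2\}$. The fundamental matrix then reproduces $p_C(1) = 3/5$, $p_C(2) = 1/5$ for $C = \{0\}$, along with the expected absorption times.

```python
import numpy as np

# Transient part of Example 3.4.2: states 1 and 2.
Q = np.array([[1/2, 1/4],
              [1/5, 2/5]])
# One-step probabilities from {1, 2} into the closed classes
# {0} (first column) and {3, 4, 5} (second column, lumped).
R = np.array([[1/4, 0],
              [0, 2/5]])

N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix
B = N @ R                          # absorption probabilities
t = N @ np.ones(2)                 # expected steps spent in S_T

print(B)   # [[0.6, 0.4], [0.2, 0.8]]: p_{0}(1) = 3/5, p_{0}(2) = 1/5
print(t)   # [3.4, 2.8]
```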
3.4.4 Theorem: For every absorbing chain the following hold:
(i) If q^n_ij denote the entries of Q^n, then the mean absorption time for a state i is

 μ_i := Σ_{m=0}^∞ Σ_{j∈S_T} q^m_ij.

(ii) Q^n → 0 as n → ∞.
(iii) N := (I − Q)^{−1} exists.
(iv) If B = (I − Q)^{−1} R = N R = [b_ij], then b_ij is the probability that the chain will be absorbed in state j, starting from state i.
(v) If N = [n_ij], then n_ij is the expected number of visits to the state j ∈ S_T, starting from i ∈ S_T, before absorption.

Proof: (i) The mean absorption time for a state i is

 μ_i = Σ_{k=1}^∞ k P{starting in state i, the chain is absorbed at the k-th step}
     = Σ_{k=1}^∞ Σ_{m=0}^{k−1} P{starting in state i, the chain is absorbed at the k-th step}
     = Σ_{m=0}^∞ Σ_{k=m+1}^∞ P{starting in state i, the chain is absorbed at the k-th step}
     = Σ_{m=0}^∞ P{starting in state i, the chain is absorbed after the m-th step}
     = Σ_{m=0}^∞ P{starting in state i, the chain is not absorbed by the m-th step}
     = Σ_{m=0}^∞ Σ_{j∈S_T} q^m_ij.

(ii) Note that q^n_ij = p^n_ij for transient states i, j, and Σ_{n=0}^∞ p^n_ij < ∞ for a transient state j. Hence q^n_ij → 0 as n → ∞.

(iii) Define

 N_n := I + Q + Q^2 + ... + Q^n = Σ_{k=0}^n Q^k, n ≥ 1.

It is easy to check that

 N_n (I − Q) = (I − Q) N_n = I − Q^{n+1} for all n ≥ 1.

Since Σ_{n=0}^∞ q^n_ij < ∞ by (ii), and Q^{n+1} → 0, the limit N := lim_{n→∞} N_n exists and equals (I − Q)^{−1}.

(iv) The claim follows from the identity B = R + QB established above.

(v) For i, j ∈ S_T, let

 X^(k) = 1 if the chain is in state j after k steps, starting at i; 0 otherwise.

Then P(X^(k) = 1) = q^k_ij and P(X^(k) = 0) = 1 − q^k_ij. Thus E(X^(k)) = q^k_ij. Hence the expected number of times the chain is in state j in the first n steps, starting in i, is

 E(X^(0) + X^(1) + ... + X^(n)) = Σ_{k=0}^n q^k_ij.

Thus, using Fubini's theorem, the expected number of times the chain is in state j, starting in i, is

 E(Σ_{k=0}^∞ X^(k)) = lim_{n→∞} E(Σ_{k=0}^n X^(k)) = lim_{n→∞} Σ_{k=0}^n q^k_ij = Σ_{k=0}^∞ q^k_ij = n_ij.

The matrix N also helps us to compute t_i, the mean (average) number of steps (time) for which the chain will be in the transient states, starting from the state i ∈ S_T. This is given by

 t_i = Σ_{j∈S_T} n_ij.

This is also the mean absorption time starting at i. We apply these to our case of the random walk with absorbing barriers (with n + 1 states).

3.4.5 Example: Let

 P =
        0   1   2   ...  n−1  n
  0     1   0   0   ...   0   0
  1     q   0   p   ...   0   0
  2     0   q   0   ...   0   0
  ...              ...
  n−1   0   0   0   ...   0   p
  n     0   0   0   ...   0   1

that is, p_00 = p_nn = 1 and p_{i,i−1} = q, p_{i,i+1} = p for 1 ≤ i ≤ n − 1. We rewrite this in the block form above by listing the absorbing states 0, n first. Then

 R =
        0   n
  1     q   0
  2     0   0
  ...
  n−2   0   0
  n−1   0   p

and Q is the (n − 1) × (n − 1) tridiagonal matrix with q_{i,i+1} = p, q_{i,i−1} = q and all other entries 0, so that

 I − Q =
    1  −p   0  ...   0
   −q   1  −p  ...   0
    0  −q   1  ...   0
   ...           ...
    0   0  ...  −q   1

The inverse of I − Q is N = (n_ij), where:

(i) If p ≠ q and r := p/q, then

 n_ij = 1/((p − q)(r^n − 1)) × { (r^j − 1)(r^{n−i} − 1)        for j ≤ i,
                                 (r^i − 1)(r^{n−i} − r^{j−i})  for j ≥ i. }

(ii) If p = q = 1/2, then

 n_ij = (2/n) × { j(n − i) for j ≤ i,
                  i(n − j) for j ≥ i. }

And the time of stay in the transient states is

 t_i = Σ_{j=1}^{n−1} n_ij = { (1/(p − q)) [ n (r^n − r^{n−i})/(r^n − 1) − i ]  if p ≠ 1/2,
                              i(n − i)                                         if p = 1/2. }

From this we can draw the following conclusions:

• The time t_i of stay in the transient states starting from i, or equivalently the time to get out of the transient states, depends upon i, even when p = 1/2. Note that t_i = i(n − i) is maximum when i is in the middle, namely i = n/2 (if n is even); therefore t_max = (n/2)^2. Thus, when both players have an equal probability of winning, the time to ruin is the product of the fortunes of the two players, namely i(n − i), and the game will last longest when both start with equal amounts.

• But if p ≠ 1/2, one can show that for r > 1, i_max ≈ log_r((r − 1)n) and t_max ≈ (n − i_max)/(p − q), which is of lower order of magnitude compared to the p = 1/2 case, i.e., the game will finish much faster in this case (see the numerical sketch below).
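As a numerical check of these formulas (a minimal sketch, not part of the original notes; the helper name absorption_times is ours), one can solve (I − Q)t = 1 directly by Gaussian elimination. For p = 1/2 this reproduces t_i = i(n − i).

```python
def absorption_times(n, p):
    """Mean absorption times t_1, ..., t_{n-1} for the gambler's-ruin chain."""
    q = 1 - p
    m = n - 1                          # number of transient states
    # build the augmented system (I - Q | 1); Q is tridiagonal (q below, p above)
    A = [[0.0] * (m + 1) for _ in range(m)]
    for i in range(m):
        A[i][i] = 1.0
        if i > 0:
            A[i][i - 1] = -q
        if i < m - 1:
            A[i][i + 1] = -p
        A[i][m] = 1.0
    # Gaussian elimination with partial pivoting, then back substitution
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= f * A[col][c]
    t = [0.0] * m
    for r in range(m - 1, -1, -1):
        t[r] = (A[r][m] - sum(A[r][c] * t[c] for c in range(r + 1, m))) / A[r][r]
    return t

print([round(x) for x in absorption_times(10, 0.5)])
# [9, 16, 21, 24, 25, 24, 21, 16, 9]  =  i(10 - i), maximal in the middle
```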
Next we calculate B = N R. Since N is (n − 1) × (n − 1) and R is an (n − 1) × 2 matrix, B is an (n − 1) × 2 matrix, whose two columns give the probabilities of absorption in states 0 and n respectively. But b_i0 = 1 − b_in for every i, so it is enough to calculate one of them. We have

 b_in = Σ_{k=1}^{n−1} n_ik R_kn = { (r^n − r^{n−i})/(r^n − 1)  if p ≠ 1/2 (r = p/q),
                                    i/n                         if p = 1/2. }

(i) If p = 1/2, the probability that A ruins his opponent is i/n, the ratio of the fortune A starts with to the total fortune n; correspondingly, the probability that A is ruined is (n − i)/n.

(ii) If p > 1/2, i.e., A has an advantage over B, then his chance of ruining his opponent is (r^n − r^{n−i})/(r^n − 1). Suppose r = 2 and i = 1; then this is

 (2^n − 2^{n−1})/(2^n − 1) = 2^{n−1}/(2^n − 1) → 1/2 as n → ∞,

which is quite good: for example, if n = 2 (i.e., the opponent also has 1 rupee), this is 2/3; and even for large n it means A has a good chance of ruining B although B has large capital and A starts with only 1 rupee.

Imagine A is a gambling house and B is the player. The gambling house fixes odds r > 1 and makes sure i is large for itself. Then

 lim_{n→∞} b_in = lim_{n→∞} (r^n − r^{n−i})/(r^n − 1) = 1 − (1/r)^i ≈ 1 for large i.

Therefore the probability that the player wins all the money is ≈ 1 − lim_{n→∞} b_in ≈ 0. Thus gambling houses stay in business no matter how much is bet at the tables. Let us estimate the time it takes the gambling house to win all the money when r is near 1 (i.e., the odds are favorable to the gambling house, but not by much), so that t_i ≈ i(n − i). If the gambling house can cover 10,000 = 10^4 bets while all the gamblers put together can provide 10^6 bets, then t_i ≈ i(n − i) ≈ 10^4 × 10^6 = 10^10 units of time, which is very large. So it will take a very long time to win all the money, and in the meantime new gamblers will have appeared.

3.5. More on recurrence/transience

3.5.1 Another way of deciding whether a state is recurrent or transient: Let S_T denote the collection of all transient states of a system with transition probability matrix P. From P remove the rows and columns for the states not in S_T, and let Q = (q_ij), i, j ∈ S_T, be the submatrix obtained (in general Q will be substochastic rather than stochastic). Consider the system of linear equations in the variables x_i, i ∈ S_T:

 x_i = Σ_{k∈S_T} q_ik x_k, 0 ≤ x_i ≤ 1, i ∈ S_T.

(i) The maximal solution of the above system gives the probabilities that a system starting at i ∈ S_T stays in S_T forever:

 maximal x_i = P{X_k ∈ S_T for all k | X_0 = i}.

(ii) From a transient state, what is the probability that the chain will enter a set of recurrent states and then stay there? Let C be a closed set of recurrent states, and let y_C(i) be the probability that the system starting at state i reaches C and then remains in it forever. Clearly, if C is irreducible, then y_C(i) = 1 if i ∈ C, and y_C(i) = 0 if i ∉ C but i is recurrent. The case of interest is when i is transient. In that case, (y_C(i))_{i∈S_T} is the minimal nonnegative solution of the equations

 y_C(i) = Σ_{j∈C} p_ij + Σ_{j∈S_T} p_ij y_C(j), i ∈ S_T.

(iii) Let i_0 ∈ S and let C_{i_0} be the closure of the set {i_0}. Then i_0 is transient iff the system of equations

 x_j = Σ_{k≠i_0} p_jk x_k, 0 ≤ x_j ≤ 1, j ∈ C_{i_0}, j ≠ i_0,

has a non-trivial solution. Note that x_j = 0 for all j is always a solution; i_0 is recurrent iff it is the only one.

Let us apply this criterion. A small worked instance of the equations in (ii) is sketched below; we then turn to some examples.
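A worked instance of (ii), revisiting the matrix R of Example 3.3.14 (a minimal sketch, not part of the original notes): there S_T = {5}, so the system for y_C(5) reduces to a single linear equation, solved below for each of the two closed sets.

```python
# From state 5: one-step probability 1/2 into {1, 2}, 0 into {3, 4}, and 1/2
# of staying at 5.  The equation y = mass_into_C + p55 * y is solved directly.
p55 = 0.5
for name, mass_into_C in (("C = {1, 2}", 0.25 + 0.25), ("C' = {3, 4}", 0.0)):
    y = mass_into_C / (1 - p55)   # y = mass_into_C + p55 * y
    print(name, "->", y)
# prints 1.0 for {1, 2} and 0.0 for {3, 4}: starting from the transient state
# 5, the chain is absorbed in {1, 2} with probability 1 and never reaches {3, 4}.
```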
3.5.2 Example: Consider the following queueing model:

 X_n = number of customers at the counter at minute n,
 ξ_n = number of new customers that arrive in the n-th minute,

where in each minute one customer (if any is present) is served. Each ξ_n takes only the three values {0, 1, 2}, with distribution

 P{ξ_n = 0} = α_0, P{ξ_n = 1} = α_1, P{ξ_n = 2} = α_2, α_0 + α_1 + α_2 = 1.

Then S = {0, 1, 2, ...} and the transition probability matrix is

 P =
       0    1    2    3   ...
  0   α_0  α_1  α_2   0   ...
  1   α_0  α_1  α_2   0   ...
  2    0   α_0  α_1  α_2  ...
  3    0    0   α_0  α_1  ...
 ...                      ...

For example, p_00 = α_0 = P{no new customer comes}, p_01 = α_1 = P{one new customer comes}, p_02 = α_2 = P{two new customers come}. If we assume α_0, α_2 ≠ 0, then the chain is irreducible. We want to know whether it is recurrent or transient. Let us look at state 0. We have to see whether we can find a non-trivial solution of

 x_i = Σ_{j≠0} p_ij x_j, 0 ≤ x_i ≤ 1, i ≠ 0.

In our case the equations are

 x_1 = α_1 x_1 + α_2 x_2,
 x_k = α_0 x_{k−1} + α_1 x_k + α_2 x_{k+1}, k ≥ 2.

One can show that a solution is given by (see Billingsley [4], page 126)

 x_k = B [(α_0/α_2)^k − 1] if α_0 ≠ α_2,
 x_k = B k                 if α_0 = α_2,

for some constant B. Thus, if α_0 > α_2 then (α_0/α_2)^k → ∞, and if α_0 = α_2 then Bk → ∞, as k → ∞; in either case boundedness forces the trivial solution. Hence a non-trivial solution exists iff α_0 < α_2, in which case the chain is transient. But transience means that with probability 1 the chain eventually leaves every state for good; hence with probability 1 the queue size goes to ∞. Note that in this case α_2 − α_0 > 0, which is precisely the expected increase in queue length per minute; the queue goes to ∞ iff this is > 0. If α_0 ≥ α_2, then the chain is recurrent, i.e., every state is visited infinitely often.

Since S = S_T ∪ C_1 ∪ C_2 ∪ ..., we ask: given that the system starts in S_T, what is the probability that it stays in S_T? The answer is as follows.

3.5.3 Theorem: Let U ⊆ S_T. Then

 x_i = P{X_n ∈ U for all n ≥ 1 | X_0 = i}, i ∈ U,

is the maximal solution of the system

 x_i = Σ_{j∈U} p_ij x_j, 0 ≤ x_i ≤ 1, i ∈ U.

Let us look at an example.

3.5.4 Example (staying in the transient states): Consider the unrestricted random walk, with transition probabilities p_{i,i−1} = q and p_{i,i+1} = p. We know all the states are transient (when p ≠ q). Consider U = {0, 1, 2, ...} ⊂ S. We want to know the probability that, starting at some i ∈ U, the walk stays in U. This is given by the maximal solution of

 x_i = Σ_{j∈U} p_ij x_j, 0 ≤ x_i ≤ 1, i ∈ U.

In our case these equations are

 x_0 = p_01 x_1 = p x_1,
 x_i = p_{i,i−1} x_{i−1} + p_{i,i+1} x_{i+1} = q x_{i−1} + p x_{i+1}, i ≥ 1.

Since p + q = 1, the second equation can be rewritten as (p + q) x_i = q x_{i−1} + p x_{i+1}, i.e.,

 p (x_{i+1} − x_i) = q (x_i − x_{i−1}), so x_{i+1} − x_i = (q/p)(x_i − x_{i−1}).

If q ≥ p, the only bounded solution is x_0 = x_1 = 0, implying x_i = 0 for all i; in this case the probability of staying on the nonnegative side is zero. If q < p, the maximal solution can be found as follows: x_1 = x_0/p, so x_1 − x_0 = (q/p) x_0, and

 x_2 − x_1 = (q/p)(x_1 − x_0) = (q/p)^2 x_0, hence x_2 = x_1 + (q/p)^2 x_0 = [1/p + (q/p)^2] x_0.

Continuing, and choosing x_0 so that sup_i x_i = 1 (maximality), one obtains, for general n,

 x_n = P{system stays in {0, 1, 2, ...} | X_0 = n} = 1 − (q/p)^{n+1}

(note that this satisfies the boundary equation x_0 = p x_1, since p(1 − (q/p)^2) = (p^2 − q^2)/p = p − q = p(1 − q/p) · (p + q)/p = 1 − q/p = x_0). As n becomes large, this probability goes to 1.
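Example 3.5.4 can be checked by simulation. The sketch below (not part of the original notes; the helper name stays_nonnegative is ours) estimates the probability of never leaving the nonnegative integers by running the walk until it either dies or climbs to a safety cap; for p > q the walk drifts upward, so reaching the cap is practically the same as surviving forever.

```python
import random

def stays_nonnegative(i, p, cap=50, max_steps=100_000):
    # Walk until we go below 0 (failure) or climb to `cap` (success: from
    # there the chance of ever falling below 0 is about (q/p)^(cap+1),
    # which is negligible for p > q).
    x = i
    for _ in range(max_steps):
        if x < 0:
            return False
        if x >= cap:
            return True
        x += 1 if random.random() < p else -1
    return x >= 0

random.seed(2)
p, q, trials = 0.6, 0.4, 10_000
for i in (0, 1, 2, 5):
    est = sum(stays_nonnegative(i, p) for _ in range(trials)) / trials
    print(i, round(est, 3), round(1 - (q / p) ** (i + 1), 3))
# estimate vs exact maximal solution 1 - (q/p)^(i+1):
# i = 0: ~0.333, i = 1: ~0.556, i = 2: ~0.704, i = 5: ~0.912
```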
Chapter 4

Stationary distribution for a Markov chain

4.1. Introduction

Let {X_n}_{n≥0} be a Markov chain, let P be its transition probability matrix, and let Π_0(i) be its initial distribution. In this chapter we want to analyze the asymptotic (long-run) behavior of the chain. Suppose there exist {μ_i}_{i∈S} such that μ_i ≥ 0 for all i ∈ S, Σ_{i∈S} μ_i = 1, and

 Σ_{i∈S} μ_i p_ij = μ_j, j ∈ S.   (4.1)

Then (4.1) implies that

 Σ_{i∈S} μ_i p^2_ij = Σ_{i∈S} μ_i ( Σ_{l∈S} p_il p_lj ) = Σ_{l∈S} ( Σ_{i∈S} μ_i p_il ) p_lj = Σ_{l∈S} μ_l p_lj = μ_j.

Using induction, for all n ≥ 0,

 Σ_{i∈S} μ_i p^n_ij = μ_j, j ∈ S.

In case μ_i = Π_0(i) for every i, we have

 P{X_n = j} = Σ_{i∈S} Π_0(i) p^n_ij = Π_0(j), j ∈ S.

Thus all the X_n's have the same distribution, and in this sense the chain is very stable.

4.1.1 Definition: A Markov chain {X_n}_{n≥0} with transition probability matrix P and initial distribution Π_0 is said to have a stationary distribution (or invariant distribution) if there exist {μ_i}_{i∈S} such that μ_i ≥ 0 for all i, Σ_{i∈S} μ_i = 1, and

 Σ_{i∈S} μ_i p_ij = μ_j, j ∈ S.

Given a Markov chain, one would like to answer the following questions:
(i) When does the Markov chain have a stationary distribution?
(ii) How does one find the stationary distribution if it exists?
(iii) Is the stationary distribution unique?
(iv) What are the consequences of having a stationary distribution?

In the next section we look at the concept of stopping times, needed to answer the above questions. First, two examples.

4.1.2 Example: Consider a Markov chain with

 P =
       1  2  3
  1    0  1  0
  2    0  0  1
  3    1  0  0

Intuitively, the chain spends one third of its time in state 1, one third in state 2, and one third in state 3. In fact, if we take Π_0 = (1/3, 1/3, 1/3), then Π_0 = Π_0 P, i.e., Π_0 is a stationary distribution. The chain is irreducible with period 3, and

 p^n_ii = 0 if n is not a multiple of 3, p^n_ii = 1 if n is a multiple of 3.

Thus {p^n_ii}_{n≥1} is not convergent.

4.1.3 Example: On a highway, three out of four trucks on the road are followed by a car, while only one out of every five cars is followed by a truck. What fraction of the vehicles on the road are trucks? To answer this question, we construct a Markov chain as follows. Sit on the side of the road and observe the vehicles going by; the observation at time n is

 X_n = 0 if the n-th vehicle is a truck, X_n = 1 if it is a car.

Thus the state space is S = {0, 1} with transition matrix

 P =
       0    1
  0   1/4  3/4
  1   1/5  4/5

If we want each X_n to have the same distribution Π_0 = (p_0, p_1), then Π_0 = Π_0 P, i.e.,

 p_0 = p_0/4 + p_1/5,
 p_1 = 3p_0/4 + 4p_1/5,
 p_0 + p_1 = 1.

This implies p_0 = 4/19, p_1 = 15/19. So in the long run the fraction of trucks will be 4/19.
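The computation in Example 4.1.3 amounts to finding the left eigenvector of P for the eigenvalue 1. A quick sketch (not from the notes): since this two-state chain is irreducible and aperiodic, iterating π ← πP from any initial distribution converges to the stationary one; that this always works is exactly the subject of Section 4.4.

```python
P = [[0.25, 0.75],
     [0.20, 0.80]]

pi = [0.5, 0.5]                     # any initial distribution works
for _ in range(200):                # pi <- pi P, repeated
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
print([round(x, 6) for x in pi])    # [0.210526, 0.789474] = (4/19, 15/19)
```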
4.2. Stopping times and strong Markov property

Given a Markov chain {X_n}_{n≥0}, let A_n denote the σ-algebra determined by the random variables X_0, X_1, ..., X_n.

4.2.1 Definition: A random variable T : Ω → N ∪ {+∞} is called a stopping time if {T = n} ∈ A_n for all n.

4.2.2 Examples:
(i) For i ∈ S, let S_i = inf{n ≥ 0 | X_n = i} if such an n exists, and +∞ otherwise. It is a stopping time, called the first passage time to state i.
(ii) For i ∈ S, let T_i = inf{n ≥ 1 | X_n = i} if such an n exists, and +∞ otherwise. It is a stopping time, called the time of first return to state i.
(iii) For A ⊆ S, let T_A = inf{n ≥ 1 | X_n ∈ A} if such an n exists, and +∞ otherwise. T_A is a stopping time, called the time of the first visit to the set A.

4.2.3 Note: The event {T_j = n, X_0 = i} is the event of starting at i and first visiting j at time n. In our earlier notation (see Section 3.3), P{T_j = n | X_0 = i} = f^n_ij. Thus

 f_ii = Σ_{n=1}^∞ f^n_ii = P{T_i < +∞ | X_0 = i}.

Thus a state i is recurrent iff P{T_i < +∞ | X_0 = i} = 1, and transient iff P{T_i < +∞ | X_0 = i} < 1.

Let {X_n}_{n≥0} be a Markov chain and T a stopping time. Let

 A_T = {B ∈ A | B ∩ {T = n} ∈ A_n for all n}.

It is called the stopping time σ-algebra, or the σ-algebra determined by the stopping time.

4.2.4 Proposition (Strong Markov property): For every A ∈ A_T, m > 0, and i, i_1, i_2, ..., i_m ∈ S,

 P{A ∩ {X_{T+1} = i_1, X_{T+2} = i_2, ..., X_{T+m} = i_m} | X_T = i, T < +∞}
   = P{A | X_T = i, T < +∞} P{X_1 = i_1, ..., X_m = i_m | X_0 = i}.   (4.2)

Proof: It is enough to prove (4.2) for events A ∈ A_T of the type A ∩ {T = n}. For such an event, (4.2) reads

 P{A ∩ {X_{n+1} = i_1, ..., X_{n+m} = i_m} | X_n = i}
   = P{A | X_n = i} P{X_1 = i_1, ..., X_m = i_m | X_0 = i}.   (4.3)

Now note that, by the Markov property, (4.3) holds when A is a simple event in A_n (one determined by prescribing the values of X_0, ..., X_n), and a general event is a countable disjoint union of such events.

4.3. Existence and uniqueness

From now on we shall write, for all events A and random variables f,

 P_i(A) = P{A | X_0 = i}, E_i(f) = E(f | X_0 = i).

4.3.1 Theorem: Let {X_n}_{n≥0} be an irreducible recurrent chain with transition matrix P. Then there exist numbers r^k_i, k, i ∈ S, with the following properties:
(i) r^k_k = 1 for all k ∈ S.
(ii) r^k_j = Σ_{i∈S} r^k_i p_ij for all j ∈ S.
(iii) 0 < r^k_i < +∞ for all i ∈ S.
In other words, the chain has a stationary (invariant) measure.

Proof: For k, i ∈ S, define

 r^k_i := Σ_{n=1}^∞ E_k(1{X_n = i, T_k ≥ n});

this represents the total expected number of visits to state i between (any) two successive visits to state k. Note that {X_n = i, T_k ≥ n} is the event that the chain is in state i at time n and has not yet returned to state k before time n. Thus, k being recurrent, for i = k exactly one visit is counted (the one at the return time itself); hence r^k_k = 1 for all k ∈ S. This proves (i).

Next, for k, j ∈ S, k ≠ j,

 r^k_j = Σ_{n=1}^∞ E_k(1{X_n = j, T_k ≥ n})
       = Σ_{n=1}^∞ P_k(X_n = j, T_k ≥ n)
       = P_k(X_1 = j, T_k ≥ 1) + Σ_{n=2}^∞ P_k(X_n = j, T_k ≥ n)
       = p_kj + Σ_{n=2}^∞ Σ_{i≠k} P_k(X_n = j, X_{n−1} = i, T_k ≥ n)
       = p_kj + Σ_{i≠k} Σ_{n=2}^∞ P_k(X_{n−1} = i, T_k ≥ n) p_ij
       = p_kj + Σ_{i≠k} Σ_{n=2}^∞ P_k(X_{n−1} = i, T_k ≥ n − 1) p_ij   (since X_{n−1} = i ≠ k)
       = r^k_k p_kj + Σ_{i≠k} r^k_i p_ij
       = Σ_{i∈S} r^k_i p_ij.

This proves (ii). Finally, since the chain is irreducible, i ↔ k for every i, k ∈ S; thus there exists m such that p^m_ki > 0, and iterating (ii),

 r^k_i ≥ r^k_k p^m_ki = p^m_ki > 0 (because r^k_k = 1).

Also, choosing m' with p^{m'}_ik > 0,

 1 = r^k_k ≥ r^k_i p^{m'}_ik, which implies r^k_i < +∞.

4.3.2 Theorem (uniqueness): Let {X_n}_{n≥0} be an irreducible chain and λ an invariant measure for P with λ_k = 1. Then λ ≥ r^k, where r^k is as defined in the theorem above. If, in addition, P is recurrent, then λ = r^k.

Proof: Using the invariance of λ, for every i,

 λ_i = Σ_{i_1∈S} λ_{i_1} p_{i_1 i}
     = Σ_{i_1≠k} λ_{i_1} p_{i_1 i} + p_ki   (because λ_k = 1)
     = Σ_{i_1≠k} ( Σ_{i_2∈S} λ_{i_2} p_{i_2 i_1} ) p_{i_1 i} + p_ki
     = Σ_{i_1, i_2 ≠ k} λ_{i_2} p_{i_2 i_1} p_{i_1 i} + ( p_ki + Σ_{i_1≠k} p_{k i_1} p_{i_1 i} ).

Iterating m times,

 λ_i ≥ p_ki + Σ_{i_1≠k} p_{k i_1} p_{i_1 i} + ... + Σ_{i_1, ..., i_{m−1} ≠ k} p_{k i_{m−1}} p_{i_{m−1} i_{m−2}} ... p_{i_1 i}
     = Σ_{n=1}^m P_k(X_n = i, T_k ≥ n).

Since this holds for all m,

 λ_i ≥ Σ_{n=1}^∞ P_k(X_n = i, T_k ≥ n) = r^k_i.

In case the chain is recurrent, let μ = λ − r^k. Then μ is also an invariant measure, and μ_k = λ_k − r^k_k = 0. Given j, choose n such that p^n_jk > 0. Then

 0 = μ_k = Σ_{i∈S} μ_i p^n_ik ≥ μ_j p^n_jk ≥ 0,

which implies μ_j = 0 (as p^n_jk > 0). Hence λ = r^k.
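The numbers r^k_i of Theorem 4.3.1 can be estimated by simulation. A sketch (not in the notes; the helper name step is ours) for the truck/car chain of Example 4.1.3, taking k = 0: by Theorems 4.3.1 and 4.3.2, the invariant measure normalized by λ_0 = 1 has λ_1 = π_1/π_0 = (15/19)/(4/19) = 15/4, so the average number of visits to state 1 between successive visits to state 0 should be about 3.75.

```python
import random

P = [[0.25, 0.75],
     [0.20, 0.80]]

def step(i):
    # one transition of the chain from state i
    return 0 if random.random() < P[i][0] else 1

random.seed(1)
excursions, visits_to_1 = 100_000, 0
for _ in range(excursions):
    x = step(0)              # start an excursion from state 0
    while x != 0:            # count visits to 1 until the return to 0
        visits_to_1 += (x == 1)
        x = step(x)
print(visits_to_1 / excursions)   # ~ 3.75 = pi_1 / pi_0
```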
To go from the invariant measure to a distribution, we need Σ_{i∈S} r^k_i < +∞. For this we make the following definition.

4.3.3 Definition: Let {X_n}_{n≥0} be a chain.
(i) For a state i, let m_i = E_i(T_i) be the expected return time to state i.
(ii) A recurrent state i is called positive recurrent if m_i < +∞, and null recurrent otherwise.

Note that m_i = Σ_{j∈S} r^i_j.

We have the following theorem.

4.3.4 Theorem: Let {X_n}_{n≥0} be an irreducible chain. Then the following are equivalent:
(i) All the states are positive recurrent.
(ii) There exists a state that is positive recurrent.
(iii) There exists an invariant distribution π, and it has the property π_i = 1/m_i for all i.

Proof: (i) ⇒ (ii) is obvious.

(ii) ⇒ (iii): If k is a positive recurrent state, consider r^k_j, j ∈ S, as constructed in Theorem 4.3.1. Since m_k = Σ_{j∈S} r^k_j is finite, define

 π_i = r^k_i / m_k.

Then (π_i)_{i∈S} is an invariant distribution.

(iii) ⇒ (i): Fix any k ∈ S. Since P is irreducible and Σ_{i∈S} π_i = 1,

 π_k = Σ_{i∈S} π_i p^n_ik > 0 for some n.

Hence π_k > 0 for all k. Define

 λ_i = π_i / π_k, i ∈ S.

Then λ is an invariant measure with λ_k = 1. Thus, by Theorem 4.3.2, λ ≥ r^k. Hence

 m_k = Σ_{i∈S} r^k_i ≤ Σ_{i∈S} λ_i = Σ_{i∈S} π_i / π_k = 1/π_k < ∞.   (4.4)

Thus k is positive recurrent. In fact, if P is recurrent, Theorem 4.3.2 gives λ = r^k, and then (4.4) says

 m_k = 1/π_k for all k.

4.3.5 Example (Random walk on the line): Recall the walk which moves i → i + 1 with probability p and i → i − 1 with probability q = 1 − p.
(i) The walk is transient if 4pq < 1, i.e., if p ≠ 1/2.
(ii) For p = q = 1/2 it is called the symmetric random walk, and it is recurrent. Consider the measure π_i = 1 for all i ∈ S. Then

 π_i = (1/2) π_{i−1} + (1/2) π_{i+1},

so π = (..., 1, 1, ...) is an invariant measure. If an invariant distribution existed, it would have to be a scalar multiple of π; but Σ π_i = +∞. Hence there is no stationary distribution, and the chain is null recurrent.

4.3.6 Example (Asymmetric random walk): Let p_{i,i−1} = q < p = p_{i,i+1}. Though each state is transient and the theorem does not apply, let us try to find an invariant measure π. For this,

 π invariant ⟺ πP = π ⟺ π_{i−1} p + π_{i+1} q = π_i.

This gives a recurrence relation whose general solution is

 π_i = A + B (p/q)^i,

where A, B are arbitrary constants. This shows that an invariant measure need not be unique.

4.3.7 Example (Simple symmetric random walk on Z^2): Here the chain moves from each site of the planar integer lattice to each of its four neighbours with probability 1/4, i.e., p_ij = 1/4 if |i − j| = 1 and 0 otherwise. Then

 p^{2n}_00 = [ C(2n, n) (1/2)^{2n} ]^2.

An intuitive way of seeing this is as follows: consider X_n^+, the orthogonal projection of X_n onto the line y = x, and X_n^−, the orthogonal projection onto y = −x. Then X_n^+ and X_n^− are independent symmetric random walks on (1/√2)Z, and X_n = 0 iff X_n^+ = 0 = X_n^−. Now Stirling's formula gives

 p^{2n}_00 ∼ 1/(πn),

hence Σ_{n=0}^∞ p^n_00 = +∞; that is, the symmetric random walk on the plane is recurrent.

4.3.8 Remark:
(i) A similar analysis is possible for random walks in three dimensions, where the walk turns out to be transient.
(ii) For the random walks on the line and on the plane, p^n_ij → 0 as n → ∞ for all i, j.
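The Stirling estimate in Example 4.3.7 is easy to check numerically. A sketch (not from the notes): the recursion a_n = a_{n−1}(2n − 1)/(2n) for a_n = C(2n, n)/4^n avoids the huge binomial coefficients.

```python
from math import pi

a = 1.0        # a_n = C(2n, n) / 4^n, with a_0 = 1 and a_n = a_{n-1}(2n-1)/(2n)
total = 0.0
for n in range(1, 5001):
    a *= (2 * n - 1) / (2 * n)
    p2n = a * a                 # p^{2n}_{00} for the walk on Z^2
    total += p2n
    if n in (10, 100, 1000, 5000):
        print(n, round(p2n * pi * n, 4), round(total, 3))
# the second column tends to 1, confirming p^{2n}_{00} ~ 1/(pi n); the running
# sum grows like (log n)/pi, without bound, as recurrence requires -- but only
# logarithmically, which is why p^n_{00} itself still tends to 0.
```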
4.3.9 Theorem (Existence for a finite state space): Let S be finite and suppose p^n_ij → π_j as n → ∞, for all i. Then (π_j)_{j∈S} is an invariant distribution.

Proof: Note that

 Σ_{j∈S} π_j = Σ_{j∈S} lim_{n→∞} p^n_ij = lim_{n→∞} Σ_{j∈S} p^n_ij = 1,

because P is stochastic and S is finite (so the limit and the finite sum can be interchanged). And

 π_j = lim_{n→∞} p^{n+1}_ij = lim_{n→∞} Σ_{k∈S} p^n_ik p_kj = Σ_{k∈S} ( lim_{n→∞} p^n_ik ) p_kj = Σ_{k∈S} π_k p_kj.

Question: when can the above theorem be generalized? Some answers are given in the next section; for more details see Billingsley [4].

4.4. Asymptotic behavior

4.4.1 Theorem: Let {X_n}_{n≥0} be an irreducible aperiodic chain for which a stationary distribution π exists. Then the chain is persistent (i.e., recurrent), with

 lim_{n→∞} p^n_ij = π_j for all i, j.

Further, all π_j > 0, and the stationary distribution is unique.

4.4.2 Theorem: Let {X_n}_{n≥0} be an irreducible aperiodic chain for which no stationary distribution exists. Then lim_{n→∞} p^n_ij = 0 for all i, j.

4.4.3 Classification of irreducible aperiodic chains:
(i) Transient: Σ_n p^n_ij < +∞. This implies, in particular, that lim_{n→∞} p^n_ij = 0.
(ii) Null recurrent: Σ_n p^n_ij = ∞, but no stationary distribution exists, and by Theorem 4.4.2, lim_{n→∞} p^n_ij = 0.
(iii) Positive recurrent: a stationary distribution exists, and p^n_ij → π_j > 0.

Diagonalization of matrices

Let A be an n × n matrix with entries from IF = R or C.

A.1.1 Definition: A matrix A is said to be diagonalizable if A is similar to a diagonal matrix, i.e., if there exists an invertible matrix P such that P^{−1}AP is a diagonal matrix.

We would like to know when a given matrix A is diagonalizable and, if so, how to find P such that P^{−1}AP is diagonal. The next theorem answers this question.

A.1.2 Theorem: Let A be an n × n matrix. If A is diagonalizable, then there exist scalars λ_1, λ_2, ..., λ_n in IF and vectors C^1, C^2, ..., C^n in IF^n such that the following hold:
(i) AC^i = λ_i C^i for all 1 ≤ i ≤ n; that is, A has n eigenvalues.
(ii) The set {C^1, ..., C^n} is linearly independent, and hence is a basis of IF^n.

Theorem A.1.2 says that if A is diagonalizable, then not only does A have n eigenvalues, it has a basis consisting of eigenvectors. In fact, the converse is also true.

A.1.3 Theorem:
(i) Let A be an n × n matrix. If A has n linearly independent eigenvectors, then A is diagonalizable.
(ii) Let A be an n × n matrix. If A has n distinct eigenvalues, then A is diagonalizable.
(iii) If A is real symmetric, then there exists an orthogonal matrix P such that P^{−1}AP is diagonal.

A.1.4 Note: Theorem A.1.3 not only tells us when A is diagonalizable, it also gives us a matrix P which diagonalizes A, i.e., P^{−1}AP = D, together with the resulting diagonal matrix: the column vectors of P are the n eigenvectors of A, and the diagonal entries of D are the eigenvalues of A corresponding to these n eigenvectors.

For more details, refer to "From Geometry to Algebra - An Introduction to Linear Algebra" by Inder K. Rana, Ane Books, New Delhi, 2010.

References

Markov chains:
[1] 'Finite Markov Chains' - Kemeny and Snell, Springer-Verlag.
[2] 'Introduction to Stochastic Processes' - Hoel, Port and Stone, Houghton Mifflin Company.
[3] 'A First Course in Stochastic Processes' - Karlin and Taylor, Academic Press.

Probability and measure:
[4] 'Probability and Measure' - P. Billingsley.
[5] 'Introduction to Probability and Measure' - K. R. Parthasarathy.

Measure theory:
[6] 'An Introduction to Measure and Integration' - Inder K. Rana, Narosa Publishers.

Linear algebra:
[7] 'From Geometry to Algebra - An Introduction to Linear Algebra' - Inder K. Rana, Ane Books, New Delhi, 2010.