The Outline
Part I aims at providing the fundamental knowledge in probability theory, Poisson process and Markov chain theory. Some applications such as inventory systems and PageRank algorithm are
discussed.
(1) Probability Theory, Poisson Process and Inventory System
(2) Discrete Time Markov Chains : Theory and Examples
Part II discusses a continuous time stochastic process, the Birth
and Death process and its relation to Markovian queueing systems.
Applications of Markovian queueing systems will also be discussed.
(3) Continuous Time Markov Chains : Theory and Examples
(4) Introduction to Queueing Systems
http://hkumath.hku.hk/~wkc/course/part1.pdf
In Part III, we introduce iterative methods (computational methods) for solving a system of linear equations. It is important for
solving the steady-state distribution of a queueing network.
(5) Computation with Markov Chains: Iterative Methods
(6) Markovian Queueing Networks, Manufacturing and Re-manufacturing
Systems
Finally in Part IV, we introduce four research topics related to
Markov chain models.
(7) Hidden Markov Models (HMMs) and Their Applications
(8) Multivariate and High-order Markov Chain Models with Applications
(9) Introduction to Credit Risk Models
(10) Probabilistic Boolean Networks: Construction & Application
What is Engineering?
• Engineering combines the fields of science (physics)
and mathematics to solve real-world problems that improve the world around us.
• What really distinguishes an engineer is the ability to implement
ideas in a cost-effective and practical way.
• The ability to take a thought, or abstract idea, and translate it
into reality is what separates an engineer from other fields of science
and mathematics.
[Taken from http://whatisengineering.com/]
Mathematics is about proof.

Figure 1: What is Mathematics? (The figure decomposes the Chinese characters for "proof" into components glossed "words", "go up to" and "sun & moon".)
What is Mathematics?
• Mathematics is a language. Language is the dress of thought
(Samuel Johnson). Moreover, the limits of my language are the limits
of my world (Ludwig Wittgenstein).
Transferable Skills
• Skills that can be used in a variety of jobs or occupations.
• Transferable skills are the inventory of assets that help you transition into and excel in a new role. They ensure the robustness
and longevity of your career. Moreover, they allow you to more
easily and readily explore lateral dimensions in your career and
acquire added skills and expertise.
• While highly specialized skills may be essential to building
your own personal competitive advantage and ensuring success in a
particular role or organization, it is the transferable skills
that ensure you DO NOT become professionally redundant
or obsolete over the long term.
[Taken from http://www.bayt.com/en/career-article-2721/]
Transferable Mathematical Knowledge
• Stochastic Processes: Markov Chain Process, Poisson
Process, Birth-and-Death Process, Diffusion Process, Lévy
Process, etc.
• Mathematical Models: Deterministic Models, Stochastic Models, Statistical Models, etc.
• Computational Methods: Iterative Methods, Numerical
Algorithms, Computer Simulations, etc.
• Data Mining Techniques: Data Modeling, Classifications, Clustering Methods, etc.
PART I
(1) Probability Theory, Poisson Process and Inventory Systems
-Exponential Distribution and Poisson Distribution
-Poisson Process
-Introduction to Inventory Models
(2) Discrete Time Markov Chain Models: Theory and Examples
-Introduction to Markov Chain
-Irreducible Markov Chain and Classifications of States
-Simulation and Construction of Markov Chains
-Stationary Distribution of a Finite Markov Chain
-More Markov Chain Models
We are to admit no more causes of natural things than such as are
both true and sufficient to explain their appearances.
Isaac Newton (Wikipedia)
Figure 2: St Augustine and Monica by Ary Scheffer (1846). Taken from Wikipedia.
1 Probability Theory, Poisson Process and Inventory Systems

1.1 The Science of Probability: A Historical Note
• With the advent of Christianity, the concept of random events developed by
earlier philosophers was rejected.
• According to St. Augustine (354-430), nothing occurred by chance: everything
was minutely controlled by the will of God. If events appear to occur at random,
it is because of our ignorance and not in the nature of the events. One should
seek the will of God instead of looking at patterns of behavior in aggregates
of events.¹

¹ Poker faces: the life and work of professional card players by David M. Hayano, UCP Press, 1982.
• The amazing content and applications of probability theory owe their origin to a
question about gambling (a game).
• The question was raised by the Chevalier de Méré (1607-1684) concerning his problem
of throwing a die. He held the title Chevalier (Knight) and was educated at Méré. The
problem was solved by Pascal.
The Problem:
• De Méré made considerable money over the years by betting double odds on
rolling at least one “6” in 4 throws of a fair die (six faces).
• He then thought that the same should hold for betting on at least one double-six
in 24 throws of two fair dice (an old gamblers' belief). It turned out that this
did not work well.
• Why? In 1654, he challenged his friends Pascal and Fermat for the reason.
Figure 3: Pascal (1623-1662) (Left).
Fermat (1601-1665) (Right). Taken from Wikipedia.
• The probability of getting no “6” in four independent throws of a fair die is
(5/6) × (5/6) × (5/6) × (5/6) = 625/1296.
• Therefore the probability of having at least one “6” in 4 throws equals
1 − 625/1296 = 671/1296 = 0.5177 > 0.5000.
• This explains why de Méré made a good amount of money on double odds on this bet.
• This is not a fair game: the player has an advantage over the house.
• The probability of getting no double “6” in one throw of two fair dice is
1 − (1/6 × 1/6) = 35/36.
• The probability of getting no double “6” in 24 independent throws is
(35/36)^24.
• Therefore the probability of having at least one double “6” in 24 throws equals
1 − (35/36)^24 = 0.4914 < 0.5.
• This explains why de Méré did not make a good amount of money on double odds on this bet.
• Again this is not a fair game: here the house has an advantage over the player.
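The two dice calculations above are easy to verify numerically. Below is a minimal Python sketch (not part of the original notes; all numbers come from the text):

from fractions import Fraction

# P(at least one "6" in 4 throws of one fair die) = 1 - (5/6)^4
p_one_die = 1 - Fraction(5, 6) ** 4
print(p_one_die, float(p_one_die))   # 671/1296 ≈ 0.5177 > 0.5

# P(at least one double "6" in 24 throws of two fair dice) = 1 - (35/36)^24
p_two_dice = 1 - Fraction(35, 36) ** 24
print(float(p_two_dice))             # ≈ 0.4914 < 0.5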
1.2 Poisson Process, Reliability Theory and Inventory Systems

• Every day we encounter many random events: the waiting time for a
bus, the service time at a counter, the lifetime of a machine, the processing
time for a product, etc.
The waiting time for the occurrence of an event can be modeled by a
continuous probability distribution.
• On the other hand, there are counting events such as: the number of buses passing
through a junction in one hour, the number of customers served per day, the
number of broken machines handled in a year and the number of products produced per month.
The number of events occurring in a fixed period can be modeled by a discrete
probability distribution.
• Here we propose to employ the exponential distribution (continuous case)
f(t) = λ e^{−λt}, t ≥ 0, λ > 0,
and the Poisson distribution (discrete case)
f(n, t) = (λt)^n e^{−λt} / n!, λ > 0, t ≥ 0, n = 0, 1, . . . ,
to model the above random events.
• These two distributions are linked up by the Poisson process.
We will also discuss this relationship in detail.
1.3 Exponential Distribution and Poisson Distribution

1.3.1 Exponential Distribution

Definition 1. A continuous random variable X is said to follow
an exponential distribution with parameter λ if its probability
density function is given by
f(x) = λ e^{−λx} if x ≥ 0, and f(x) = 0 if x < 0.

Proposition 1. If a random variable X follows the exponential
distribution with parameter λ, then
E(X) = λ^{−1} and Var(X) = λ^{−2},
and its moment generating function is given by M(t) = λ(λ − t)^{−1}.
Proof. First, by definition, we have
E(X) = ∫_0^∞ x f(x) dx = 1/λ
and
E(X^2) = ∫_0^∞ x^2 f(x) dx = 2/λ^2.
Therefore we have
Var(X) = E(X^2) − E(X)^2 = 2/λ^2 − 1/λ^2 = 1/λ^2.
Finally, the moment generating function is given by
M(t) = E(e^{tX}) = ∫_0^∞ e^{tx} λ e^{−λx} dx = λ/(λ − t).
We note that M′(0) = λ^{−1} = E(X) and M″(0) = 2λ^{−2} = E(X^2).

Remark 1. We note that
E(g(X)) = ∫_{−∞}^{∞} g(x) · f(x) dx.
There is a one-to-one relation between the Probability Density Function
(PDF) and its Moment Generating Function (MGF).
Definition 2. A probability distribution (say of a non-negative random variable
X) is said to have the Markov property if for any two non-negative values t
and x we have
P{X > x + t | X > t} = P{X > x}.

Proposition 2. The exponential distribution has the Markov property.

Proof. This follows from
P{X > t + x | X > t} = e^{−µ(t+x)}/e^{−µt} = e^{−µx} = P{X > x}.   (1.1)

• In a number of applications, observation has shown that the exponential distribution can be a good description of a service time distribution (which is therefore
called an exponential service time).
• Exponentially distributed time has the nice feature that, by the Markov property
(1.1), the distribution of the remaining holding time after a customer has been
served for any length of time t > 0 is the same as that initially at t = 0.
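The Markov (memoryless) property (1.1) can also be checked by simulation. The following is an illustrative Python sketch only; the parameter values µ = 1.5, t = 0.8 and x = 0.5 are arbitrary choices, not from the notes:

import random

mu, t, x, trials = 1.5, 0.8, 0.5, 200_000
samples = [random.expovariate(mu) for _ in range(trials)]

# Conditional survival P{X > t + x | X > t}
survived_t = [s for s in samples if s > t]
cond = sum(s > t + x for s in survived_t) / len(survived_t)

# Unconditional survival P{X > x}
uncond = sum(s > x for s in samples) / trials

print(cond, uncond)  # both ≈ exp(-mu * x) ≈ 0.472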
Here are some more properties of the exponential distribution.

Proposition 3. Suppose X_1 (the waiting time for Bus 1) and X_2 (the waiting time for Bus 2)
are independent non-negative random variables with probability density
functions f_1(t) and f_2(t). If they are both exponentially distributed random
variables with means µ_i^{−1}, then the probability that X_2 exceeds X_1 is
µ_1/(µ_1 + µ_2).

Proof. We have
P{X_2 > X_1} = ∫_0^∞ ∫_s^∞ f_1(s) f_2(t) dt ds = ∫_0^∞ ∫_0^t f_1(s) f_2(t) ds dt,
i.e., we integrate the joint density function over the region
Ω = {(s, t) ∈ R^2 | t > s}.
If X_1 and X_2 are exponential with means µ_1^{−1} and µ_2^{−1}, then the above integral
becomes
∫_0^∞ ∫_s^∞ µ_1 e^{−µ_1 s} µ_2 e^{−µ_2 t} dt ds = ∫_0^∞ µ_1 e^{−µ_1 s} e^{−µ_2 s} ds = µ_1/(µ_1 + µ_2).   (1.2)
Proposition 4. Suppose that X_1, X_2, . . . , X_n are independent,
identically distributed exponential random variables with mean µ^{−1}, and
consider the corresponding order statistics
X_{(1)} ≤ X_{(2)} ≤ · · · ≤ X_{(n)}.
Then X_{(1)} is again exponentially distributed, with mean 1/n times
the mean of the original random variables.

Proof. We observe that
X_{(1)} = min(X_1, X_2, . . . , X_n).
• We note X_{(1)} > x if and only if X_i > x for all i = 1, 2, . . . , n.
Hence we have
P{X_{(1)} > x} = P{X_1 > x} P{X_2 > x} · · · P{X_n > x} = (e^{−µx})^n = e^{−nµx}.
Proposition 5. A random variable X is exponentially distributed
if and only if
P{X < t + h | X > t} = µh + o(h) as h → 0.
Here o(h) is a function of h such that
lim_{h→0} o(h)/h = 0.

Proof. Suppose X has an exponential distribution. Then
P{X < t + h | X > t} = 1 − e^{−µh}   (Markov property)
= 1 − (1 − µh + o(h))   (use Taylor's series; see Remark 2)
= µh + o(h) as h → 0.   (1.3)
• Conversely, suppose that
P{X < t + h | X > t} = µh + o(h) as h → 0;
then we have
P{X > t + h | X > t} = 1 − µh + o(h).

Using
P{X > t + h | X > t} = P{X > t + h}/P{X > t},
re-arranging the terms and letting h → 0, one obtains the differential
equation (see Remark 3)
(d/dt) P{X > t} = −µ P{X > t},
which has the unique solution
P{X > t} = e^{−µt}
satisfying the initial condition
P{X > 0} = 1,
i.e., X follows the exponential distribution.
Remark 2. For 0 < h < 1 we have
1 − e^{−µh} = 1 − (1 − µh/1! + (µh)^2/2! − (µh)^3/3! + . . .)
= µh/1! − (µh)^2/2! + (µh)^3/3! − . . .
≡ µh + g(h).
Then we see that
|g(h)/h| = |−µ^2 h/2! + µ^3 h^2/3! − . . .|
≤ h (µ^2/2! + µ^3/3! + . . .)
≤ h e^µ.
Therefore
lim_{h→0} g(h)/h = 0, and hence g(h) = o(h).
Remark 3. Given
P{X < t + h | X > t} = µh + o(h) as h → 0,
we have
P{X > t + h | X > t} = 1 − µh + o(h).
Using
P{X > t + h | X > t} = P{X > t + h}/P{X > t},
we have
P(X > t + h) = P(X > t) − µh P(X > t) + o(h) P(X > t).

Let F(t) = P(X > t); then we have
F(t + h) = F(t) − µh F(t) + o(h) F(t) = F(t) − µh F(t) + o(h).
• Thus
[F(t + h) − F(t)]/h − (−µ F(t)) = o(h)/h.
Hence
lim_{h→0} {[F(t + h) − F(t)]/h − (−µ F(t))} = lim_{h→0} o(h)/h = 0.
• This means
F′(t) = −µ F(t),
and F(t) can be solved easily with the initial condition F(0) = 1.
• Solving the differential equation, we have
F(t) = e^{−µt}.
1.3.2 Poisson Distribution

The Poisson distribution is a discrete distribution given by
P_j(t) = (λt)^j e^{−λt} / j!, j = 0, 1, 2, . . . .

Proposition 6. If a random variable X follows the Poisson distribution, then
E(X) = λt and Var(X) = λt,
and its moment generating function is
e^{−λt(1−e^z)}.
Proof. We have
E(X) = Σ_{j=0}^∞ j (λt)^j e^{−λt} / j! = Σ_{j=1}^∞ (λt) (λt)^{j−1} e^{−λt} / (j − 1)! = λt.
• Since
E(X(X − 1)) = Σ_{j=1}^∞ j(j − 1) (λt)^j e^{−λt} / j! = Σ_{j=2}^∞ (λt)^2 (λt)^{j−2} e^{−λt} / (j − 2)! = (λt)^2,
we have
E(X^2) − E(X) = (λt)^2.
• Hence we have
Var(X) = E(X^2) − E(X)^2 = (λt)^2 + E(X) − E(X)^2 = λt.
• Finally, the moment generating function is given by
E(e^{Xz}) = Σ_{j=0}^∞ e^{jz} (λt)^j e^{−λt} / j! = e^{−λt} Σ_{j=0}^∞ (λt e^z)^j / j! = e^{−λt + λt e^z} = e^{−λt(1−e^z)}.
Proposition 7. The sum of two independent Poisson random variables is
again a Poisson random variable.

Proof. Let
X = X_1 + X_2,
where X_i is a Poisson random variable with mean λ_i t. Then X has
the moment generating function
g(z) = e^{−λ_1 t(1−e^z)} e^{−λ_2 t(1−e^z)} = e^{−(λ_1+λ_2) t(1−e^z)}.
This shows that the sum of two independent Poisson random variables
with means λ_1 t and λ_2 t is itself a Poisson random variable with mean
(λ_1 + λ_2)t.

Remark 4. Let g_i(z) be the moment generating function
of X_i. If X_1 and X_2 are independent, then the moment generating
function of X_1 + X_2 is given by
E(e^{z(X_1+X_2)}) = E(e^{zX_1} · e^{zX_2}) = E(e^{zX_1}) · E(e^{zX_2}) = g_1(z) · g_2(z).
1.4 Poisson Process

Definition 3. A process is said to be a Poisson process if, at any epoch t,
P{one occurrence during (t, t + h)} = λh + o(h) as h → 0
and
P{two or more occurrences during (t, t + h)} = o(h) as h → 0.

Remark 5. The following are some remarks on the notation o(h).
(a) We recall that o(h) is a function of h such that
lim_{h→0} o(h)/h = 0.
(b) Possible examples of o(h) are o(h) = h^2 and o(h) = h sin(h).
(c) However, o(h) cannot take the form √h or h log(h).
The Poisson process, the Poisson distribution and the exponential distribution are related as follows:

Proposition 8. Suppose in a certain process we let T_i (i = 1, 2, 3, · · ·) be the epoch of the ith occurrence.
Let A_i = T_i − T_{i−1} (i = 1, 2, 3, · · ·), where T_0 is the epoch at which we start to
count the number of occurrences.
Let X(t) be the number of occurrences in a time interval of length t.
Then the following statements are equivalent.
(a) The process is Poisson (with mean rate λ).
(b) X(t) is a Poisson random variable with parameter λt, i.e.,
P{X(t) = j} = (λt)^j e^{−λt} / j!, j = 0, 1, 2, · · · .
(c) The A_i's are mutually independent, identically distributed exponential random variables with mean λ^{−1}, i.e.,
P{A_i ≤ t} = 1 − e^{−λt}, i = 1, 2, · · · .
(a) implies (b)
• Given a Poisson process with mean rate λ, we want to find the
number of occurrences in the time interval [0, t] and the probability
density of the time between two successive occurrences.
• Divide the time interval into n equal parts, each of length t/n.
In each sub-interval, the probability that it contains an occurrence is given roughly by λh = λt/n. Therefore the probability of getting k occurrences is given by the binomial distribution
C(n, k) (λt/n)^k (1 − λt/n)^{n−k} = [n(n − 1) · · · (n − k + 1)/n^k] · [(λt)^k/k!] · (1 − λt/n)^{n−k},
where the first factor tends to 1 and the last factor tends to e^{−λt} as n → ∞.
• By letting n go to infinity, from this well-known result we obtain
the Poisson distribution:
e^{−λt}(λt)^k / k!.
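The limit can be observed numerically: as n grows, the binomial probability above approaches the Poisson probability. A small check (the values λt = 2 and k = 3 are illustrative, not from the text):

from math import comb, exp, factorial

lam_t, k = 2.0, 3
poisson = exp(-lam_t) * lam_t ** k / factorial(k)
for n in (10, 100, 1000, 10000):
    binom = comb(n, k) * (lam_t / n) ** k * (1 - lam_t / n) ** (n - k)
    print(n, binom)
print("Poisson limit:", poisson)  # ≈ 0.1804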
(b) implies (a)
We note that
P(k = 0) = e^{−λt} = 1 − λt + o(t)   (by Remark 2)
and
P(k = 1) = e^{−λt} λt = λt (1 − λt + (λt)^2/2! − · · ·) = λt + (−(λt)^2 + (λt)^3/2! − · · ·) = λt + o(t).
Therefore we have
P(k > 1) = 1 − P(k = 0) − P(k = 1) = o(t).
(a) and (b) imply (c)
Let f(t) be the probability density of the inter-occurrence time T.
Then by definition we have
P(t ≤ T ≤ t + δt) = f(t) δt = P(no occurrence in (0, t)) × P(one occurrence in (t, t + δt)).
Now we have
P(no occurrence in (0, t)) = e^{−λt}
and
P(one occurrence in (t, t + δt)) = λ δt.
Hence we have
f(t) δt = λ e^{−λt} δt and f(t) = λ e^{−λt}.
(c) implies (b)
We will prove by mathematical induction that the PDF of the random
variable A(n) = A_1 + A_2 + . . . + A_n is given by the Erlangian distribution:
g_n(t) = λ(λt)^{n−1} e^{−λt} / (n − 1)!.
For n = 1 the statement clearly holds, and we assume it holds for
some positive integer n. We then look for the PDF of
Z = A(n) + A,
where A is one further independent exponential inter-occurrence time. We have
P(Z ≤ t) = ∫_0^t ∫_0^{t−x} g_n(x) λ e^{−λs} ds dx
= ∫_0^t g_n(x) (1 − e^{−λ(t−x)}) dx
= ∫_0^t g_n(x) dx − e^{−λt} ∫_0^t λ(λx)^{n−1}/(n − 1)! dx
= ∫_0^t g_n(x) dx − e^{−λt} (λt)^n / n!.
• Taking the derivative of the R.H.S. with respect to t, we obtain the PDF of Z as
follows:
g_{n+1}(t) = λ(λt)^n e^{−λt} / n!.
We then have, for n = 1, 2, . . . ,
P(A(n) ≤ t) = ∫_0^t λ(λx)^{n−1} e^{−λx} / (n − 1)! dx
= 1 − Σ_{k=0}^{n−1} e^{−λt} (λt)^k / k!   (apply integration by parts)
= 1 − Σ_{k=0}^{n−1} P(X(t) = k).
• Here P(A(n) ≤ t) is the probability that the waiting time for the nth arrival is
less than or equal to t.
• This is also the probability that there are n or more arrivals observed in the
interval [0, t]. Thus X(t) follows the Poisson distribution with mean λt.
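Proposition 8 can be illustrated by simulation: generate exponential inter-arrival times with mean λ^{−1} and count the arrivals falling in [0, t]; the counts should follow the Poisson distribution with mean λt. A minimal sketch (λ = 3 and t = 2 are arbitrary illustrative values):

import random
from math import exp, factorial

lam, t, trials = 3.0, 2.0, 100_000

def count_arrivals(lam, t):
    # Sum exponential inter-arrival times until the clock exceeds t.
    clock, n = 0.0, 0
    while True:
        clock += random.expovariate(lam)
        if clock > t:
            return n
        n += 1

counts = [count_arrivals(lam, t) for _ in range(trials)]
for j in range(4):
    empirical = counts.count(j) / trials
    theory = exp(-lam * t) * (lam * t) ** j / factorial(j)
    print(j, round(empirical, 4), round(theory, 4))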
Remark 6. The Poisson process provides a rather good approximation for modeling
many random processes, such as the arrival of customers and calls. From the proposition above, a process is Poisson (with rate λ) if and only if the inter-arrival
times (the lengths of time between successive customer arrivals) are mutually independent exponentially distributed random variables with mean λ^{−1}.
• We note that if the mean arrival rate is λ, then the mean inter-arrival time is
λ^{−1}. By the Markov property of exponential random variables, the distribution
of the length of time from an arbitrarily chosen epoch to the next arrival (the
next-arrival time) is the same as the distribution of inter-arrival times.

Figure 1.1. Arrival of Customers: at an observer's sampling point between two
arrivals, the remaining time B until the next arrival has the same distribution as
a full inter-arrival time A.

• This nice property much simplifies the mathematical analysis of many situations.
• For a Poisson process,
P{exactly one occurrence in (t, t + h)} = λh e^{−λh}.
Now for a fixed t and any x in (0, t),
P{the epoch of occurrence is in (0, x) | exactly one occurrence in (0, t)}
= P{exactly one occurrence in (0, x), and no occurrence in (x, t)} / P{exactly one occurrence in (0, t)}
= [λx e^{−λx} × e^{−λ(t−x)}] / [λt e^{−λt}]
= x/t,
which is the cumulative distribution function of a uniform distribution on (0, t).
• This means that if we know there is exactly one occurrence in (0, t),
then the epoch of that occurrence is equally likely throughout (0, t).
In this sense we say that a Poisson process is random.
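This "randomness" is also easy to see by simulation: condition on runs with exactly one arrival in (0, t) and record where that arrival fell; the histogram should be flat. An illustrative sketch (λ = 1 and t = 5 are arbitrary choices):

import random

lam, t, trials = 1.0, 5.0, 200_000
epochs = []
for _ in range(trials):
    clock, arrivals = 0.0, []
    while True:
        clock += random.expovariate(lam)
        if clock > t:
            break
        arrivals.append(clock)
    if len(arrivals) == 1:        # condition on exactly one occurrence
        epochs.append(arrivals[0])

# Split (0, t) into 5 equal bins; each should hold about 20% of the epochs.
bins = [0] * 5
for e in epochs:
    bins[min(int(e / t * 5), 4)] += 1
print([round(b / len(epochs), 3) for b in bins])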
1.5 Law of Large Numbers and Central Limit Theorem

In this section we introduce the famous laws of large numbers, which state that the long-run average of a sequence of independent and
identically distributed random variables converges to its mean. We shall state
and give a proof of the weak law of large numbers, state the strong law
of large numbers without proof, and finally state and prove another important theorem,
namely the central limit theorem.
• Along the way we also introduce two important inequalities for random variables: the Markov inequality and the Chebyshev inequality.
• By making use of the Chebyshev inequality, one can prove the weak law of
large numbers.
• Let us begin with the Markov inequality.
Proposition 9. If a random variable X takes only non-negative values,
then for any a > 0 we have
P(X ≥ a) ≤ E(X)/a.
This is known as the Markov inequality.

Proof. We give a proof for the case when X is a continuous random variable; the
case of a discrete random variable is similar and therefore omitted. Let f(x) be the
probability density function. Then
E(X) = ∫_0^∞ x f(x) dx
= ∫_0^a x f(x) dx + ∫_a^∞ x f(x) dx
≥ ∫_a^∞ x f(x) dx
≥ ∫_a^∞ a f(x) dx   (because x f(x) ≥ a f(x) for x ≥ a)
= a ∫_a^∞ f(x) dx = a P(X ≥ a).
• Using the Markov inequality, one can also prove Chebyshev's inequality.

Proposition 10. If a random variable X has mean µ and variance σ^2, then
for any k > 0 we have
P(|X − µ| ≥ kσ) ≤ 1/k^2.
This is known as Chebyshev's inequality.

Proof. Consider (X − µ)^2/σ^2, a non-negative random variable whose mean is
E[(X − µ)^2/σ^2] = E[(X − µ)^2]/σ^2 = σ^2/σ^2 = 1.
Letting a = k^2 and applying the Markov inequality, we have
P((X − µ)^2/σ^2 ≥ k^2) ≤ 1/k^2.
Hence we get
P(|X − µ| ≥ kσ) ≤ 1/k^2.
Proposition 11. Let X_1, X_2, . . . , X_n be a sequence of independent and identically distributed random variables having mean µ and finite variance σ^2. Then
for any ε > 0 we have
lim_{n→∞} P(|(X_1 + X_2 + . . . + X_n)/n − µ| ≥ ε) = 0.
This is the famous Weak Law of Large Numbers.

Proof. Let
X̄ = (X_1 + X_2 + . . . + X_n)/n.
We have
E(X̄) = [E(X_1) + E(X_2) + . . . + E(X_n)]/n = (µ + µ + . . . + µ)/n = µ
and
Var(X̄) = [Var(X_1) + Var(X_2) + . . . + Var(X_n)]/n^2 = (σ^2 + σ^2 + . . . + σ^2)/n^2 = σ^2/n.
By Chebyshev's inequality, for any positive k we have
P(|X̄ − µ| ≥ kσ/√n) ≤ 1/k^2.
In particular, letting k = ε√n/σ, we get
P(|X̄ − µ| ≥ ε) ≤ σ^2/(nε^2).
Therefore for any positive ε we have
lim_{n→∞} P(|X̄ − µ| ≥ ε) = 0.
Here let us also state, without proof, the strong law of large numbers, which strengthens the weak law: with probability one,
lim_{n→∞} X̄ = µ.
This means that the long-run average of a sequence of independent and identically
distributed random variables will converge to its mean.

Proposition 12. Let X_1, X_2, . . . , X_n be a sequence of independent and identically distributed random variables having mean µ and finite variance σ^2. Then
we have
P(lim_{n→∞} (X_1 + X_2 + . . . + X_n)/n = µ) = 1.
This is the famous Strong Law of Large Numbers.
Proposition 13 (Central Limit Theorem). Let X_1, X_2, . . . , X_n be a
sequence of independent, identically distributed random variables
with mean µ and variance σ^2. Then the following random variable tends to the normal distribution with mean 0 and variance 1
as n → ∞:
Z_n = (X_1 + X_2 + . . . + X_n − nµ)/(σ√n) = (X̄_n − µ)/(σ/√n).

Proof. Here we give a heuristic proof, showing that the moment generating function of Z_n tends to the moment generating function of the
standard normal N(0, 1) random variable, i.e.,
lim_{n→∞} g_{Z_n}(t) = e^{t^2/2}.
• Now it is easy to check that E(Z_n) = 0 and E(Z_n^2) = 1. We have
g_{Z_n}(t) = E(e^{tZ_n}) = E(e^{t(X_1+X_2+...+X_n−nµ)/(σ√n)}) = E(∏_{i=1}^n e^{t(X_i−µ)/(σ√n)}) = ∏_{i=1}^n E(e^{t(X_i−µ)/(σ√n)}) = [g_{(X−µ)/σ}(t/√n)]^n,
where g_{(X−µ)/σ}(·) is the moment generating function of the random
variable (X − µ)/σ, the normalized form of X. The mean and variance
of (X − µ)/σ are 0 and 1, respectively.
• We write
g_{(X−µ)/σ}(t) = 1 + 0 · t + t^2/2 + m_3 t^3 + m_4 t^4 + · · ·
and therefore
log(g_{Z_n}(t)) = n log(g_{(X−µ)/σ}(t/√n)) = n log(1 + t^2/(2n) + m_3 t^3/n^{3/2} + · · ·).
• Thus we have
log(g_{Z_n}(t)) ≡ n log(1 + a(t)),
where a(t) = t^2/(2n) + m_3 t^3/n^{3/2} + · · · . Since we have the Taylor series
log(1 + b) = b − b^2/2 + b^3/3 − · · · , |b| < 1,
by choosing t close to zero we have |a(t)| < 1 and
log(g_{Z_n}(t)) = n log(1 + a(t)) = n (a(t) − a^2(t)/2 + a^3(t)/3 − · · ·).
• By inspection, we have
lim_{n→∞} n a(t) = t^2/2 and lim_{n→∞} n a^k(t) = 0 for k = 2, 3, . . . .
Thus
lim_{n→∞} log(g_{Z_n}(t)) = t^2/2, or lim_{n→∞} g_{Z_n}(t) = e^{t^2/2}.
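The limit theorem can be complemented by a quick simulation: standardized sums of i.i.d. random variables should look standard normal for large n. A minimal sketch (exponential(1) summands, with µ = σ = 1, are an arbitrary choice):

import random
import statistics
from math import sqrt

mu, sigma, n, trials = 1.0, 1.0, 500, 20_000
z_values = []
for _ in range(trials):
    s = sum(random.expovariate(1.0) for _ in range(n))
    z_values.append((s - n * mu) / (sigma * sqrt(n)))

print(statistics.mean(z_values))   # ≈ 0
print(statistics.stdev(z_values))  # ≈ 1
# Compare P(Z_n <= 1) with the standard normal value ≈ 0.8413.
print(sum(z <= 1 for z in z_values) / trials)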
1.6 Introduction to Inventory Models

• To meet demand on time and compete in the market, a company needs to keep
a suitable amount of stock on hand and order replenishment at the right time. Too
much stock incurs extra inventory cost, but a shortage of stock results in unsatisfied demand.
• The purpose of inventory theory is to determine rules or policies that minimize
the system running cost (or maximize its profit) while meeting customer demand.
• To obtain a "good" operational policy, one has to understand the demand process, the flow of material, the cost of the facility, etc. Mathematical
models, especially probabilistic models, are good tools for the analysis of inventory
processes.
• The main objective here is to obtain optimal ordering policies for inventory systems such that the overall system running cost is minimized. We will begin with
simple deterministic models and then move to more complex probabilistic models.
The following are possible costs associated with an inventory system.
1. Ordering and setup cost: this includes the cost of paperwork and billing
associated with an order. If the product is produced internally, it may
also include the cost of setting up a machine in a production system and
the cost of labor.
2. Purchasing cost: this includes the cost of raw material and the transportation
cost.
3. Holding cost: this is the cost of holding one unit of inventory for one period.
If the period is one year, then it is the annual holding cost. This cost may also
include the insurance cost due to the possibility of spoilage.
4. Shortage cost: when a demand cannot be met in time, a shortage is said to
occur. There are two possible cases: either
(i) the customers accept delivery on a later date, which is called a backlogged
demand, or
(ii) the customers refuse delivery on a later date, which is called
a lost sale.
1.6.1 Deterministic (EOQ) Inventory Models

The following are the basic assumptions of the deterministic inventory models.
1. Repetitive ordering:
the ordering decision is repetitive in the sense that it is repeated
in a regular manner.
2. Constant demand:
demand is assumed to occur at a constant rate, say D per year. Then the total
demand over t years is Dt.
3. Constant lead time:
the lead time for each order is a known constant L.

Definition 4. The lead time is the length of time between the instant when
an order is placed and the instant at which the order arrives.
1.6.2 The Basic EOQ Model

The basic assumptions of an Economic Order Quantity (EOQ) model are as
follows.
1. The demand D per year (we assume the unit of time is one year) is a known
constant.
2. The ordering cost is K each time an order is placed.
3. The size of an order q is a constant to be determined.
4. The lead time L is assumed to be 0.
5. No shortage of demand is allowed.
6. The holding cost for one unit of inventory is h per year.
7. The unit purchasing cost is p.

Remark 7. Since orders arrive instantaneously, in an optimal policy
one should never place an order when the inventory level is greater than zero, as
otherwise an unnecessary holding cost would be incurred. When the inventory level is
zero, one must place an order to prevent a shortage from occurring. The inventory
level of the system is given in Figure 1.5.
Figure 1.5. The Basic EOQ Model: the inventory level starts at q, decreases
linearly to 0 at rate D over each cycle of length q/D, and jumps back to q when
the next order arrives.
Definition 5. Any interval of time that begins with the arrival of an order and
ends with the instant before the next order is received is called a cycle.
Remark 8. We note that the cycle length of this model is q/D.
• We are going to determine the optimal value of q such that the total annual
running cost of the system is minimized.
• The annual running cost c(q) can be written as follows:
cost of placing order + purchasing cost + holding cost.
(a) Since the size of each order is q units and the annual demand is D, one needs
D/q orders per year. Thus the annual cost of placing orders is
K (D/q).
(b) The annual purchasing cost is of course given by
pD.
(c) In each cycle, the inventory level decreases from q down to 0 at a linear rate of D.
Thus the average inventory level is
(½ · (q/D) · q)/(q/D) = q/2,
and the annual holding cost is
h (q/2).
• From (a)-(c), the annual cost is given by
c(q) = KD/q + pD + hq/2.   (1.4)
• We note that
c′(q) = −KD/q^2 + h/2.
The only critical point is obtained by solving c′(q) = 0, and we have
q* = √(2KD/h).
Since
c″(q) = 2KD/q^3 > 0 for q > 0,
the annual cost c(q) is minimized when q = q*. The optimal size of an order,
q* = √(2KD/h),
is called the Economic Order Quantity (EOQ).
• In this case the optimal annual cost is given by
c(K, D, h) = KD √(h/(2KD)) + pD + (h/2) √(2KD/h) = √(2hKD) + pD.   (1.5)
Remark 9. We give some remarks on the EOQ model.
1. The EOQ does NOT depend on the unit purchasing price, because no matter
what the size of each order is, one has to purchase D units per year.
2. From the formula, we note that the EOQ increases when the annual demand D or
the ordering cost K increases.
3. If the unit holding cost h increases, then the EOQ decreases and the system holds
less inventory.
4. We also note that when the EOQ is applied, the annual holding cost is
h q*/2 = (h/2) √(2KD/h) = √(hKD/2),
and the annual ordering cost is
KD/q* = KD √(h/(2KD)) = √(hKD/2).
Therefore the annual holding cost is equal to the annual ordering cost.
Example 1. A company uses 500 units of a drug per year. The cost of ordering
the drug is 20, each unit of the drug costs 100, and the holding cost for one unit of the drug
is 2 per year.
(a) What are the EOQ and the optimal running cost?
(b) How many orders will be placed each year?
(c) What is the cycle length?
We note that in this case K = 20, h = 2 and D = 500.
(a) The EOQ is
√(2KD/h) = √(2(20)(500)/2) = 100.
The optimal running cost is
√(2hKD) + pD = √(2(2)(20)(500)) + 100(500) = 50200.
(b) The number of orders per year is 500/100 = 5.
(c) The cycle length is 1/5 year.
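The EOQ formulas translate directly into code. Below is a minimal Python sketch reproducing Example 1 (all inputs are taken from the example):

from math import sqrt

def eoq(K, D, h):
    # Economic Order Quantity q* = sqrt(2KD/h).
    return sqrt(2 * K * D / h)

def annual_cost(q, K, D, h, p):
    # c(q) = KD/q + pD + hq/2, equation (1.4).
    return K * D / q + p * D + h * q / 2

K, D, h, p = 20, 500, 2, 100
q_star = eoq(K, D, h)
print(q_star)                           # 100.0
print(annual_cost(q_star, K, D, h, p))  # 50200.0
print(D / q_star, q_star / D)           # 5 orders per year, cycle length 0.2 year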
Example 2. In the previous example, suppose the drug spoils and cannot be used if it
spends more than one month in the inventory. What are the order quantity, the running
cost and the cycle length in this case?
• In the previous example the cycle length is 2.4 months, which is longer than one month.
Thus the new cycle length is one month (1/12 year).
• The new order quantity is
q = D/12 = 500/12,
and the running cost is
KD/q + pD + hq/2 = 20(500)/(500/12) + 100(500) + 2(500/12)/2 = 50281.7.
1.6.3 When EOQ is Applicable

• One main assumption of the deterministic EOQ model is constant demand.
• To determine whether the assumption of constant demand is reasonable, suppose that
the demands during n periods of time, D_1, D_2, · · · , D_n, are observed. Let the
mean of the observed demands be
D̄ = (1/n) Σ_{i=1}^n D_i
and the variance of the observed demands be
Var(D) = (1/n) Σ_{i=1}^n D_i^2 − D̄^2.
• We define the coefficient of variation as
CV = √(Var(D)) / D̄.
Clearly if CV = 0, then the demand is constant. In general, if
CV < 0.2,
then the assumption of constant demand is reasonable.
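As a small illustration, the coefficient-of-variation test is a few lines of code once demands are observed. The demand figures below are made up purely for illustration:

from math import sqrt

demands = [48, 52, 50, 49, 51, 50]   # hypothetical observed demands
n = len(demands)
mean_d = sum(demands) / n
var_d = sum(d * d for d in demands) / n - mean_d ** 2
cv = sqrt(var_d) / mean_d
print(cv, cv < 0.2)   # a small CV: the constant-demand assumption is reasonable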
1.6.4 Newsboy Model with Discrete Stochastic Demand

In the previous section we studied inventory models where the demand is a
known constant. In this section we discuss inventory models with stochastic
(uncertain) demand. In our discussion, stochastic (uncertain) means that the demand is no longer constant but follows a certain known probability distribution. We
begin our discussion with the newsboy model.
• A newsboy sells newspapers (perishable) every morning. The cost of each newspaper remaining at the end of the day is C_o (overage cost), and the cost of each unit of
unsatisfied demand is C_u (underage cost or shortage cost).
• Suppose that the probability distribution of the demand D is given by
Prob(D = i) = p_i ≥ 0, i = 0, 1, 2, · · · .
• Our aim is to determine the best amount Q of newspapers to order such that
the expected cost is minimized.
• Consider the following newsboy problem, in which the daily demand distribution is:

x       0     1    2    3    4
P(x)    0.05  0.2  0.3  0.4  0.05

Table 1.2.

• Suppose the unit underage cost and the overage cost are the same; what is the
optimal ordering quantity Q* ?
• It is clear that the optimal order quantity Q* should be in {0, 1, 2, 3, 4}.
Most likely Q* ≠ 0 and Q* ≠ 4.
It is possible that Q* = 1, 2, 3.
Most likely Q* should be 3, as P(x = 3) = 0.4 has the highest probability.
Do you agree? You will find out the answer yourself later.
Proposition 14. Define the cumulative probability function of the demand D
to be
F(Q) = Σ_{i=0}^{Q} p_i = Prob(D ≤ Q).
Then the optimal value of Q satisfies
F(Q − 1) < C_u/(C_u + C_o) ≤ F(Q).

Step 1: We first construct the expected cost for a given order size Q. Clearly we
have the following TWO cases:
(i) if the demand D < Q, then the cost is (Q − D) C_o, and
(ii) if the demand D > Q, then the cost is (D − Q) C_u.
Therefore the expected cost is given by
E(Q) = C_o Σ_{i=0}^{Q} (Q − i) p_i + C_u Σ_{i=Q+1}^{∞} (i − Q) p_i,
where the first sum is the expected overage cost and the second is the expected shortage cost.
We note that F(Q) is an increasing function of Q, i.e.,
F(Q + 1) ≥ F(Q) for Q = 0, 1, 2, · · ·,
and F(Q) → 1 as Q → ∞.
We are going to show that the optimal Q is the one which satisfies
F(Q − 1) < C_u/(C_u + C_o) ≤ F(Q).

Step 2: From the definition of the expected cost E(Q), we are going to show
E(Q) − E(Q + 1) = C_u − (C_o + C_u) F(Q)   (1.6)
and
E(Q) − E(Q − 1) = −C_u + (C_o + C_u) F(Q − 1).   (1.7)
To prove (1.6), we recall that the expected cost is
E(Q) = C_o Σ_{i=0}^{Q} (Q − i) p_i + C_u Σ_{i=Q+1}^{∞} (i − Q) p_i.
Now we have
E(Q + 1) = C_o Σ_{i=0}^{Q+1} (Q + 1 − i) p_i + C_u Σ_{i=Q+2}^{∞} (i − Q − 1) p_i
= [C_o Σ_{i=0}^{Q+1} (Q − i) p_i + C_o Σ_{i=0}^{Q+1} p_i] + [C_u Σ_{i=Q+2}^{∞} (i − Q) p_i − C_u Σ_{i=Q+2}^{∞} p_i]
= [C_o Σ_{i=0}^{Q} (Q − i) p_i − C_o p_{Q+1} + C_o Σ_{i=0}^{Q+1} p_i] + [C_u Σ_{i=Q+1}^{∞} (i − Q) p_i − C_u p_{Q+1} − C_u Σ_{i=Q+2}^{∞} p_i]
= C_o Σ_{i=0}^{Q} (Q − i) p_i + C_u Σ_{i=Q+1}^{∞} (i − Q) p_i + C_o Σ_{i=0}^{Q} p_i − C_u Σ_{i=Q+1}^{∞} p_i
= E(Q) + C_o F(Q) − C_u (1 − F(Q))
= E(Q) − C_u + (C_o + C_u) F(Q).
Therefore we have
E(Q) − E(Q + 1) = C_u − (C_o + C_u) F(Q),
i.e.,
E(Q + 1) − E(Q) = −C_u + (C_o + C_u) F(Q).
Replacing Q by Q − 1 gives (1.7):
E(Q) − E(Q − 1) = −C_u + (C_o + C_u) F(Q − 1).
We note that
E(Q) ≤ E(Q + 1) ⇔ E(Q) − E(Q + 1) ≤ 0 ⇔ C_u − (C_o + C_u) F(Q) ≤ 0 ⇔ C_u/(C_o + C_u) ≤ F(Q)   (1.8)
and
E(Q) < E(Q − 1) ⇔ E(Q) − E(Q − 1) < 0 ⇔ −C_u + (C_o + C_u) F(Q − 1) < 0 ⇔ C_u/(C_o + C_u) > F(Q − 1).   (1.9)
Step 3: We are going to analyze the expected cost.
Since F(Q) is increasing in Q, there exists Q* such that
C_u/(C_o + C_u) ≤ F(Q) for Q ≥ Q*.   (1.10)
We take Q* to be the least integer such that (1.10) is satisfied. This means that
C_u/(C_o + C_u) > F(Q* − 1).
Now we have
C_u/(C_o + C_u) ≤ F(Q*) ≤ F(Q* + 1) ≤ F(Q* + 2) ≤ · · · .
From (1.8) we have
E(Q*) ≤ E(Q* + 1) ≤ E(Q* + 2) ≤ · · · .
This implies that the optimal Q should be less than or equal to Q*.
We also note that
C_u/(C_o + C_u) > F(Q* − 1) ≥ F(Q* − 2) ≥ · · · ≥ F(0).
From (1.9) we have
E(Q*) < E(Q* − 1) < E(Q* − 2) < · · · < E(0).
This means that the optimal Q should be greater than or equal to Q*.
We conclude that the optimal value of Q satisfies
F(Q − 1) < C_u/(C_u + C_o) ≤ F(Q).

Remark 10. In particular, if
C_u/(C_o + C_u) ≤ F(0),
then the optimal ordering quantity Q is 0.
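Proposition 14 gives a direct algorithm: scan the cumulative distribution until it first reaches the critical ratio C_u/(C_u + C_o). The sketch below applies it to the demand of Table 1.2 with C_u = C_o (critical ratio 1/2), so you can check the earlier guess:

def newsboy_discrete(p, Cu, Co):
    # Return the smallest Q with F(Q) >= Cu / (Cu + Co); p[i] = Prob(D = i).
    ratio = Cu / (Cu + Co)
    F = 0.0
    for Q, pi in enumerate(p):
        F += pi
        if F >= ratio:
            return Q
    return len(p) - 1

p = [0.05, 0.2, 0.3, 0.4, 0.05]              # Table 1.2
print(newsboy_discrete(p, Cu=1.0, Co=1.0))   # F(1) = 0.25 < 0.5 <= F(2) = 0.55, so Q* = 2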
Example 3. Suppose the daily demand follows a geometric distribution,
p_i = (1 − p) p^i, i = 0, 1, 2, · · · , where 0 < p < 1,
and the overage cost and the underage cost are equal. What is the optimal ordering
quantity?
We have
F(Q) = Σ_{i=0}^{Q} p_i = Σ_{i=0}^{Q} (1 − p) p^i = 1 − p^{Q+1}.
Thus the optimal Q should satisfy
F(Q − 1) = 1 − p^Q < 1/2 ≤ 1 − p^{Q+1} = F(Q),
or equivalently
p^{Q+1} ≤ 1/2 < p^Q.
Therefore
Q* = integral part of log(0.5)/log(p).
1.6.5 Newsboy Model with Continuous Stochastic Demand

What happens when the demand follows a continuous PDF? We expect a result
similar to that of the previous section.
We replace the discrete probability function p_i by a continuous probability density
function f(x). Here, for simplicity, we assume
f(x) > 0 for x ≥ 0 and f(x) = 0 for x < 0.

Proposition 15. Define the cumulative probability function of the demand D
to be
F(Q) = ∫_0^Q f(t) dt,
which is strictly increasing in Q. Then the optimal value of Q satisfies
F(Q) = C_u/(C_u + C_o).
Step 1: The expected cost when the ordering quantity is Q is
E(Q) = C_o ∫_0^Q (Q − x) f(x) dx + C_u ∫_Q^∞ (x − Q) f(x) dx,
where the first integral is the expected overage cost and the second is the expected shortage cost.
We are going to show that E(Q) is minimized when Q is chosen to be the root of
the equation
F(Q) = C_u/(C_o + C_u).

Step 2: To find the critical point of the expected cost, rewrite
E(Q) = C_o Q ∫_0^Q f(x) dx − C_o ∫_0^Q x f(x) dx + C_u ∫_Q^∞ x f(x) dx − C_u Q ∫_Q^∞ f(x) dx.
We note that
d/dQ {Q ∫_0^Q f(x) dx} = ∫_0^Q f(x) dx + Q f(Q),
d/dQ {∫_0^Q x f(x) dx} = Q f(Q),
d/dQ {∫_Q^∞ x f(x) dx} = −Q f(Q),
d/dQ {Q ∫_Q^∞ f(x) dx} = ∫_Q^∞ f(x) dx − Q f(Q),
and we have
E′(Q) = C_o ∫_0^Q f(x) dx + C_o Q f(Q) − C_o Q f(Q) − C_u Q f(Q) − C_u ∫_Q^∞ f(x) dx + C_u Q f(Q).
Finally we get
E′(Q) = C_o ∫_0^Q f(x) dx − C_u ∫_Q^∞ f(x) dx = (C_o + C_u) F(Q) − C_u.
The critical point satisfies the equation
F(Q) = C_u/(C_o + C_u).

Remark 11. Since F(Q) is strictly increasing, with F(0) = 0 and F(Q) → 1 as
Q → ∞, the equation
F(Q) = C_u/(C_o + C_u)
has a unique root.

Step 3: Analyze the critical point. Since
E″(Q) = (C_o + C_u) f(Q) > 0 for Q > 0,
the optimal value of Q satisfies the equation
(C_o + C_u) F(Q) − C_u = 0, or F(Q) = C_u/(C_o + C_u).
Example 4. If the demand follows the exponential distribution
f(x) = λ e^{−λx},
then the optimal Q will satisfy
F(Q) = ∫_0^Q λ e^{−λx} dx = 1 − e^{−λQ} = C_u/(C_o + C_u).
Thus the optimal Q is
Q = (−1/λ) log(C_o/(C_o + C_u)).
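As a quick check of Example 4, one can solve F(Q) = C_u/(C_o + C_u) numerically by bisection and compare with the closed form. The values λ = 0.5, C_u = 3 and C_o = 1 below are arbitrary illustrative choices:

from math import exp, log

lam, Cu, Co = 0.5, 3.0, 1.0
ratio = Cu / (Co + Cu)

def F(Q):
    return 1 - exp(-lam * Q)   # exponential demand CDF

lo, hi = 0.0, 100.0            # bisection for F(Q) = ratio
for _ in range(60):
    mid = (lo + hi) / 2
    if F(mid) < ratio:
        lo = mid
    else:
        hi = mid

closed_form = -log(Co / (Co + Cu)) / lam
print(lo, closed_form)         # both ≈ 2.7726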
Remark 12. We give some remarks on the newsboy model.
1. It is clear that the optimal ordering quantity increases (decreases)
as the underage cost C_u increases (the overage cost C_o increases).
2. In the discrete case, it is difficult to determine the optimal Q when
the demand follows a Poisson distribution with probability function
λ^x e^{−λ}/x!, x = 0, 1, 2, · · · .
• But when λ is large, the Poisson distribution can be approximated by the normal distribution with mean and variance both λ.
• So, by using the result in the continuous case, one can get an approximate optimal ordering quantity very quickly when λ is large.
1.7 A Summary of Learning Outcomes

• Able to give the definitions of a Poisson process and the Markov property.
• Able to state and prove the relationships among a Poisson process, the exponential distribution and the Poisson distribution.
• Able to state and prove the relationships among the Markov property, the exponential distribution and the Erlangian distribution.
• Able to formulate the newsboy problem and solve for its optimal policy.
1.8 Exercises

1. Consider the geometric distribution
P(x = i) = p_0 (1 − p_0)^i, 0 < p_0 < 1, i = 0, 1, . . . .
Prove that for any two non-negative integers t and h we have
P(x ≥ t + h | x ≥ t) = P(x ≥ h).

2. Consider a non-negative discrete random variable x taking values in {0, 1, 2, . . .}
such that P(x = i) = p_i, i = 0, 1, 2, . . ., and 0 < p_0 < 1. Suppose that for any
two non-negative integers t and h we have
P(x ≥ t + h | x ≥ t) = P(x ≥ h);
prove that
p_i = p_0 (1 − p_0)^i, i = 0, 1, . . . .

3. Let p(h) = e^{−λh} − 1 + λh. Prove that
lim_{h→0} p(h)/h = 0,
and hence deduce that e^{−λh} = 1 − λh + o(h).
4. Events occur according to a Poisson process with rate λ = 2 per hour.
(a) What is the probability that no event occurs between 8 p.m. and 9 p.m.?
(b) Starting at noon, what is the expected time at which the fourth event occurs?
(c) Find the probability that two or more events occur between 6 p.m. and
8 p.m.
5. Customers arrive at a bank at a Poisson rate λ. Suppose two customers arrived
during the first hour. What is the probability that
(a) both arrived during the first 20 minutes?
(b) at least one arrived during the first 20 minutes?
1.9 Suggested Solutions

1. We note that
P(x ≥ t + h) = Σ_{i=t+h}^∞ p_0 (1 − p_0)^i = (1 − p_0)^{t+h},
P(x ≥ t) = Σ_{i=t}^∞ p_0 (1 − p_0)^i = (1 − p_0)^t,
and
P(x ≥ h) = Σ_{i=h}^∞ p_0 (1 − p_0)^i = (1 − p_0)^h.
Hence
P(x ≥ t + h)/P(x ≥ t) = (1 − p_0)^{t+h}/(1 − p_0)^t = (1 − p_0)^h = P(x ≥ h).
The result follows.
2. From the given condition we have
Σ_{i=t+h}^∞ p_i = (Σ_{i=t}^∞ p_i)(Σ_{i=h}^∞ p_i).
Putting t = 1 we have
Σ_{i=h+1}^∞ p_i = (1 − p_0) Σ_{i=h}^∞ p_i,
and therefore
p_h = p_0 Σ_{i=h}^∞ p_i.
Now putting h = 1 we have
p_1 = p_0 (1 − p_0).
Putting h = 2 we have
p_2 = p_0 (1 − p_0 − p_1) = p_0 (1 − p_0 − p_0(1 − p_0)) = p_0 (1 − p_0)^2.
Inductively (or by using mathematical induction), the result follows.
3. By using L'Hôpital's rule we have
lim_{h→0} (e^{−λh} − 1 + λh)/h = lim_{h→0} (−λe^{−λh} + λ)/1 = 0.
Therefore by definition we have p(h) = o(h). Hence
e^{−λh} − 1 + λh = o(h),
and the result follows.

4. Let X(t) be the number of occurrences in t hours.
(a) P(X(1) = 0) = e^{−2}.
(b) Let T_n denote the elapsed time between the (n − 1)th and the nth event. Then
E(T_1 + T_2 + T_3 + T_4) = 4(1/λ) = 2.
Therefore the expected time at which the fourth event occurs is 2 p.m.
(c) P(two or more events occur between 6 p.m. and 8 p.m.)
= 1 − P(X(2) = 0) − P(X(2) = 1)
= 1 − e^{−4} − 4e^{−4}
= 1 − 5e^{−4}.
2 Discrete Time Markov Chains: Theory and Examples

Andrei Markov² (1856-1922) enrolled at St. Petersburg
University in 1874. He earned a master's degree in 1880 and a doctorate
four years later. He became a professor at St. Petersburg in 1886, and
a member of the Russian Academy of Sciences in 1896.
• He retired in 1905, but continued to teach probability courses at
the university almost until his death. Apart from mathematics, Markov
was also a very good chess player in St. Petersburg.
• His early work was devoted to
number theory, continued fractions,
limits of integrals, approximation theory
and convergence of series.

² A Historical Note
• After 1900 he turned his attention to probability theory, and this
part of his work has had the greatest effect on the development of science. He applied the method of continued fractions, pioneered by his
teacher Prof. P. Chebyshev, to probability theory.
• In 1887, Chebyshev outlined a proof of a generalized central
limit theorem. Markov then studied sequences of mutually dependent variables, hoping to establish the limiting laws
of probability under fairly general assumptions. Eight years later,
Markov succeeded in proving the general result rigorously.
• While working on this problem, he also extended both the law of
large numbers and the central limit theorem to certain sequences of dependent random variables forming special classes
of what are now called Markov chains. Markov chains appeared for
the first time in the papers
(i) "The Extension of the Law of Large Numbers on Mutually Dependent Variables" in 1906, and
(ii) "Investigation of a Remarkable Case of Dependent Trials" in 1907.
• In other articles, published in 1911-1912, he studied various generalizations of Markov chains. The foundation of the general theory
of Markov processes was laid down in the 1930s by A. Kolmogorov.
• Markov chains also have an extensive prehistory, including problems of random walks. But Markov himself never wrote about the
applications of his theory to the sciences.
• He arrived at his chains starting from the internal needs of probability theory. For him the only real examples of the chains were card
shuffling and linguistic problems.
• Markov chains soon found many applications in modern physics.
One of the earliest applications was to describe Brownian motion;
later, cosmic radiation and radioactivity were also studied.
• Another frequent application is the study of fluctuations in stock
prices. The phenomenon generally referred to as a random walk has been
developed and widely applied in the
biological, physical and social sciences, engineering, and so on.
• By his work Markov made important contributions to the development of probability theory, and launched the theory of stochastic
processes. Markov is also remembered as a mathematician who enjoyed
doing numerical computations.
• He expressed his attitude indirectly like this:
. . . many mathematicians apparently believe that going beyond
the field of abstract reasoning into the sphere of
effective calculations would be
humiliating.
A Marketing Problem:³
In a town there are only two supermarkets: Wellcome and Park'n.
Marketing research indicates that a consumer of Wellcome will
switch to Park'n on his/her next shopping trip with probability α (> 0),
while a consumer of Park'n will switch to Wellcome on his/her next
shopping trip with probability β (> 0).
Question 1: What is the probability that a Wellcome consumer
will still be a Wellcome consumer on his/her nth shopping trip?
Question 2: What will be the market shares of the two supermarkets
in the town in the long run?

³ Some Examples for Motivation
A Genetic Problem:
Consider a large population of individuals, each of whom possesses a particular
pair of genes, each gene being classified as type 'A' or type 'a'.
Assume that the proportions of individuals whose gene pairs are 'AA',
'aa' or 'Aa' ('Aa' = 'aA') are respectively p_0, q_0 and r_0,
where p_0 + q_0 + r_0 = 1.
When two individuals mate, each contributes one of his/her genes,
chosen at random, to the resultant offspring. Assume that mating also
occurs at random, in that each individual is equally likely to
mate with any other individual.
Question: What are the proportions of individuals in the next generation whose gene pairs are AA, aa and Aa?
A Problem on the Web:
In surfing the Internet, surfers usually use search engines to find
webpages satisfying their queries.
Unfortunately, very often there can be thousands of webpages
relevant to a query.
Question: How do we obtain a proper list of the webpages in a certain
order of importance?
Google developed a method called the PageRank algorithm for
solving this problem.
2.1 Introduction to Markov Chain

We consider a stochastic process
{X^{(n)}, n = 0, 1, 2, . . .}
that takes values in a finite or countable set M.

Example 5. Let X^{(n)} be the weather on the nth day, which can take values in
M = {sunny, windy, rainy, cloudy}.
One may have the following realization:
X^{(0)} = sunny, X^{(1)} = windy, X^{(2)} = rainy, X^{(3)} = sunny, X^{(4)} = cloudy, . . . .

Example 6. Let X^{(n)} be the product sales on the nth day, which can take values in
M = {0, 1, 2, . . .}.
One may have the following realization:
X^{(0)} = 4, X^{(1)} = 5, X^{(2)} = 2, X^{(3)} = 0, X^{(4)} = 5, . . . .

Remark 13. For simplicity of discussion we take the state space M to be
{0, 1, 2, . . .}. An element of M is called a state of the process.
Definition 6. Suppose there is a fixed probability P_{ij} such that
P(X^{(n+1)} = j | X^{(n)} = i, X^{(n−1)} = i_{n−1}, . . . , X^{(0)} = i_0) = P_{ij}, n ≥ 0,
where i, j, i_0, i_1, . . . , i_{n−1} ∈ M. Then this is called a Markov chain process.

Remark 14. One can interpret the above probability as follows: the conditional
distribution of any future state X^{(n+1)}, given the past states X^{(0)}, X^{(1)}, . . . , X^{(n−1)}
and the present state X^{(n)}, is independent of the past states and depends on
the present state only.

Remark 15. The probability P_{ij} represents the probability that the process will
make a transition to State j given that currently the process is in State i. Clearly one
has
P_{ij} ≥ 0, Σ_{j=0}^∞ P_{ij} = 1, i = 0, 1, . . . .

Definition 7. The matrix containing the transition probabilities P_{ij},

P = ( P_00 P_01 · · · )
    ( P_10 P_11 · · · )
    (  ⋮    ⋮    ⋱  ),

is called the one-step transition probability matrix of the process.
Example 7. Consider the marketing problem again. Let X^{(n)} be a 2-state process
(taking values in {0, 1}) describing the behavior of a consumer:
X^{(n)} = 0 if the consumer shops at Wellcome on the nth day, and
X^{(n)} = 1 if the consumer shops at Park'n on the nth day.
Since the future state (which supermarket to shop at next time) depends on
the current state only, it is a Markov chain process.
It is easy to check that the transition probabilities are
P_00 = 1 − α, P_11 = 1 − β, P_01 = α, P_10 = β.
Therefore the one-step transition matrix of this process is given by

P = ( 1 − α    α   )
    (   β    1 − β ).
Example 8 (Random Walk). We consider a person who performs a random
walk on the real line with the integers
M = {. . . , −2, −1, 0, 1, 2, . . .}
as the state space; see Figure 2.1.
Each time, the person at State i can move one step forward (+1) or one step
backward (−1) with probabilities p (0 < p < 1) and (1 − p) respectively.
Therefore we have the transition probabilities
P_{i,i+1} = p, P_{i,i−1} = 1 − p, i = 0, ±1, ±2, . . . .

Figure 2.1. The Random Walk: from each integer state the walker moves right
with probability p and left with probability 1 − p.
Example 9 (Gambler's Ruin). Consider a gambler who at each play of the
game either wins one dollar with probability p or loses one dollar with probability
(1 − p).
The game is over if either he loses all his money or he attains a fortune of N dollars.
Let the gambler's fortune be the state of the gambling process; then the process is
a Markov chain. Moreover, we have transition probabilities
P_{i,i+1} = p, P_{i,i−1} = 1 − p, i = 1, 2, . . . , N − 1,
and P_00 = P_{NN} = 1. States 0 and N are called absorbing states. The process
will stay at 0 or N forever if one of these states is reached; see Figure 2.2.

Figure 2.2. The Gambler's Ruin Model: states 0, 1, . . . , N with absorbing
barriers at 0 and N.
2.1.1 The nth-Step Transition Matrix

In the previous section we defined the one-step transition probability matrix P
for a Markov chain process. In this section we are going to investigate the n-step
transition probabilities P^{(n)}_{ij} of a Markov chain process.

Definition 8. We define P^{(n)}_{ij} to be the probability that a process in State i will
be in State j after n additional transitions. In particular,
P^{(1)}_{ij} = P_{ij}.

Proposition 16. We have
P^{(n)} = P^n,
where P^{(n)} is the n-step transition probability matrix and P is the one-step
transition matrix.
Proof. We will prove the proposition by mathematical induction.
Clearly the proposition is true when n = 1. We then assume that the proposition
is true for n. We note that P^n = P × P × . . . × P (n times). Then
P^{(n+1)}_{ij} = Σ_{k∈M} P^{(n)}_{ik} P^{(1)}_{kj} = Σ_{k∈M} [P^n]_{ik} P_{kj} = [P^{n+1}]_{ij}.
By the principle of mathematical induction, the proposition is true for all non-negative integers n.

Remark 16. It is easy to see that
P^{(m)} P^{(n)} = P^m P^n = P^{m+n} = P^{(m+n)}.

Figure 2.3. The (n + 1)-step Transition Probability: a path from State i to
State j consists of n transitions from State i to some intermediate State k,
followed by one transition from State k to State j.
Example 10. We consider the marketing problem again. In the model we have

P = ( 1 − α    α   )
    (   β    1 − β ).

If α = 0.3 and β = 0.4, then we have

P^{(4)} = P^4 = ( 0.7 0.3 )^4 = ( 0.5749 0.4251 )
                ( 0.4 0.6 )     ( 0.5668 0.4332 ).

Recall that a consumer is in State 0 (1) if he/she is a consumer of Wellcome (Park'n).
• P^{(4)}_00 = 0.5749 is the probability that a Wellcome consumer will shop at
Wellcome on his/her fourth shopping trip, and P^{(4)}_01 = 0.4251 is the probability that a
Wellcome consumer will shop at Park'n on his/her fourth shopping trip.
• P^{(4)}_10 = 0.5668 is the probability that a Park'n consumer will shop at Wellcome
on his/her fourth shopping trip, and P^{(4)}_11 = 0.4332 is the probability that a Park'n
consumer will shop at Park'n on his/her fourth shopping trip.
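The four-step matrix can be verified directly; a minimal NumPy check of Example 10:

import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
P4 = np.linalg.matrix_power(P, 4)
print(P4)
# [[0.5749 0.4251]
#  [0.5668 0.4332]]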
Remark 17. Consider a Markov chain process having states in {0, 1, 2, . . .}.
• Suppose that at time n = 0 the probability that the process is in State i is
a_i, i = 0, 1, 2, . . . .
• One interesting question is the following: what is the probability
that the process will be in State j after n transitions?
• The probability that, given the process is in State i, it will be in
State j after n transitions is P^{(n)}_{ij} = [P^n]_{ij}, where P_{ij} is the one-step
transition probability from State i to State j of the process.
• Therefore the required probability is
Σ_{i=0}^∞ P(X^{(0)} = i) × P^{(n)}_{ij} = Σ_{i=0}^∞ a_i × [P^n]_{ij}.
Let X^{(n)} = (X̃^{(n)}_0, X̃^{(n)}_1, . . .) be the probability distribution of the states of the
Markov chain process at the nth transition. Here X̃^{(n)}_i is the probability that the
process is in State i after n transitions, and Σ_{i=0}^∞ X̃^{(n)}_i = 1.
It is easy to check that
X^{(n+1)} = X^{(n)} P and X^{(n+1)} = X^{(0)} P^{(n+1)}.

Example 11. Refer to the previous example. If at n = 0 a consumer
belongs to Park'n, we may represent this information as
X^{(0)} = (X̃^{(0)}_0, X̃^{(0)}_1) = (0, 1).
What happens on his/her fourth shopping trip?
X^{(4)} = X^{(0)} P^{(4)} = (0, 1) ( 0.7 0.3 )^4 = (0.5668, 0.4332).
                                ( 0.4 0.6 )
This means that with probability 0.4332 he/she is still a consumer
of Park'n, and with probability 0.5668 he/she is a consumer of Wellcome,
on his/her fourth shopping trip.
2.2 Irreducible Markov Chains and Classification of States

Definition 9. In a Markov chain, State j is said to be accessible from State i if
P^{(n)}_{ij} > 0 for some n ≥ 0. This means that, starting from State i, it is possible to
enter State j in finitely many transitions, and we write i → j.

Definition 10. State i and State j are said to communicate if State i and State
j are accessible from each other, and we write i ↔ j.

Remark 18. We note that "↔" is an equivalence relation.
(i) State i communicates with State i in 0 steps, because
P^{(0)}_{ii} = P(X^{(0)} = i | X^{(0)} = i) = 1 > 0.
(ii) If State i communicates with State j, then State j communicates with State i.
(iii) If State i communicates with State j and State j communicates with State k,
then State i communicates with State k. Indeed, P^{(m)}_{ij}, P^{(n)}_{jk} > 0 for some m and
n, and we have
P^{(m+n)}_{ik} = Σ_{h∈M} P^{(m)}_{ih} P^{(n)}_{hk} ≥ P^{(m)}_{ij} P^{(n)}_{jk} > 0.
Therefore State k is accessible from State i. By interchanging the roles of i and k,
State i is accessible from State k. Hence i communicates with k.
Definition 11. Two states that communicate are said to be in the same class.
A Markov chain is said to be irreducible if all states communicate with each
other.

Example 12. Consider the transition probability matrix (states 0, 1, 2):

( 0.0 0.5 0.5 )
( 0.5 0.0 0.5 )
( 0.5 0.5 0.0 )

We note that 0 → 1, 0 → 2, 1 → 0, 1 → 2, 2 → 0, 2 → 1. Therefore 0 ↔ 1 and
1 ↔ 2, and all states belong to one class. Hence this is an irreducible chain.

Example 13. Consider another transition probability matrix (states 0, 1, 2, 3):

( 0.0 1.0 0.0 0.0 )
( 0.0 0.0 0.5 0.5 )
( 0.0 0.5 0.0 0.5 )
( 0.0 0.5 0.5 0.0 )

We note that from States 1, 2 or 3, it is not possible to visit State 0, i.e.,
P^{(n)}_{10} = P^{(n)}_{20} = P^{(n)}_{30} = 0 for all n ≥ 1.
Therefore the chain is not irreducible (it is reducible).
Definition 12. For any State i in a Markov chain, let f_i be the probability that,
starting in State i, the process will ever re-enter State i.
State i is said to be recurrent if f_i = 1 and transient if f_i < 1.

Remark 19. If State i is recurrent, then the process will return to State
i (with probability 1) again and again, infinitely many times.
If State i is transient, then the probability that the process visits
State i for exactly n time periods is f_i^{n−1}(1 − f_i), the geometric
distribution, and this probability goes to 0 as n → ∞.
Moreover, the expected number of time periods that the process will
visit i is
Σ_{n=1}^∞ n f_i^{n−1}(1 − f_i) = 1/(1 − f_i) < ∞.
Proposition 17. State i is recurrent if
Σ_{n=1}^∞ P^{(n)}_{ii} = ∞,
and State i is transient if
Σ_{n=1}^∞ P^{(n)}_{ii} < ∞.

Proof. Let X^{(n)} be the state of the process after n transitions and define
I_n = 1 if X^{(n)} = i, and I_n = 0 if X^{(n)} ≠ i.
We note that
E(Σ_{n=0}^∞ I_n | X^{(0)} = i)
is the expected number of times the process visits State i given that initially
the process is in State i. It is infinite if State i is recurrent and finite if State i is
transient.
We have
E(Σ_{n=0}^∞ I_n | X^{(0)} = i) = Σ_{n=0}^∞ E(I_n | X^{(0)} = i)
= Σ_{n=0}^∞ [1 · P(X^{(n)} = i | X^{(0)} = i) + 0 · (1 − P(X^{(n)} = i | X^{(0)} = i))]
= Σ_{n=0}^∞ P(X^{(n)} = i | X^{(0)} = i)
= Σ_{n=0}^∞ P^{(n)}_{ii}.
Hence the result follows.

Remark 20. The proposition implies that a transient state will only
be visited a finite number of times. Hence in a Markov chain with finitely many
states, we cannot have all states transient.
Proposition 18. If State i is recurrent and State i communicates with State
j, then State j is also recurrent.

Proof. Since i communicates with j, there exist integers k and m such that
P^{(k)}_{ij} > 0 and P^{(m)}_{ji} > 0.
Now for any integer n we have
P^{(m+n+k)}_{jj} ≥ P^{(m)}_{ji} · P^{(n)}_{ii} · P^{(k)}_{ij}.
Hence
Σ_{n=1}^∞ P^{(n)}_{jj} ≥ Σ_{n=1}^∞ P^{(m+n+k)}_{jj} ≥ Σ_{n=1}^∞ P^{(m)}_{ji} P^{(n)}_{ii} P^{(k)}_{ij} = P^{(m)}_{ji} P^{(k)}_{ij} Σ_{n=1}^∞ P^{(n)}_{ii} = ∞.
By Proposition 17, State j is recurrent.

Remark 21. If State i is transient and State i communicates with State j, then
State j is also transient. For if j were recurrent, then since j communicates with i,
by Proposition 18 State i would be recurrent too, a contradiction.
2.2.1 An Analysis of the Random Walk

• Recall that a person performs a random walk on the integers of the real line.
• Each time, the person at State i can move one step forward (+1)
or one step backward (−1) with probabilities p (0 < p < 1) and (1 − p)
respectively.
• Therefore we have the transition probabilities
P_{i,i+1} = p, P_{i,i−1} = 1 − p, i = 0, ±1, ±2, . . . .
• Since all the states communicate with each other, by Proposition 18 all
states are either recurrent or all transient.
W.L.O.G. let us consider State 0. To classify this state we consider
Σ_{m=1}^∞ P^{(m)}_{00}.
We note that P^{(2n+1)}_{00} = 0, since a return to 0 requires equal numbers of
forward and backward steps, and we have
P^{(2n)}_{00} = C(2n, n) p^n (1 − p)^n.
Hence we have
I = Σ_{m=1}^∞ P^{(m)}_{00} = Σ_{n=1}^∞ P^{(2n)}_{00} = Σ_{n=1}^∞ C(2n, n) p^n (1 − p)^n = Σ_{n=1}^∞ [(2n)!/(n! n!)] p^n (1 − p)^n.
Recall that if I is finite then State 0 is transient (and hence so are all the other
states); otherwise State 0 is recurrent (and hence so are all the other states).
• We have to apply Stirling's formula (we will give a proof later) to get a conclusive result.
• Stirling's formula states that if n is large then
n! ≈ n^{n+1/2} e^{−n} √(2π).
• Hence we have
P_00^{(2n)} ≈ (4p(1 − p))^n / √(πn).
If p = 1/2 then we have
P_00^{(2n)} ≈ 1/√(πn).
If p ≠ 1/2 then we have
P_00^{(2n)} ≈ a^n/√(πn)
where 0 < a = 4p(1 − p) < 1.
• Therefore when p = 1/2, we have
I = ∑_{n=1}^∞ P_00^{(2n)} ≈ ∑_{n=1}^∞ 1/√(πn) > ∑_{n=1}^∞ 1/(πn) = ∞
and State 0 is recurrent.
• When p ≠ 1/2, we have
I = ∑_{n=1}^∞ P_00^{(2n)} ≈ ∑_{n=1}^∞ a^n/√(πn) ≤ (1/√π) ∑_{n=1}^∞ a^n = a/(√π(1 − a)) < ∞
and State 0 is transient.
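These asymptotics can also be checked numerically. Below is a minimal Python sketch (an illustration only; the term C(2n, n) p^n (1 − p)^n is updated recursively to avoid huge factorials) that computes partial sums of P_00^{(2n)}:

def partial_sum(p, N):
    # Partial sum of P_00^(2n) = C(2n, n) p^n (1-p)^n for n = 1, ..., N,
    # updating the term via C(2n+2, n+1) = C(2n, n) * (2n+2)(2n+1)/(n+1)^2.
    term = 2.0 * p * (1.0 - p)        # n = 1 term: C(2, 1) p (1-p)
    total = 0.0
    for n in range(1, N + 1):
        total += term
        term *= (2*n + 2) * (2*n + 1) / ((n + 1) ** 2) * p * (1.0 - p)
    return total

for p in (0.5, 0.3):
    print(p, [round(partial_sum(p, N), 4) for N in (10, 100, 1000, 10000)])
# For p = 0.5 the partial sums keep growing like 2*sqrt(N/pi) (recurrence);
# for p = 0.3 they stabilise at a finite value (transience).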
2.2.2 A Proof for the Stirling Formula
• Let y_1, y_2, . . . , y_n be n independent Poisson random variables each having mean 1. Then
z_n = y_1 + y_2 + . . . + y_n,
the sum of the n Poisson random variables, is also a Poisson random variable, with mean n and variance n.
• We have
P(z_n = n) = P(n − 1 < z_n ≤ n)
= P( −1/√n < (z_n − n)/√n ≤ 0 )
≈ ∫_{−1/√n}^0 (1/√(2π)) e^{−x²/2} dx
≈ 1/√(2πn).
Because for large n we have
(z_n − n)/√n ∼ N(0, 1)
and
e^{−x²/2} ≈ 1 for x ∈ (−1/√n, 0).
Now since z_n is a Poisson random variable,
P(z_n = n) = e^{−n} n^n / n!,
therefore we have
n! ≈ n^{n+1/2} e^{−n} √(2π).
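As a quick numerical sanity check of this approximation (not part of the proof), one can compare n! with n^{n+1/2} e^{−n} √(2π) in Python:

import math

# Compare n! with Stirling's approximation n^(n+1/2) e^(-n) sqrt(2*pi).
for n in (5, 10, 50, 100):
    exact = math.factorial(n)
    stirling = n**(n + 0.5) * math.exp(-n) * math.sqrt(2 * math.pi)
    print(n, stirling / exact)   # the ratio tends to 1 as n grows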
2.3 Simulation and Construction of Markov Chains
2.3.1 Simulation of Markov Chains with EXCEL
Consider a Markov chain process with THREE states {0, 1, 2} and the transition probability matrix
    0 ( 0.2 0.3 0.5 )
P = 1 ( 0.5 0.1 0.4 ) .
    2 ( 0.3 0.3 0.4 )
Given that X^{(0)} = 0, our objective here is to generate a sequence {X^{(n)}, n = 1, 2, . . .} which follows a Markov chain process with the transition matrix P.
To generate {X^{(n)}} there are three possible cases:
(i) Suppose X^{(n)} = 0; then we have
P(X^{(n+1)} = 0) = 0.2, P(X^{(n+1)} = 1) = 0.3, P(X^{(n+1)} = 2) = 0.5;
(ii) Suppose X^{(n)} = 1; then we have
P(X^{(n+1)} = 0) = 0.5, P(X^{(n+1)} = 1) = 0.1, P(X^{(n+1)} = 2) = 0.4;
(iii) Suppose X^{(n)} = 2; then we have
P(X^{(n+1)} = 0) = 0.3, P(X^{(n+1)} = 1) = 0.3, P(X^{(n+1)} = 2) = 0.4.
Suppose we can generate a random variable U uniformly distributed over [0, 1].
Then we generate the distribution in Case (i), when X^{(n)} = 0, easily as follows:
X^{(n+1)} = 0 if U ∈ [0, 0.2),
            1 if U ∈ [0.2, 0.5),
            2 if U ∈ [0.5, 1].
The distribution in Case (ii), when X^{(n)} = 1, can be generated as follows:
X^{(n+1)} = 0 if U ∈ [0, 0.5),
            1 if U ∈ [0.5, 0.6),
            2 if U ∈ [0.6, 1].
The distribution in Case (iii), when X^{(n)} = 2, can be generated as follows:
X^{(n+1)} = 0 if U ∈ [0, 0.3),
            1 if U ∈ [0.3, 0.6),
            2 if U ∈ [0.6, 1].
• In EXCEL one can generate U, a random variable uniformly distributed over [0, 1], by using "=rand()".
• By using simple logic statements in EXCEL, one can simulate a Markov chain easily. The following are some useful logic statements in EXCEL.
(i) "B1" means Column B and Row 1.
(ii) "=IF(B1=0,1,-1)" gives 1 if B1=0, otherwise it gives -1.
(iii) "=IF(A1>B2,0,1)" gives 0 if A1 > B2, otherwise it gives 1.
(iv) "=IF(AND(A1=1,B2>2),1,0)" gives 1 if A1=1 and B2>2, otherwise it gives 0.
(v) "=max(1,2,-1)" gives 2, the maximum of the numbers.
• A demonstration EXCEL file is available at
http://hkumath.hku.hk/∼wkc/sim.xls
for your reference. The program generates a Markov chain process X^{(1)}, X^{(2)}, . . . , X^{(30)} whose transition probability matrix is P and X^{(0)} = 0.
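For readers who prefer a script to a spreadsheet, the same inverse-transform idea can be written in a few lines of Python. This is only a sketch of the method behind the demonstration file, not a transcription of it; it uses the standard library only:

import random

# Transition probability matrix of the three-state chain {0, 1, 2}.
P = [[0.2, 0.3, 0.5],
     [0.5, 0.1, 0.4],
     [0.3, 0.3, 0.4]]

def next_state(i):
    # Sample X^(n+1) given X^(n) = i by inverting U ~ Uniform[0, 1].
    u, cum = random.random(), 0.0
    for j, pij in enumerate(P[i]):
        cum += pij
        if u < cum:
            return j
    return len(P[i]) - 1   # guard against floating-point rounding

chain = [0]                # X^(0) = 0
for _ in range(30):
    chain.append(next_state(chain[-1]))
print(chain)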
2.3.2 Construction of a Markov Chain Model
• Given the observed data sequence {X^{(n)}}, one can count the transition frequency F_jk in the sequence from State j to State k in one step. Hence one can construct the one-step transition frequency matrix for the sequence {X^{(n)}} as follows:
    ( F_11  · · ·  F_1m )
F = ( F_21  · · ·  F_2m )      (2.1)
    (  ...   ...   ...  )
    ( F_m1  · · ·  F_mm )
• From F, one can get the estimates for P_kj as follows:
    ( P_11  · · ·  P_1m )
P = ( P_21  · · ·  P_2m )
    (  ...   ...   ...  )
    ( P_m1  · · ·  P_mm )
where
P_kj = F_kj / ∑_{l=1}^m F_kl  if  ∑_{l=1}^m F_kl > 0,  and  P_kj = 0  if  ∑_{l=1}^m F_kl = 0.      (2.2)
• We consider a sequence {X^{(n)}} of three states (m = 3) given by
{0, 0, 1, 1, 0, 2, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 0, 1}.      (2.3)
We have the transition frequency matrix
    ( 1 6 1 )
F = ( 3 1 3 ) .      (2.4)
    ( 3 1 0 )
Therefore the one-step transition matrix can be estimated as follows:
    ( 1/8  3/4  1/8 )
P = ( 3/7  1/7  3/7 ) .      (2.5)
    ( 3/4  1/4   0  )
A demonstration EXCEL file is available at
http://hkumath.hku.hk/∼wkc/build.xls
for your reference.
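The counting in (2.3)–(2.5) is easy to automate. A minimal Python sketch (an illustration of the method, not the build.xls file itself):

seq = [0, 0, 1, 1, 0, 2, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 0, 1]
m = 3

# Count the one-step transition frequencies F_jk.
F = [[0] * m for _ in range(m)]
for j, k in zip(seq, seq[1:]):
    F[j][k] += 1

# Normalise each row to estimate the transition matrix P, as in (2.2).
P = []
for row in F:
    s = sum(row)
    P.append([f / s if s > 0 else 0.0 for f in row])

print(F)   # [[1, 6, 1], [3, 1, 3], [3, 1, 0]], matching (2.4)
print(P)   # rows divided by 8, 7 and 4, matching (2.5)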
2.4 Stationary Distribution of a Finite Markov Chain
Definition 13. A State i is said to have period d if P_ii^{(n)} = 0 whenever n is not divisible by d, and d is the largest integer with this property. A state with period 1 is said to be aperiodic.
Example 14. Consider the transition probability matrix
P = ( 0 1 )
    ( 1 0 ) .
We note that
P^n = (1/2) ( 1 + (−1)^n      1 + (−1)^{n+1} )
            ( 1 + (−1)^{n+1}  1 + (−1)^n     ) .
We note that P_00^{(2n+1)} = P_11^{(2n+1)} = 0, so both States 0 and 1 have a period of 2.
Definition 14. State i is said to be positive recurrent if it is recurrent and, starting in State i, the expected time until the process returns to State i is finite.
Definition 15. A state is said to be ergodic if it is positive recurrent and aperiodic.
Proposition 19. In a finite irreducible aperiodic Markov chain having N states and transition probability matrix P, there exists M < ∞ such that P_ii^{(m)} > 0 for all states i and all m ≥ M.
Proposition 20. In a finite irreducible aperiodic Markov chain having N states and transition probability matrix P, there exists M < ∞ such that P_ij^{(m)} > 0 for any two States i, j and all m ≥ M.
Proof. From Proposition 19 there is a K such that P_ii^{(k)} > 0 for k ≥ K and all states i.
For any two states i and j, as the chain is irreducible there exists k_ij such that P_ij^{(k_ij)} > 0. Therefore
P_ij^{(m)} ≥ P_ii^{(m−k_ij)} P_ij^{(k_ij)} > 0
if m ≥ M_ij = k_ij + K (so that m − k_ij ≥ K).
The proof is complete if we let
M = max { M_00, M_01, . . . , M_10, M_11, . . . } .
We remark that the maximum here is taken over a finite set, so M is finite.
We recall the example of the marketing problem with X^{(0)} = (1, 0). We observe that
X^{(1)} = X^{(0)}P = (1, 0) ( 0.7 0.3 ; 0.4 0.6 ) = (0.7, 0.3),
X^{(2)} = X^{(0)}P^2 = (1, 0) ( 0.61 0.39 ; 0.52 0.48 ) = (0.61, 0.39),
X^{(4)} = X^{(0)}P^4 = (1, 0) ( 0.5749 0.4251 ; 0.5668 0.4332 ) = (0.5749, 0.4251),
X^{(8)} = X^{(0)}P^8 = (1, 0) ( 0.5715 0.4285 ; 0.5714 0.4286 ) = (0.5715, 0.4285),
X^{(16)} = X^{(0)}P^16 = (1, 0) ( 0.5714 0.4286 ; 0.5714 0.4286 ) = (0.5714, 0.4286).
It seems that
lim_{n→∞} X^{(n)} = (0.5714, 0.4286).
In fact the limit exists and is independent of X^{(0)}! It means that in the long run, the probability that a consumer belongs to Wellcome (Park'n) is given by 0.5714 (0.4286).
• We note that X^{(n)} = X^{(n−1)}P; therefore if
lim_{n→∞} X^{(n)} = π
then
π = lim_{n→∞} X^{(n)} = ( lim_{n→∞} X^{(n−1)} ) P = πP.
We have the following definition.
Definition 16. A row vector
π = (π_0, π_1, . . . , π_{k−1})
is said to be a stationary distribution of a finite Markov chain if it satisfies:
(i) π_i ≥ 0 and ∑_{i=0}^{k−1} π_i = 1;
(ii) πP = π, i.e. ∑_{i=0}^{k−1} π_i P_ij = π_j.
Proposition 21. For any irreducible and aperiodic Markov chain having k
states, there exists at least one stationary distribution.
Proposition 22. For any irreducible and aperiodic Markov chain having k states and any initial distribution X^{(0)},
lim_{n→∞} ||X^{(n)} − π|| = lim_{n→∞} ||X^{(0)}P^n − π|| = 0,
where π is a stationary distribution for the transition matrix P.
Proposition 23. The stationary distribution π in Proposition 22 is unique.
Proof. Suppose there are two stationary distributions π and π̂. Take X^{(0)} = π; then X^{(n)} = π for all n (recall that π = πP). From Proposition 22 we have
0 = lim_{n→∞} ||X^{(n)} − π̂|| = lim_{n→∞} ||π − π̂|| = ||π − π̂||,
where
||V|| = ∑_{i=1}^n |V_i|.
Therefore we must have π = π̂.
Remark 22. The requirement of aperiodicity is important. The following transition probability matrix is irreducible but NOT aperiodic:
P = ( 0 1 ; 1 0 ).
The stationary distribution can be shown to be
π = (0.5, 0.5).
One can check that
(0.5, 0.5)P = (0.5, 0.5).
However, with x_1 = (1, 0), the sequence
x_{n+1} = x_n P
does NOT converge to π. This means a stationary distribution may NOT be a steady-state distribution.
2.4.1 Applications of the Stationary Distribution
Example 15. Recall the marketing problem. The transition matrix is
P = ( 1−α  α ; β  1−β ).
To solve for the stationary distribution (π_0, π_1), we consider
(1 − α)π_0 + βπ_1 = π_0
απ_0 + (1 − β)π_1 = π_1
π_0 + π_1 = 1.
Solving the linear system of equations we have
π_0 = β(α + β)^{−1} and π_1 = α(α + β)^{−1}.
Therefore in the long run, the market shares of Wellcome and Park'n are respectively
β/(α + β) and α/(α + β).
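The same system can also be solved numerically. Below is a small sketch, assuming numpy is available, applied to the earlier marketing matrix (α = 0.3, β = 0.4); it appends the normalisation ∑π_i = 1 to πP = π and solves by least squares:

import numpy as np

alpha, beta = 0.3, 0.4
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Stationarity: (P^T - I) pi^T = 0, plus the extra equation sum(pi) = 1.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print(pi)                                             # [0.5714..., 0.4285...]
print(beta / (alpha + beta), alpha / (alpha + beta))  # the closed form above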
Example 16. Recall the genetic problem. A parent is randomly chosen from the population and then one of its genes is randomly chosen. This is equivalent to choosing a gene from the population randomly (at t = 0, P(AA) = p_0, P(aa) = q_0, P(aA) = r_0).
By conditioning on the gene pair of the parent, the probabilities that a randomly chosen gene will be of type A and of type a are given by
P(A) = P(A|AA)p_0 + P(A|aa)q_0 + P(A|Aa)r_0 = 1 · p_0 + 0 · q_0 + (1/2)r_0 = p_0 + r_0/2
and
P(a) = P(a|AA)p_0 + P(a|aa)q_0 + P(a|Aa)r_0 = 0 · p_0 + 1 · q_0 + (1/2)r_0 = q_0 + r_0/2.
Therefore under random mating a randomly chosen member of the next generation (t = 1) will be of type AA with probability
p_1 = P(A)P(A) = (p_0 + r_0/2)^2;
of type aa with probability
q_1 = P(a)P(a) = (q_0 + r_0/2)^2;
and of type Aa with probability
r_1 = P(A)P(a) + P(a)P(A) = 2P(A)P(a) = 2(p_0 + r_0/2)(q_0 + r_0/2).
We repeat the argument above; then in the second generation we have
P(A) = P(A|AA)p_1 + P(A|aa)q_1 + P(A|Aa)r_1
= 1 · p_1 + 0 · q_1 + (1/2)r_1
= p_1 + r_1/2
= (p_0 + r_0/2)^2 + (p_0 + r_0/2)(q_0 + r_0/2)
= (p_0 + r_0/2)(p_0 + r_0/2 + q_0 + r_0/2) = p_0 + r_0/2  (since p_0 + q_0 + r_0 = 1)
and
P(a) = P(a|AA)p_1 + P(a|aa)q_1 + P(a|Aa)r_1
= 0 · p_1 + 1 · q_1 + (1/2)r_1
= q_1 + r_1/2
= (q_0 + r_0/2)^2 + (p_0 + r_0/2)(q_0 + r_0/2)
= (q_0 + r_0/2)(q_0 + r_0/2 + p_0 + r_0/2) = q_0 + r_0/2.
This is called the Hardy–Weinberg law.
• Now we are interested in the long-run distribution of AA, aa and Aa in the population. This can be analyzed by using a Markov chain process.
Consider a single individual and his/her descendants. We assume that each individual has exactly one offspring, and let X^{(n)} be the state of the gene pair in his/her nth generation. We assume that the population is in steady state, i.e.
p_n = p, q_n = q and r_n = r.
The probability transition matrix of the process (with the states ordered AA, aa, Aa) is given by
     AA ( p + r/2      0          q + r/2 )
P =  aa (   0        q + r/2     p + r/2 )
     Aa ( p/2 + r/4  q/2 + r/4     1/2   ) .
Take for example: we must have P_{aa,AA} = P_{AA,aa} = 0, and
P_{AA,AA} = P(A|AA)p + P(A|aa)q + P(A|Aa)r = 1 · p + 0 · q + (1/2)r = p + r/2,
P_{aA,aA} = (1/2)( P(A|AA)p + P(A|aa)q + P(A|Aa)r ) + (1/2)( P(a|AA)p + P(a|aa)q + P(a|Aa)r )
= (1/2)(1 · p + 0 · q + (1/2)r) + (1/2)(0 · p + 1 · q + (1/2)r) = (p + q + r)/2 = 1/2.
By direct verification, it can be shown that
(p, q, r)P = (p, q, r) and p + q + r = 1.
(Recall that in steady state p = (p + r/2)^2, q = (q + r/2)^2 and r = 2(p + r/2)(q + r/2).)
The first equation:
p(p + r/2) + q · 0 + r(p/2 + r/4) = (p + r/2)^2 = p.
The second equation:
p · 0 + q(q + r/2) + r(q/2 + r/4) = (q + r/2)^2 = q.
The third equation:
p(q + r/2) + q(p + r/2) + r/2 = 2pq + pr/2 + qr/2 + r(r + p + q)/2
= 2p(q + r/2) + r(q + r/2)
= 2(p + r/2)(q + r/2)
= r.
Hence the stationary distribution of the process is (p, q, r).
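One can also check (p, q, r)P = (p, q, r) numerically for any starting proportions. A small sketch, assuming numpy is available and using illustrative values p_0 = 0.3, q_0 = 0.2, r_0 = 0.5:

import numpy as np

p0, q0, r0 = 0.3, 0.2, 0.5              # any proportions summing to 1
p = (p0 + r0/2)**2                      # Hardy-Weinberg proportions
q = (q0 + r0/2)**2
r = 2 * (p0 + r0/2) * (q0 + r0/2)

# Transition matrix over the states (AA, aa, Aa) in steady state.
P = np.array([[p + r/2,    0.0,        q + r/2],
              [0.0,        q + r/2,    p + r/2],
              [p/2 + r/4,  q/2 + r/4,  0.5]])
print(np.array([p, q, r]) @ P)          # equals (p, q, r)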
Example 17. We recall the problem of obtaining a list of webpages
with ranking. Google developed an algorithm for ranking the webpages. The PageRank of a webpage is defined as follows:
Let N be the total number of webpages in the web and we define a
matrix Q called the hyperlink matrix.
Here
Q_ij = 1/k if webpage j is an outgoing link of webpage i, and Q_ij = 0 otherwise,
where k is the total number of outgoing links of webpage i.
For simplicity of discussion, here we assume that Qii > 0 for all
i. This means for each webpage, there is a link pointing to itself.
Hence Q can be regarded as a transition probability matrix of
a Markov chain of a random walk.
• One may regard a surfer as a random walker and the webpages as
the states of the Markov chain.
• Assuming that this underlying Markov chain is irreducible and aperiodic, the steady-state probability distribution
(p_1, p_2, . . . , p_N)^T
of the states (webpages) exists.
• Here p_i is the proportion of time that the random walker (surfer) spends visiting state (webpage) i.
• The higher the value of p_i, the more important webpage i is. The PageRank of webpage i is then defined as p_i.
• Here we give a numerical demonstration. Let us consider a web of three webpages: 0, 1, 2.
Suppose that the links are given as follows: 0 → 1, 0 → 2, 1 → 0 and 2 → 1. Counting the self-links, the outdegrees of States 0, 1, 2 are 3, 2, 2 respectively.
• The transition probability matrix of this Markov chain is given by
    0 ( 1/3 1/3 1/3 )
P = 1 ( 1/2 1/2  0  ) .
    2 (  0  1/2 1/2 )
The steady-state probability distribution p = (p_0, p_1, p_2) satisfies
p = pP and p_0 + p_1 + p_2 = 1.
• Solving the linear system of equations, we get
(p_0, p_1, p_2) = (3/9, 4/9, 2/9).
The ranking of the webpages: Webpage 1 > Webpage 0 > Webpage 2.
• It is clear that both Webpages 0 and 2 point to Webpage 1 and
therefore it must be the most important. Since the most important
Webpage 1 points to Webpage 0 only, Webpage 0 is more important
than Webpage 2.
• We remark that the steady-state probability distribution may not exist as the Markov chain may not be irreducible. But one can always consider the following transition probability matrix:
P̃ = (1 − α)P + (α/N) (1, 1, . . . , 1)^T (1, 1, . . . , 1)
for a very small positive α. Then P̃ is irreducible (Exercise).
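A short Python sketch of this computation, assuming numpy is available: build P and P̃, then iterate p ← pP̃ (the power method) until it settles. This is an illustration only, with α = 0.01 chosen arbitrarily:

import numpy as np

P = np.array([[1/3, 1/3, 1/3],
              [1/2, 1/2, 0.0],
              [0.0, 1/2, 1/2]])
N, alpha = 3, 0.01
P_tilde = (1 - alpha) * P + alpha / N * np.ones((N, N))

p = np.ones(N) / N           # start from the uniform distribution
for _ in range(1000):        # power iteration: p <- p P~
    p = p @ P_tilde
print(p)                     # close to (3/9, 4/9, 2/9) for small alpha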
2.5 More Markov Chain Models
In this section, we introduce the concepts of high-order Markov models and hidden Markov models by some examples.
2.5.1 High-order Markov Models
Definition 17. Suppose that
P(X^{(n+1)} = j | X^{(n)} = i_n, X^{(n−1)} = i_{n−1}, . . . , X^{(0)} = i_0)
= P(X^{(n+1)} = j | X^{(n)} = i_n, X^{(n−1)} = i_{n−1}, . . . , X^{(n−k)} = i_{n−k})
for j, i_0, i_1, . . . , i_n ∈ M and n ≥ k. Then this is called a (k+1)th order Markov chain process.
• This means that the distribution of X^{(n+1)} depends on the states of X^{(n)}, . . . , X^{(n−k)}.
Example 18. Consider a 2nd-order Markov chain on the state space M = {0, 1}. The distribution of X^{(n+1)} depends on the states of X^{(n)} and X^{(n−1)}.
To define this Markov chain process, one has to define four distributions:
P(X^{(n+1)} = i | X^{(n)} = 0, X^{(n−1)} = 0),
P(X^{(n+1)} = i | X^{(n)} = 0, X^{(n−1)} = 1),
P(X^{(n+1)} = i | X^{(n)} = 1, X^{(n−1)} = 0),
P(X^{(n+1)} = i | X^{(n)} = 1, X^{(n−1)} = 1),
for i = 0, 1 and n = 1, 2, 3, . . . .
Example 19. In the previous example, the four distributions can be (writing P(i|(a, b)) for P(X^{(n+1)} = i | X^{(n)} = a, X^{(n−1)} = b)):
P(0|(0, 0)) = 0.4, P(1|(0, 0)) = 0.6;
P(0|(0, 1)) = 0.2, P(1|(0, 1)) = 0.8;
P(0|(1, 0)) = 0.3, P(1|(1, 0)) = 0.7;
P(0|(1, 1)) = 0.1, P(1|(1, 1)) = 0.9.
If X^{(3)} = 0 and X^{(4)} = 1, then
P(X^{(5)} = 0) = 0.3 and P(X^{(5)} = 1) = 0.7.
Example 20. Given the observed sequence:
0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1.
How do we estimate the four distributions for a 2nd-order Markov model?
By counting the "transition frequencies" one can obtain the following table.

(X^{(n−1)}, X^{(n)}) | X^{(n+1)} = 0 | X^{(n+1)} = 1 | P(X^{(n+1)} = 0) | P(X^{(n+1)} = 1)
(0, 0)               |       0       |       3       |       0/3        |       3/3
(0, 1)               |       1       |       5       |       1/6        |       5/6
(1, 0)               |       3       |       2       |       3/5        |       2/5
(1, 1)               |       4       |       0       |       4/4        |       0/4
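The counting behind the table can be automated. A small Python sketch (an illustration only) that reproduces the frequencies and the estimated conditional distributions:

from collections import defaultdict

seq = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

# Count the transitions (X^(n-1), X^(n)) -> X^(n+1).
counts = defaultdict(lambda: [0, 0])
for a, b, c in zip(seq, seq[1:], seq[2:]):
    counts[(a, b)][c] += 1

for pair in sorted(counts):
    n0, n1 = counts[pair]
    total = n0 + n1
    print(pair, n0, n1, n0 / total, n1 / total)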
Remark 23. The transition matrix of the (first-order) Markov chain model for the sequence is
( 1/3 2/3 )
( 1/2 1/2 ) .
Which model do you think is better for modeling the sequence?
2.5.2 Hidden Markov Model
Sometimes some states are not observable and they are hidden.
Example 21. Consider a process of choosing a die and obtaining number of dots
by throwing the die.
Suppose we have two dice A and B, each having four faces (1, 2, 3 and 4). Die A is fair and Die B is biased.
The probability distributions of dots obtained by throwing the dice A and B are
given below:
Die 1 2 3 4
A 1/4 1/4 1/4 1/4
B 1/6 1/6 1/3 1/3
Each time a die is chosen, with probability α, Die A is chosen and with probability
(1 − α), Die B is chosen. This is hidden (no one knows whether the chosen die is A or B).
The value of α is to be determined. The die is thrown and the number of dots (this
is observable) obtained is recorded. The following is a possible realization of the
process:
A → 1 → A → 3 → B → 2 → A → 1 → B → 4 → B → 1.
• The process can be modeled by a Markov chain with the transition probability matrix given by (an embedding technique):
       A    B     1     2     3     4
A  (   0    0    1/4   1/4   1/4   1/4 )
B  (   0    0    1/6   1/6   1/3   1/3 )
1  (   α   1−α    0     0     0     0  )
2  (   α   1−α    0     0     0     0  )
3  (   α   1−α    0     0     0     0  )
4  (   α   1−α    0     0     0     0  )
• How can we estimate α if the following sequence of dots (in steady state) is observed?
1, 3, 3, 2, 3, 2, 4, 3, 1, 3, 3, 4.
We note that
P^2 =
       A    B        1             2             3             4
A  (   α   1−α       0             0             0             0  )
B  (   α   1−α       0             0             0             0  )
1  (   0    0    1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
2  (   0    0    1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
3  (   0    0    1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
4  (   0    0    1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
If we ignore the hidden states, the observable states have the transition probability matrix given by
      1 ( 1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
P′ =  2 ( 1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
      3 ( 1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 )
      4 ( 1/6 + α/12   1/6 + α/12   1/3 − α/12   1/3 − α/12 ) .
The stationary distribution of P′ is given by
p = (1/6 + α/12, 1/6 + α/12, 1/3 − α/12, 1/3 − α/12).
(Why? Since every row of P′ equals p and the entries of p sum to one, we have pP′ = p.)
This should be consistent with the observed occurrence of each state in the sequence:
p = (1/6 + α/12, 1/6 + α/12, 1/3 − α/12, 1/3 − α/12) ≈ q = (2/12, 2/12, 6/12, 2/12).
The unknown parameter α can be obtained by solving
min_{0≤α≤1} ||p − q||₂² = min_{0≤α≤1} ∑_{i=1}^4 (p_i − q_i)² = min_{0≤α≤1} { α²/36 + 1/18 }.
In this case, the best α is 0.
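Since ||p − q||₂² = α²/36 + 1/18 is an explicit quadratic in α, its minimiser over [0, 1] is clearly α = 0; a small numerical sketch (assuming numpy is available) confirms this by a grid search:

import numpy as np

obs = [1, 3, 3, 2, 3, 2, 4, 3, 1, 3, 3, 4]
q = np.array([obs.count(k) for k in (1, 2, 3, 4)]) / len(obs)

def p(alpha):
    # Stationary distribution of the observable chain P' as a function of alpha.
    return np.array([1/6 + alpha/12, 1/6 + alpha/12,
                     1/3 - alpha/12, 1/3 - alpha/12])

alphas = np.linspace(0.0, 1.0, 101)
errors = [np.sum((p(a) - q)**2) for a in alphas]
print(alphas[int(np.argmin(errors))])   # 0.0, as computed above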
2.6 A Summary of Learning Outcomes
• Able to give the definition of a Markov chain, transition probability,
transition matrix, irreducibility.
• Able to classify the states of a Markov chain.
• Able to compute and interpret the steady-state probability distribution of a Markov chain.
• Able to compute and interpret the mean time in transient states of
a Markov chain.
• Able to construct and apply a simple HMM and compute the model
parameters.
2.7 Exercises
1. Let P be the transition probability matrix of a Markov chain with m states. Prove or disprove the following.
(a) Suppose that P^n is a positive matrix (i.e. all the entries are positive); then P^k is also positive for all k ≥ n.
(b) If State j can be reached from State i, then it can be reached in m steps or less.
2. Let P be the transition probability matrix of a Markov chain with m states. Suppose that
∑_{k=1}^m P_kj = ∑_{k=1}^m P_jk = 1 for j = 1, 2, . . . , m.
Find the stationary distribution of the Markov chain.
3. Construct an m × m transition probability matrix P such that
π = (π_1, . . . , π_m), π_i ≥ 0, ∑_{i=1}^m π_i = 1
is the stationary distribution of P.
4. Consider a random walk on a polygon of n vertices 1, 2, . . . , n. Each time the random walker may move one step clockwise with probability p or one step anti-clockwise with probability 1 − p, where 0 ≤ p ≤ 1. By regarding each vertex as a state, this random walk process is a Markov chain with the following transition probabilities:
P_{1,2} = P_{2,3} = . . . = P_{n−1,n} = P_{n,1} = p
and
P_{2,1} = P_{3,2} = . . . = P_{n,n−1} = P_{1,n} = 1 − p.
(i) Write down the transition probability matrix of the Markov chain.
(ii) Find the steady-state probability distribution π of the Markov chain.
(iii) Let π_k be the state probability distribution of the Markov chain at time k. Prove or disprove the following statement: for 0 ≤ p ≤ 1, we have
lim_{k→∞} ||π_k − π||₂ = 0.
5. Consider a random walk on a hexagon with its centroid as shown in Figure 2.5. Suppose that at each State i (i = 0, 1, 2, 3, 4, 5, 6), the transition probabilities to the adjacent states are equal and the probability of staying at the same state in the next transition is zero.
[Figure 2.5. The Random Walk: State 0 is the centroid, States 1, 2, 3, 4, 5, 6 are the vertices of the hexagon, and each vertex is joined to the centroid and to its two neighbouring vertices.]
(i) Show that the Markov chain of the random walk is irreducible and all the
states are recurrent.
(ii) Find the steady-state probability distribution
π = (π_0, π_1, π_2, π_3, π_4, π_5, π_6)
of the Markov chain, where
πP = π, ∑_{i=0}^6 π_i = 1 and π_i ≥ 0 for i = 0, 1, . . . , 6.
2.8 Suggested Solutions
1. (a) Yes. One may prove it by induction. The statement is true for k = n. Assume that the statement is true for some k ≥ n; then
[P^{k+1}]_ij = ∑_{l=1}^m P_il^{(k)} P_lj
must be positive, since all P_il^{(k)} > 0 by the induction hypothesis. Otherwise we would have P_lj = 0 for l = 1, 2, . . . , m, and this would imply that P_ij^{(n)} = 0 for all n = 1, 2, . . . and all i = 1, 2, . . . , m. This is a contradiction.
(b) Yes. One can regard this as a graph problem by letting each state be a vertex, with a directed edge from State a to State b if P_ab > 0. Suppose the statement is false; then the length of the shortest path from i to j is strictly greater than m. Let us say the path is
i → a_1 → a_2 → . . . → a_N → j
where N ≥ m. We note that the states a_i are distinct and cannot be j, for otherwise we could get a shorter path. But this means that we have at least m + 1 distinct states in the system. Hence we have a contradiction and the statement must be true.
2. Let
π = (1/m)(1, 1, . . . , 1);
we note that
πP = π and ∑_{i=1}^m π_i = 1.
Therefore π is the stationary distribution.
3. Define P = (1, 1, . . . , 1)^T π; then πP = (∑_{i=1}^m π_i) π = π.
4. (i) If we order the states from 1 to n, then the transition probability matrix is given by
    (  0     p     0    · · ·   0    1−p )
    ( 1−p    0     p     0    · · ·   0  )
P = (  0    1−p    0     p    · · ·   0  )
    (  ...               ...          ... )
    (  0    · · ·  0    1−p    0      p  )
    (  p     0    · · ·  0    1−p     0  )
(ii) We note that (1, 1, . . . , 1)P = (1, 1, . . . , 1); therefore
π = (1/n)(1, 1, . . . , 1).
(iii) For 0 < p < 1 and n odd, the Markov chain is irreducible and aperiodic, therefore we have
lim_{k→∞} ||π_k − π||₂ = 0.
(When n is even the chain has period 2, and the limit can also fail for 0 < p < 1.)
But for p = 1 or p = 0, beginning with π_0 = (1, 0, . . . , 0), we have
||π_k − π||₂ = √( (n − 1)²/n² + (n − 1)/n² ) = √(n(n − 1))/n,
independent of k. Thus the statement is false.
5. (i) We note that all the States 1, 2, 3, 4, 5, 6 communicate with State 0 and therefore all the states communicate with each other. Hence the Markov chain is irreducible. Now at least one of the states is recurrent; otherwise all the states would be transient and would be visited only finitely many times, which is impossible. Since there is a recurrent state, say i, and all the other states communicate with i, all the other states are also recurrent.
(ii) Since the Markov chain is finite and irreducible, there exists a unique stationary distribution π. By symmetry we have
π_1 = π_2 = π_3 = π_4 = π_5 = π_6 = p.
Hence we have
π_0 + 6p = 1.
Moreover, from each of States 1, 2, 3, 4, 5, 6 the process goes to State 0 with probability 1/3. Therefore
π_0 = (1/3)(π_1 + π_2 + π_3 + π_4 + π_5 + π_6) = 6p/3 = 2p.
Solving the two equations we have
π = (1/4, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8).
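A quick numerical check of this answer, assuming numpy is available: build the 7 × 7 transition matrix of the random walk and verify that πP = π.

import numpy as np

# States: 0 is the centroid; 1..6 are the hexagon vertices.
P = np.zeros((7, 7))
P[0, 1:] = 1/6                       # the centre moves to each vertex
for i in range(1, 7):
    left = 1 + (i - 2) % 6           # one neighbouring vertex on the hexagon
    right = 1 + i % 6                # the other neighbouring vertex
    P[i, [0, left, right]] = 1/3     # three adjacent states, equal probabilities
pi = np.array([1/4] + [1/8] * 6)
print(np.allclose(pi @ P, pi))       # True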