Chapter 4
Entropy Rates of a Stochastic Process
Peng-Hua Wang
Graduate Inst. of Comm. Engineering
National Taipei University
Chapter Outline
Chap. 4 Entropy Rates of a Stochastic Process
4.1 Markov Chains
4.2 Entropy Rate
4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph
4.4 Second Law of Thermodynamics
4.5 Functions of Markov Chains
Peng-Hua Wang, April 2, 2012
Information Theory, Chap. 4 - p. 2/13
4.1 Markov Chains
Stationary
Definition (Stationary) A stochastic process is said to be stationary if
Pr{X1 = x1 , X2 = x2 , . . . , Xn = xn }
= Pr{X1+ℓ = x1 , X2+ℓ = x2 , . . . , Xn+ℓ = xn }
for every n and every shift ℓ.
■ That is, the joint distribution of any subset of the sequence of random variables is invariant with respect to shifts in the time index.
Markov chain
Definition (Markov chain) A discrete stochastic process X1 , X2 , . . . is said to be a Markov chain or a Markov process if, for n = 1, 2, . . . ,

Pr{Xn+1 = xn+1 |Xn = xn , Xn−1 = xn−1 , . . . , X1 = x1 }
= Pr{Xn+1 = xn+1 |Xn = xn }.

■ The joint pmf can then be written as

p(x1 , x2 , . . . , xn ) = p(x1 )p(x2 |x1 )p(x3 |x2 ) · · · p(xn |xn−1 ).

Definition (Time invariant) The Markov chain is said to be time invariant if the transition probability p(xn+1 |xn ) does not depend on n, that is,

Pr{Xn+1 = b|Xn = a} = Pr{X2 = b|X1 = a} for all a, b ∈ X .
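The factorization of the joint pmf suggests a direct way to simulate a path: draw X1 from the initial pmf, then draw each subsequent state from the transition row of the current state. A minimal sketch, where the initial pmf p0 and the two-state transition matrix P are made-up values for illustration:

```python
import random

# Sketch: sampling a path via the factorization
# p(x1, ..., xn) = p(x1) p(x2|x1) ... p(xn|x_{n-1}).
# p0 and P below are made-up example values.

def draw(pmf):
    """Draw an index 0..m-1 according to the given pmf."""
    u, acc = random.random(), 0.0
    for i, prob in enumerate(pmf):
        acc += prob
        if u < acc:
            return i
    return len(pmf) - 1

p0 = [0.5, 0.5]
P = [[0.7, 0.3],
     [0.6, 0.4]]

random.seed(0)
path = [draw(p0)]
for _ in range(9):
    # Markov property: the next state depends only on the current one.
    path.append(draw(P[path[-1]]))
print(path)
```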
Markov chain
■ We will assume that the Markov chain is time invariant.
■ Xn is called the state at time n.
■ A time invariant Markov chain is characterized by its initial state and a probability transition matrix P = [Pij ], i, j ∈ {1, 2, . . . , m}, where Pij = Pr{Xn+1 = j|Xn = i}.
■ The pmf at time n + 1 is

p(xn+1 ) = ∑xn p(xn ) Pxn xn+1 .

■ A distribution on the states such that the distribution at time n + 1 is the same as the distribution at time n is called a stationary distribution.
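The pmf update above can be iterated; for a well-behaved chain the iterates settle at a stationary distribution. A small sketch for a two-state chain, where the parameters α and β are assumed example values:

```python
# Sketch: iterating the pmf update p_{n+1}(j) = sum_i p_n(i) * P[i][j]
# for a two-state chain.  alpha and beta are assumed example values.

def step(p, P):
    """Apply the transition matrix to a state pmf once."""
    m = len(P)
    return [sum(p[i] * P[i][j] for i in range(m)) for j in range(m)]

alpha, beta = 0.3, 0.6
P = [[1 - alpha, alpha],
     [beta, 1 - beta]]

p = [1.0, 0.0]            # start deterministically in state 1
for _ in range(200):
    p = step(p, P)

# A fixed point of this update is a stationary distribution; here the
# iterates converge to (beta/(alpha+beta), alpha/(alpha+beta)).
print(p)
```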
Example 4.1.1
Consider a two-state Markov chain with probability transition matrix

P = [ 1 − α      α   ]
    [   β      1 − β ]

Find its stationary distribution and entropy.
Solution. Let µ1 , µ2 be the stationary distribution. Then

µ1 = µ1 (1 − α) + µ2 β
µ2 = µ1 α + µ2 (1 − β)

and µ1 + µ2 = 1. Solving gives

µ1 = β/(α + β),  µ2 = α/(α + β).
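The closed form µ = (β/(α + β), α/(α + β)) and the entropy rate of the stationary chain, H(X2 |X1 ) = µ1 H(α) + µ2 H(β) (a formula that anticipates Section 4.2), can be checked numerically. The values of α and β below are arbitrary:

```python
from math import log2

def h(p):
    """Binary entropy H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

alpha, beta = 0.3, 0.6          # assumed example parameters

mu1 = beta / (alpha + beta)     # stationary probability of state 1
mu2 = alpha / (alpha + beta)    # stationary probability of state 2

# mu satisfies the balance equation mu1 = mu1*(1 - alpha) + mu2*beta:
balance_gap = mu1 - (mu1 * (1 - alpha) + mu2 * beta)

# Entropy rate of the stationary chain: H(X2|X1) = mu1*H(alpha) + mu2*H(beta).
rate = mu1 * h(alpha) + mu2 * h(beta)
print(mu1, mu2, rate)
```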
4.2 Entropy Rate
Entropy Rate
Definition (Entropy Rate) The entropy rate of a random process {Xi } is defined by

H(X ) = lim_{n→∞} (1/n) H(X1 , X2 , . . . , Xn ),

when the limit exists.
Definition (Conditional Entropy Rate) The conditional entropy rate of a random process {Xi } is defined by

H ′ (X ) = lim_{n→∞} H(Xn |X1 , X2 , . . . , Xn−1 ),

when the limit exists.
Entropy Rate
■ If X1 , X2 , . . . are i.i.d. random variables, then

H(X ) = lim_{n→∞} H(X1 , X2 , . . . , Xn )/n = lim_{n→∞} nH(X1 )/n = H(X1 ).

■ If X1 , X2 , . . . are independent but not identically distributed, then

H(X ) = lim_{n→∞} (1/n) ∑_{i=1}^{n} H(Xi ).

■ We can choose a sequence of distributions on X1 , X2 , . . . such that the limit does not exist.
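For independent variables the joint pmf is a product, so the i.i.d. claim (1/n)H(X1 , . . . , Xn ) = H(X1 ) can be verified by brute-force enumeration of the joint pmf. A sketch for a fair binary source:

```python
from itertools import product
from math import log2

# Sketch: for independent X1..Xn, compute the joint entropy by enumerating
# the product pmf and check that (1/n) H(X1,...,Xn) = H(X1).

def joint_entropy(pmfs):
    """Joint entropy (bits) of independent variables with the given marginals."""
    H = 0.0
    for outcome in product(*(range(len(p)) for p in pmfs)):
        pr = 1.0
        for pmf, x in zip(pmfs, outcome):
            pr *= pmf[x]
        if pr > 0:
            H -= pr * log2(pr)
    return H

p = [0.5, 0.5]                       # fair binary source, H(X1) = 1 bit
rates = [joint_entropy([p] * n) / n for n in (1, 2, 3, 4)]
print(rates)
```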
Entropy Rate
Theorem 4.2.2 For a stationary stochastic process,
H(Xn |Xn−1 , . . . , X1 ) is nonincreasing in n and has a limit H ′ (X ).
Proof.
H(Xn+1 |X1 , X2 , . . . , Xn )
≤ H(Xn+1 |X2 , . . . , Xn ) (conditioning reduces entropy)
= H(Xn |X1 , . . . , Xn−1 ) (stationarity)
Since H(Xn |Xn−1 , . . . , X1 ) is nonnegative and nonincreasing in n, it has a limit H ′ (X ).
Entropy Rate
Theorem 4.2.1 For a stationary stochastic process, both H(X ) and
H ′ (X ) exist and are equal.
H(X ) = H ′ (X ).
Proof. By the chain rule,

(1/n) H(X1 , X2 , . . . , Xn ) = (1/n) ∑_{i=1}^{n} H(Xi |Xi−1 , . . . , X1 ),

that is, the entropy rate is the time average of the conditional entropies. Since the conditional entropies have the limit H ′ (X ), the time average has the same limit by the Cesàro mean theorem.
Cesàro mean
Theorem (Cesàro mean) If an → a and bn = (1/n) ∑_{i=1}^{n} ai , then bn → a.
Proof. Let ε > 0. Since an → a, there exists a number N such that |an − a| ≤ ε for all n > N . Hence

|bn − a| = |(1/n) ∑_{i=1}^{n} (ai − a)|
≤ (1/n) ∑_{i=1}^{n} |ai − a|
≤ (1/n) ∑_{i=1}^{N} |ai − a| + ((n − N )/n) ε
≤ (1/n) ∑_{i=1}^{N} |ai − a| + ε
≤ 2ε

when n is large enough, since the first term tends to 0 as n → ∞.
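The Cesàro mean can also be checked numerically: taking an = a + 1/n gives an → a, and the running averages bn approach a as well (slowly, since bn − a is the n-th harmonic number divided by n). A small sketch with an assumed limit a = 5:

```python
# Sketch: Cesaro mean.  With a_n = a + 1/n we have a_n -> a, and the
# running averages b_n = (1/n)(a_1 + ... + a_n) converge to a as well.

a = 5.0
n_max = 100000

partial = 0.0
for n in range(1, n_max + 1):
    partial += a + 1.0 / n
b = partial / n_max   # b - a = H_{n_max} / n_max, which tends to 0
print(b)
```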