Chapter 4
Entropy Rates of a Stochastic Process
Peng-Hua Wang
Graduate Inst. of Comm. Engineering
National Taipei University
Chapter Outline
Chap. 4 Entropy Rates of a Stochastic Process
4.1 Markov Chains
4.2 Entropy Rate
4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph
4.4 Second Law of Thermodynamics
4.5 Functions of Markov Chains
Peng-Hua Wang, April 2, 2012
Information Theory, Chap. 4 - p. 2/13
4.1 Markov Chains
Stationary
Definition (Stationary) A stochastic process is said to be stationary if
Pr{X1 = x1 , X2 = x2 , . . . , Xn = xn }
= Pr{X1+ℓ = x1 , X2+ℓ = x2 , . . . , Xn+ℓ = xn }
for every n and every shift ℓ.
■ That is, the joint distribution of any subset of the sequence of random variables is invariant with respect to shifts in the time index.
Markov chain
Definition (Markov chain) A discrete stochastic process X1 , X2 , . . . is said to be a Markov chain or a Markov process if, for n = 1, 2, . . . ,

Pr{Xn+1 = xn+1 |Xn = xn , Xn−1 = xn−1 , . . . , X1 = x1 }
= Pr{Xn+1 = xn+1 |Xn = xn }.

■ The joint pmf can then be written as

p(x1 , x2 , . . . , xn ) = p(x1 )p(x2 |x1 )p(x3 |x2 ) · · · p(xn |xn−1 ).

Definition (Time invariant) The Markov chain is said to be time invariant if the transition probability p(xn+1 |xn ) does not depend on n, that is,

Pr{Xn+1 = b|Xn = a} = Pr{X2 = b|X1 = a} for all a, b ∈ X .
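The factorization of the joint pmf suggests a direct way to simulate a path: draw X1 from the initial pmf, then draw each subsequent state from the transition row of the current state. A minimal sketch, where the initial pmf p0 and the two-state transition matrix P are made-up values for illustration:

```python
import random

# Sketch: sampling a path via the factorization
# p(x1, ..., xn) = p(x1) p(x2|x1) ... p(xn|x_{n-1}).
# p0 and P below are made-up example values.

def draw(pmf):
    """Draw an index 0..m-1 according to the given pmf."""
    u, acc = random.random(), 0.0
    for i, prob in enumerate(pmf):
        acc += prob
        if u < acc:
            return i
    return len(pmf) - 1

p0 = [0.5, 0.5]
P = [[0.7, 0.3],
     [0.6, 0.4]]

random.seed(0)
path = [draw(p0)]
for _ in range(9):
    # Markov property: the next state depends only on the current one.
    path.append(draw(P[path[-1]]))
print(path)
```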
Markov chain
■ We will assume that the Markov chain is time invariant.
■ Xn is called the state at time n.
■ A time invariant Markov chain is characterized by its initial state and a probability transition matrix P = [Pij ], i, j ∈ {1, 2, . . . , m}, where Pij = Pr{Xn+1 = j|Xn = i}.
■ The pmf at time n + 1 is

p(xn+1 ) = ∑xn p(xn ) Pxn xn+1 .

■ A distribution on the states such that the distribution at time n + 1 is the same as the distribution at time n is called a stationary distribution.
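The pmf update above can be iterated; for a well-behaved chain the iterates settle at a stationary distribution. A small sketch for a two-state chain, where the parameters α and β are assumed example values:

```python
# Sketch: iterating the pmf update p_{n+1}(j) = sum_i p_n(i) * P[i][j]
# for a two-state chain.  alpha and beta are assumed example values.

def step(p, P):
    """Apply the transition matrix to a state pmf once."""
    m = len(P)
    return [sum(p[i] * P[i][j] for i in range(m)) for j in range(m)]

alpha, beta = 0.3, 0.6
P = [[1 - alpha, alpha],
     [beta, 1 - beta]]

p = [1.0, 0.0]            # start deterministically in state 1
for _ in range(200):
    p = step(p, P)

# A fixed point of this update is a stationary distribution; here the
# iterates converge to (beta/(alpha+beta), alpha/(alpha+beta)).
print(p)
```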
Example 4.1.1
Consider a two-state Markov chain with probability transition matrix

P = [ 1 − α      α   ]
    [   β      1 − β ]

Find its stationary distribution and entropy.
Solution. Let µ1 , µ2 be the stationary distribution. Then

µ1 = µ1 (1 − α) + µ2 β
µ2 = µ1 α + µ2 (1 − β)

and µ1 + µ2 = 1. Solving gives

µ1 = β/(α + β),  µ2 = α/(α + β).
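The closed form µ = (β/(α + β), α/(α + β)) and the entropy rate of the stationary chain, H(X2 |X1 ) = µ1 H(α) + µ2 H(β) (a formula that anticipates Section 4.2), can be checked numerically. The values of α and β below are arbitrary:

```python
from math import log2

def h(p):
    """Binary entropy H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

alpha, beta = 0.3, 0.6          # assumed example parameters

mu1 = beta / (alpha + beta)     # stationary probability of state 1
mu2 = alpha / (alpha + beta)    # stationary probability of state 2

# mu satisfies the balance equation mu1 = mu1*(1 - alpha) + mu2*beta:
balance_gap = mu1 - (mu1 * (1 - alpha) + mu2 * beta)

# Entropy rate of the stationary chain: H(X2|X1) = mu1*H(alpha) + mu2*H(beta).
rate = mu1 * h(alpha) + mu2 * h(beta)
print(mu1, mu2, rate)
```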
4.2 Entropy Rate
Entropy Rate
Definition (Entropy Rate) The entropy rate of a random process {Xi } is defined by

H(X ) = lim_{n→∞} (1/n) H(X1 , X2 , . . . , Xn ),

when the limit exists.
Definition (Conditional Entropy Rate) The conditional entropy rate of a random process {Xi } is defined by

H ′ (X ) = lim_{n→∞} H(Xn |X1 , X2 , . . . , Xn−1 ),

when the limit exists.
Entropy Rate
■ If X1 , X2 , . . . are i.i.d. random variables, then

H(X ) = lim_{n→∞} H(X1 , X2 , . . . , Xn )/n = lim_{n→∞} nH(X1 )/n = H(X1 ).

■ If X1 , X2 , . . . are independent but not identically distributed, then

H(X ) = lim_{n→∞} (1/n) ∑_{i=1}^{n} H(Xi ).

■ We can choose a sequence of distributions on X1 , X2 , . . . such that the limit does not exist.
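For independent variables the joint pmf is a product, so the i.i.d. claim (1/n)H(X1 , . . . , Xn ) = H(X1 ) can be verified by brute-force enumeration of the joint pmf. A sketch for a fair binary source:

```python
from itertools import product
from math import log2

# Sketch: for independent X1..Xn, compute the joint entropy by enumerating
# the product pmf and check that (1/n) H(X1,...,Xn) = H(X1).

def joint_entropy(pmfs):
    """Joint entropy (bits) of independent variables with the given marginals."""
    H = 0.0
    for outcome in product(*(range(len(p)) for p in pmfs)):
        pr = 1.0
        for pmf, x in zip(pmfs, outcome):
            pr *= pmf[x]
        if pr > 0:
            H -= pr * log2(pr)
    return H

p = [0.5, 0.5]                       # fair binary source, H(X1) = 1 bit
rates = [joint_entropy([p] * n) / n for n in (1, 2, 3, 4)]
print(rates)
```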
Entropy Rate
Theorem 4.2.2 For a stationary stochastic process,
H(Xn |Xn−1 , . . . , X1 ) is nonincreasing in n and has a limit H ′ (X ).
Proof.
H(Xn+1 |X1 , X2 , . . . , Xn )
≤ H(Xn+1 |X2 , . . . , Xn ) (conditioning reduces entropy)
= H(Xn |X1 , . . . , Xn−1 ) (stationarity)
Since H(Xn |Xn−1 , . . . , X1 ) is nonnegative and nonincreasing in n, it has a limit H ′ (X ).
Entropy Rate
Theorem 4.2.1 For a stationary stochastic process, both H(X ) and
H ′ (X ) exist and are equal.
H(X ) = H ′ (X ).
Proof. By the chain rule,

(1/n) H(X1 , X2 , . . . , Xn ) = (1/n) ∑_{i=1}^{n} H(Xi |Xi−1 , . . . , X1 ),

that is, the entropy rate is the time average of the conditional entropies. Since the conditional entropies have the limit H ′ (X ), the time average has the same limit by the Cesàro mean theorem.
Cesàro mean
Theorem (Cesàro mean) If an → a and bn = (1/n) ∑_{i=1}^{n} ai , then bn → a.
Proof. Let ε > 0. Since an → a, there exists a number N such that |an − a| ≤ ε for all n > N . Hence

|bn − a| = |(1/n) ∑_{i=1}^{n} (ai − a)|
≤ (1/n) ∑_{i=1}^{n} |ai − a|
≤ (1/n) ∑_{i=1}^{N} |ai − a| + ((n − N )/n) ε
≤ (1/n) ∑_{i=1}^{N} |ai − a| + ε
≤ 2ε

when n is large enough, since the first term tends to 0 as n → ∞.
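The Cesàro mean can also be checked numerically: taking an = a + 1/n gives an → a, and the running averages bn approach a as well (slowly, since bn − a is the n-th harmonic number divided by n). A small sketch with an assumed limit a = 5:

```python
# Sketch: Cesaro mean.  With a_n = a + 1/n we have a_n -> a, and the
# running averages b_n = (1/n)(a_1 + ... + a_n) converge to a as well.

a = 5.0
n_max = 100000

partial = 0.0
for n in range(1, n_max + 1):
    partial += a + 1.0 / n
b = partial / n_max   # b - a = H_{n_max} / n_max, which tends to 0
print(b)
```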