University of Siena
PhD short course
Information Theory and Statistics
Siena, 15-19 September, 2014
The method of types
Mauro Barni
University of Siena
Short course on Information Theory and Statistics, Siena, September 2014
M. Barni, University of Siena, VIPP group
Outline of the course
• Part 1: Information theory in a nutshell
• Part 2: The method of types and its relationship with statistics
• Part 3: Information theory and large deviation theory
• Part 4: Information theory and hypothesis testing
• Part 5: Application to adversarial signal processing
Outline of Part 2
• The method of types
– Definitions
– Basic properties with proof of theorems
• Law of large numbers
• Source coding, Universal source coding
Short course on Information Theory and Statistics, Siena, September 2014
M. Barni, University of Siena, VIPP group
University of Siena
Type or empirical probability
Type, or empirical probability, of a sequence x^n:

P_{x^n}(a) = N(a | x^n) / n,   ∀a ∈ X

Set of all the types with denominator n:

P_n = { all types with denominator n }

Example: if X = {0,1},

P_5 = { (0,1), (1/5, 4/5), (2/5, 3/5), (3/5, 2/5), (4/5, 1/5), (1,0) }
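The set P_5 of the example can be enumerated programmatically. A minimal sketch in Python (the function names are illustrative; the alphabet is binary as in the example):

```python
from fractions import Fraction

def types_with_denominator(n, alphabet_size=2):
    """Enumerate P_n: all empirical pmfs (k_1/n, ..., k_m/n) with sum(k_i) = n."""
    def compositions(total, parts):
        if parts == 1:
            yield (total,)
            return
        for k in range(total + 1):
            for rest in compositions(total - k, parts - 1):
                yield (k,) + rest
    return [tuple(Fraction(k, n) for k in c) for c in compositions(n, alphabet_size)]

P5 = types_with_denominator(5)
print(len(P5))                      # 6 types for the binary alphabet, n = 5
print(len(P5) <= (5 + 1) ** 2)      # polynomial bound |P_n| <= (n+1)^|X|
```

For the binary alphabet the count is exactly n+1, well below the (n+1)^|X| bound of the next slide.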
Type class
Type class: all the sequences having the same type
T(P) = { x^n ∈ X^n : P_{x^n} = P }

Example: x^5 = 01100,  P_{x^5} = (3/5, 2/5)

T(P_{x^5}) = { 11000, 10100, 10010, 10001, 01100, 01010, 01001, 00110, 00101, 00011 }
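The example can be checked by brute-force enumeration; a small Python sketch (helper names are hypothetical):

```python
from itertools import product
from fractions import Fraction

def type_of(seq):
    """Empirical pmf of a binary string, as (P(0), P(1))."""
    n = len(seq)
    return (Fraction(seq.count('0'), n), Fraction(seq.count('1'), n))

def type_class(P, n):
    """All length-n binary strings whose type equals P."""
    return [''.join(s) for s in product('01', repeat=n)
            if type_of(''.join(s)) == P]

P = type_of('01100')        # (3/5, 2/5)
T = type_class(P, 5)
print(len(T))               # C(5,2) = 10 sequences, as listed on the slide
```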
Number of types
The number of types grows polynomially with n
Theorem
The number of types with denominator n is upper bounded by:

|P_n| ≤ (n+1)^{|X|}

Proof.
Obvious: each of the |X| entries of a type takes one of the n+1 values 0/n, 1/n, ..., n/n.
Probability of a sequence
Theorem
The probability that a sequence x = x^n is emitted by a DMS with pmf Q is

Q(x) = 2^{-n(H(P_x) + D(P_x||Q))}

If P_x = Q:

Q(x) = 2^{-nH(P_x)} = 2^{-nH(Q)}

Remember
The larger the KL distance between the type of x and Q, the lower the probability.
Probability of a sequence
Proof.

Q(x) = ∏_i Q(x_i) = ∏_{a∈X} Q(a)^{N(a|x)}
     = ∏_{a∈X} Q(a)^{n P_x(a)} = ∏_{a∈X} 2^{n P_x(a) log Q(a)}
     = ∏_{a∈X} 2^{n [P_x(a) log Q(a) - P_x(a) log P_x(a) + P_x(a) log P_x(a)]}
     = 2^{n Σ_a [-P_x(a) log(P_x(a)/Q(a)) + P_x(a) log P_x(a)]}
     = 2^{-n[H(P_x) + D(P_x||Q)]}
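The chain of equalities can be sanity-checked numerically. A sketch in Python, reusing the example sequence x^5 = 01100 and the illustrative pmf Q = (1/3, 2/3) from the coin examples:

```python
from math import log2, prod

def entropy(P):
    """Entropy in bits of a pmf given as a tuple."""
    return -sum(p * log2(p) for p in P if p > 0)

def kl(P, Q):
    """KL divergence D(P||Q) in bits."""
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

x = [0, 1, 1, 0, 0]                 # the slide's example sequence
Q = (1 / 3, 2 / 3)                  # illustrative DMS pmf over {0, 1}
n = len(x)
Px = (x.count(0) / n, x.count(1) / n)

direct = prod(Q[s] for s in x)                        # product of symbol probabilities
via_types = 2 ** (-n * (entropy(Px) + kl(Px, Q)))     # 2^{-n(H(P_x)+D(P_x||Q))}
print(abs(direct - via_types) < 1e-12)                # the two expressions agree
```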
Examples
• Probability of a specific sequence with n/2 heads and n/2 tails
– Fair coin
– Biased coin with P(H) = 1/3, P(T) = 2/3
• Same as above with n/3 heads
– Fair coin
– Biased coin with P(H) = 1/3, P(T) = 2/3
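These examples can be evaluated with the formula of the previous theorem; a hedged sketch (n = 12 is an arbitrary length divisible by 2 and 3, and the variable names are illustrative):

```python
from math import log2

def entropy2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def kl2(p, q):
    """Binary KL divergence D((p,1-p) || (q,1-q)) in bits."""
    d = 0.0
    if p > 0: d += p * log2(p / q)
    if p < 1: d += (1 - p) * log2((1 - p) / (1 - q))
    return d

n = 12                                    # hypothetical length, divisible by 2 and 3
for heads_frac in (1 / 2, 1 / 3):         # type of the sequence: fraction of heads
    for q_heads in (1 / 2, 1 / 3):        # fair coin, then biased coin with P(H) = 1/3
        exponent = entropy2(heads_frac) + kl2(heads_frac, q_heads)
        print(f"P_x(H)={heads_frac:.3f}, Q(H)={q_heads:.3f}: "
              f"Q(x) = 2^(-{n}*{exponent:.4f})")
```

When the type matches the coin the divergence term vanishes and Q(x) = 2^{-nH(P_x)}.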
Size of a type class
Theorem
The size of a type class T(P) can be bounded as follows:

(1/(n+1)^{|X|}) 2^{nH(P)} ≤ |T(P)| ≤ 2^{nH(P)}

Remember
The size of a type class grows exponentially, at a rate equal to the entropy of the type.
Size of a type class
Proof. (upper bound)
Given P ∈ Pn consider the probability that a source with pmf P
emits a sequence in T (P). We have
1 ≥ Σ_{x∈T(P)} P(x) = Σ_{x∈T(P)} 2^{-nH(P)} = |T(P)| 2^{-nH(P)}

(each x ∈ T(P) has probability P(x) = 2^{-nH(P)} by the previous theorem, with Q = P), hence

|T(P)| ≤ 2^{nH(P)}
Size of a type class
Proof. (lower bound)
|T(P)| = ( n choose nP(a_1), ..., nP(a_{|X|}) ) = n! / (n_1! n_2! ... n_{|X|}!),  with n_i = nP(a_i)

Stirling approximation:

(n/e)^n ≤ n! ≤ n (n/e)^n

Hence

|T(P)| ≥ (n/e)^n / [ n_1 (n_1/e)^{n_1} · ... · n_{|X|} (n_{|X|}/e)^{n_{|X|}} ]

and, after some algebra,

|T(P)| ≥ (1/(n+1)^{|X|}) 2^{nH(P)}
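For a binary type the class size is exactly a binomial coefficient, so the sandwich bound can be checked directly; a quick Python sketch (n = 20, k = 7 are arbitrary choices):

```python
from math import comb, log2

def entropy(counts):
    """Entropy in bits of the empirical pmf induced by integer counts."""
    n = sum(counts)
    return -sum(k / n * log2(k / n) for k in counts if k > 0)

# Binary type P = (k/n, (n-k)/n): the exact class size is C(n, k)
n, k = 20, 7
size = comb(n, k)
H = entropy((k, n - k))
lower = 2 ** (n * H) / (n + 1) ** 2      # (1/(n+1)^|X|) 2^{nH(P)}, with |X| = 2
upper = 2 ** (n * H)
print(lower <= size <= upper)            # True: the sandwich bound holds
```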
Probability of a type class
Theorem
The probability that a DMS with pmf Q emits a sequence belonging to T(P) can be bounded as follows:

(1/(n+1)^{|X|}) 2^{-nD(P||Q)} ≤ Q(T(P)) ≤ 2^{-nD(P||Q)}

Remember
The larger the KL distance between P and Q, the smaller the probability. If P = Q the exponent is zero, and the probability of observing a sequence whose type is close to Q tends to 1 exponentially fast.
Probability of a type class
Proof.
Q(T(P)) = Σ_{x∈T(P)} Q(x) = Σ_{x∈T(P)} 2^{-n(H(P)+D(P||Q))} = |T(P)| 2^{-n(H(P)+D(P||Q))}

By remembering the bounds on the size of T(P):

(1/(n+1)^{|X|}) 2^{-nD(P||Q)} ≤ Q(T(P)) ≤ 2^{-nD(P||Q)}
In summary
|P_n| ≤ (n+1)^{|X|}

Q(x) = 2^{-n[D(P_x||Q) + H(P_x)]}

|T(P)| ≈ 2^{nH(P)}

Q(T(P)) ≈ 2^{-nD(P||Q)}
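The last estimate can be tested numerically for a binary DMS; a sketch (Q(0) = 0.3, n = 25 and the values of k are illustrative):

```python
from math import comb, log2

def kl2(p, q):
    """Binary KL divergence D((p,1-p) || (q,1-q)) in bits."""
    d = 0.0
    if p > 0: d += p * log2(p / q)
    if p < 1: d += (1 - p) * log2((1 - p) / (1 - q))
    return d

# Hypothetical DMS over {0,1} with Q(0) = 0.3; P is the type with k zeros out of n
n, q0 = 25, 0.3
checks = []
for k in (5, 8, 12):
    exact = comb(n, k) * q0 ** k * (1 - q0) ** (n - k)   # Q(T(P)) computed exactly
    approx = 2 ** (-n * kl2(k / n, q0))                  # first-order estimate 2^{-nD(P||Q)}
    checks.append(exact <= approx <= exact * (n + 1) ** 2)
print(checks)   # [True, True, True]: Q(T(P)) is sandwiched as the theorem states
```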
Information Theory and Statistics
Law of large numbers
The law of large numbers provides the link between Information
Theory and Statistics.
The weak form of the LLN states that
Given a sequence of n iid random variables X_i, let

X̄ = (1/n) Σ_{i=1}^n X_i

Then, ∀ε > 0:

lim_{n→∞} Pr{ |X̄ - μ_X| > ε } = 0

The standard proof is based on the Chebyshev inequality.
The LLN can be easily extended to relative frequencies and probabilities (for discrete random variables).
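The statement can be illustrated with a small Monte Carlo experiment; a sketch (Bernoulli(0.3) samples, the threshold ε = 0.05 and the trial counts are illustrative choices):

```python
import random

random.seed(0)
mu, eps, trials = 0.3, 0.05, 1000   # Bernoulli(0.3) samples; eps = 0.05 is arbitrary
freqs = []
for n in (10, 100, 2000):
    # Estimate Pr{|sample mean - mu| > eps} over many independent runs
    bad = sum(
        abs(sum(random.random() < mu for _ in range(n)) / n - mu) > eps
        for _ in range(trials)
    )
    freqs.append(bad / trials)
print(freqs)   # the empirical probability shrinks as n grows
```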
Law of large numbers (IT perspective)
Q(T(P)) ≈ 2^{-nD(P||Q)}: when n grows, the only type classes with a non-negligible probability are those of types close to Q.

Theorem (law of large numbers)
Let T_Q^ε = { x^n : D(P_{x^n} || Q) ≤ ε }. Then:

Pr(x^n ∉ T_Q^ε) = Σ_{P: D(P||Q)>ε} Q(T(P))
               ≤ Σ_{P: D(P||Q)>ε} 2^{-nD(P||Q)}
               ≤ Σ_{P: D(P||Q)>ε} 2^{-nε}
               ≤ (n+1)^{|X|} 2^{-nε}
               = 2^{-n(ε - |X| log(n+1)/n)}

which tends to 0 when n tends to infinity.
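For a binary source, Pr{x^n ∉ T_Q^ε} can be computed exactly by summing Q(T(P)) over the atypical types; a sketch (a fair source and ε = 0.05 are illustrative choices):

```python
from math import comb, log2

def kl2(p, q):
    """Binary KL divergence D((p,1-p) || (q,1-q)) in bits."""
    d = 0.0
    if p > 0: d += p * log2(p / q)
    if p < 1: d += (1 - p) * log2((1 - p) / (1 - q))
    return d

q0, eps = 0.5, 0.05          # hypothetical fair source, divergence threshold
probs = []
for n in (20, 80, 320):
    # Pr{x^n not in T_Q^eps}: sum Q(T(P)) over types P with D(P||Q) > eps
    # For q0 = 0.5 every sequence has probability 2^{-n}, so Q(T(P)) = C(n,k) q0^n
    p_out = sum(comb(n, k) * q0 ** n
                for k in range(n + 1) if kl2(k / n, q0) > eps)
    probs.append(p_out)
print(probs)                 # decreasing toward 0 as n grows
```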
Source coding (achievability)
Source coding theorem (Shannon ’48)
Given a DMS with pmf Q, any rate R such that

R = H(Q) + ε

is achievable (for any ε > 0).

Idea: code sequences of increasing length n. Code efficiently only the sequences whose type is close to Q, since the others will (almost) never occur. To do that we need only about nH(Q) bits.
Source coding: rigorous proof
Choose a small ε and define

T_Q^ε = { x^n : D(P_{x^n} || Q) ≤ ε }

By the continuity of D:

d(P_{x^n}, Q) ≤ ε',  where ε' → 0 as ε → 0

By the continuity of H:

H(P_{x^n}) ≤ H(Q) + ε'',  where ε'' → 0 as ε' → 0

1. Code the sequences in T_Q^ε by counting them within T_Q^ε
2. Code the sequences not in T_Q^ε by counting them within X^n
Source coding: rigorous proof
The average number of bits is

L ≤ Pr{T_Q^ε} [nH(Q) + nε'' + |X| log(n+1)] + (1 - Pr{T_Q^ε}) n log|X|

so that, with δ ≥ 1 - Pr{T_Q^ε},

L/n ≤ H(Q) + ε'' + |X| log(n+1)/n + δ log|X|

The excess rate over H(Q) can be made arbitrarily small by increasing n and by properly choosing ε and δ.
Universal source coding
What if Q is not known?
The surprising result is that we can still code at any rate larger than the entropy.
Observe the sequence of emitted symbols to estimate Q, then transmit information about the type and the index of the sequence within the type class.
Universal source coding (rigorous proof)
Choose an arbitrarily small ε and let T_Q^ε = { x^n : D(P_{x^n} || Q) ≤ ε }.
Given a sequence x^n, use |X| log(n+1) bits to indicate its type and nH(P_{x^n}) bits to index x^n within the type class.
The average number of bits per symbol is:

|X| log(n+1)/n + Σ_{x^n ∉ T_Q^ε} Q(x^n) H(P_{x^n}) + Σ_{x^n ∈ T_Q^ε} Q(x^n) H(P_{x^n})
≤ |X| log(n+1)/n + Q(x^n ∉ T_Q^ε) log|X| + Q(x^n ∈ T_Q^ε) [H(Q) + δ] ≤ H(Q) + δ'
Being ε and δ (and hence δ’) arbitrarily small, any rate larger than
H(Q) can be obtained.
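The two-stage scheme can be sketched for a binary alphabet, where the n+1 possible types need only ceil(log2(n+1)) bits and the index within the class comes from standard lexicographic ranking (the function name and the bit-count rounding are implementation choices, not from the slides):

```python
from math import comb, ceil, log2

def encode_two_stage(x):
    """Sketch of a two-stage universal code for a binary string:
    first describe the type (count of ones), then the index within the type class."""
    n, k = len(x), x.count('1')
    type_bits = ceil(log2(n + 1))                 # which of the n+1 binary types
    # Lexicographic rank of x among the C(n, k) strings with k ones
    rank, ones_left = 0, k
    for i, c in enumerate(x):
        if c == '1':
            rank += comb(n - 1 - i, ones_left)    # strings with '0' here come first
            ones_left -= 1
    index_bits = ceil(log2(comb(n, k))) if comb(n, k) > 1 else 0
    return type_bits + index_bits, rank

bits, rank = encode_two_stage('01100')
print(bits, rank)    # 7 5: 3 type bits + 4 index bits; rank 5 of 10 in T(P_{x^5})
```

The decoder recovers k from the type bits and inverts the ranking, with no knowledge of Q.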
Channel coding
The method of types can be used to prove many other
results in IT including the channel coding theorem
Outside the scope of this course
References
1. T. M. Cover and J. A. Thomas, "Elements of Information Theory", Wiley.
2. I. Csiszár, "The method of types", IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2505-2523, Oct. 1998.
3. I. Csiszár and P. C. Shields, "Information Theory and Statistics: a Tutorial", Foundations and Trends in Commun. and Inf. Theory, NOW Publishers Inc., 2004.