Information theory
"
"
Information content of a message
–
a boolean value "true"/"false" can be encoded as one
bit without losing information: 1/0
–
a direction up/down/right/left: 2 bits
–
the strings "AAAAAAAAAAAAAAAAAAAAAA"
and "22*A" can be considered to have the same
meaning, but different length
The information content of a string/message is
measured by its entropy
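As a small illustration of the "2 bits per direction" bullet above, here is a minimal Python sketch (the encoding table and function names are my own, not part of the slides):

```python
# Four equally likely directions fit exactly in 2 bits (log2 4 = 2).
DIRECTIONS = ["up", "down", "left", "right"]

def encode(direction: str) -> str:
    """Return a fixed-length 2-bit code for one of the four directions."""
    return format(DIRECTIONS.index(direction), "02b")   # e.g. "10"

def decode(bits: str) -> str:
    return DIRECTIONS[int(bits, 2)]

assert all(decode(encode(d)) == d for d in DIRECTIONS)
print({d: encode(d) for d in DIRECTIONS})   # {'up': '00', 'down': '01', 'left': '10', 'right': '11'}
```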
Entropy
• For X = {x_1, ..., x_n} with associated probabilities p(x_1), ..., p(x_n) such that their sum is 1 and all are positive, the entropy is
  H(X) = −Σ_{i=1..n} p(x_i)·log2 p(x_i)
• The entropy of a message is higher if the probabilities are evenly distributed (see the sketch below)
  – booleans B with p(true) = p(false) = ½:
    H(B) = −(½·log2(½) + ½·log2(½)) = 1
  – X s.t. p(x_i) = 0 except p(x_k) = 1 (with the convention 0·log2 0 = 0):
    H(X) = 0 (only one possible value: no info)
  – p(x_i) = 1/n for all i: H(X) = log2 n
• Fact: 0 ≤ H(X) ≤ log2 n, where |X| = n
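A minimal Python sketch of this definition (the function name is my own); it reproduces the three examples above:

```python
from math import log2

def entropy(probs):
    """Shannon entropy H(X) = -sum p*log2(p), using the convention 0*log2(0) = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))            # boolean with p(true) = p(false) = 1/2 -> 1.0
print(entropy([1.0, 0.0, 0.0]))       # one certain outcome -> 0.0
n = 26
print(entropy([1 / n] * n), log2(n))  # uniform over n symbols -> log2(n) ~ 4.70
```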
Entropy (cont)
• The entropy is a measure of the uncertainty of the contents of, e.g., a message.
• Higher entropy ⇒ more difficult to use e.g. frequency analysis
• Compression raises the entropy of a message ⇒ good to compress m before encryption
• First lab tomorrow: Huffman encoding, a kind of compression (see the sketch below)
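Since the lab is only mentioned here, the following is a generic, self-contained Python sketch of Huffman encoding (a textbook construction with my own names, not the lab's code): build a prefix code from symbol frequencies so that frequent symbols get short codewords.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a Huffman code (symbol -> bit string) from symbol frequencies in `text`."""
    freq = Counter(text)
    # Heap entries are (frequency, tie_breaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate case: only one distinct symbol
        return {heap[0][2]: "0"}
    tie_breaker = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie_breaker, (left, right)))
        tie_breaker += 1
    code = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                # leaf: a symbol
            code[tree] = prefix

    walk(heap[0][2], "")
    return code

text = "AAAABBC"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(code)                     # a prefix code: frequent symbols get shorter codewords
print(len(encoded), "bits for", len(text), "characters")
```

Because frequent symbols get short codewords, the average code length per character moves down toward the entropy of the symbol distribution, which is why a compressed message carries more information per character.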
Redundancy
• How much of a message can be discarded without losing information?
• The redundancy is D = R − r, where
  – r = H(X)/N is the rate of the language for messages of length N (the entropy per character; average info per character)
  – R = log2 |X| is the absolute rate (the maximum info per character; maximum entropy)
  – the redundancy ratio is D/R (how much can be discarded)
• English:
  26 chars ⇒ R ≈ 4.7; 1.0 ≤ r ≤ 1.5 (for large N)
  ⇒ 3.2 ≤ D ≤ 3.7 ⇒ between 68% and 79% redundant (see the sketch below)
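A quick numeric check of the English figures above (a back-of-the-envelope sketch; the rate bounds of 1.0 to 1.5 bits per character are the ones quoted on the slide):

```python
from math import log2

R = log2(26)                  # absolute rate of a 26-letter alphabet: ~4.70 bits/char
for r in (1.5, 1.0):          # estimated rate of English for large N (from the slide)
    D = R - r                 # redundancy
    print(f"r = {r}: D = {D:.1f} bits/char, ratio D/R = {D / R:.0%}")
# r = 1.5 gives D ~ 3.2 (ratio ~68%); r = 1.0 gives D ~ 3.7 (ratio ~79%)
```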
Equivocation
• With additional information, the uncertainty may be reduced
  – a random 32-bit integer has H(X) = 32, but if we learn that it is even, the uncertainty is reduced by 1 bit
• The equivocation H_Y(X) is the conditional entropy of X given Y
Conditional probabilities
• For Y ∈ {Y1, ..., Ym} a probability distribution, let p_Y(X) be the conditional probability for X given Y
  – sometimes written p(X|Y)
• p(X,Y) = p_Y(X)·p(Y) is the joint probability of X and Y
• Perfect secrecy: iff p_M(C) = p(C) for all M (see the sketch below)
  – the prob. of C received given that M was encrypted is the same as that of receiving C if some other M' was encrypted
  – requires that |K| ≥ |M|
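As an illustration of the perfect-secrecy condition, here is a small Python sketch (a toy single-character setup of my own, not from the slides): a shift cipher over Z26 with a uniformly random key, where every message induces the same ciphertext distribution, so p_M(C) = p(C).

```python
from fractions import Fraction
from collections import defaultdict

N = 26                                       # shift cipher over Z26; |K| = |M| = |C| = 26

def encrypt(m, k):
    return (m + k) % N

def ciphertext_dist(m):
    """P(C = c | M = m), computed by enumerating the uniformly random key."""
    dist = defaultdict(Fraction)
    for k in range(N):
        dist[encrypt(m, k)] += Fraction(1, N)
    return dist

# Every message induces the same (uniform) ciphertext distribution,
# so p_M(C) = p(C) for all M: perfect secrecy for a single character.
reference = ciphertext_dist(0)
assert all(ciphertext_dist(m) == reference for m in range(N))
print("p_M(C) = p(C) = 1/26 for every M: the ciphertext reveals nothing about M")
```

Here |K| = |M| = 26, the minimum allowed by the |K| ≥ |M| requirement.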
Equivocation (cont)
• H_Y(X) = −Σ_{X,Y} p(X,Y)·log2 p_Y(X)
• H_Y(X) = Σ_{X,Y} p(X,Y)·log2(1/p_Y(X))
• H_Y(X) = Σ_Y p(Y)·Σ_X p_Y(X)·log2(1/p_Y(X))
• Note: H_Y(X) ≤ H(X)
  – extra knowledge of Y cannot increase the uncertainty of X (see the sketch below)
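A minimal Python sketch of the last form of the formula (variable names are my own), replaying the even/odd example from the previous slide in miniature with a 3-bit integer:

```python
from math import log2

def H(probs):
    """Shannon entropy of a distribution given as a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def equivocation(joint):
    """H_Y(X) = sum over y of p(y) * H(X | Y = y), from a dict {(x, y): p(x, y)}."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p
    total = 0.0
    for y, py in p_y.items():
        cond = [p / py for (x, y2), p in joint.items() if y2 == y]
        total += py * H(cond)
    return total

# X uniform over 0..7 (3 bits), Y = "X is even"
joint = {(x, x % 2 == 0): 1 / 8 for x in range(8)}
print(H([1 / 8] * 8))        # H(X)   = 3.0 bits
print(equivocation(joint))   # H_Y(X) = 2.0 bits: knowing the parity removes exactly 1 bit
```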
Key equivocation
• How uncertain is the key, given a cryptogram?
  H_C(K) is called the key equivocation.
• If H_C(K) = 0: no uncertainty, the cipher can be broken
• Usually: lim_{N→∞} H_C(K) = 0
  – i.e., the longer the message, the easier to break
• H_C(K) is difficult to compute exactly, but can be approximated
Unicity distance
• The unicity distance N_u is the smallest N such that H_C(K) is close to 0
  – the amount of ciphertext needed to uniquely determine the key
  – but it may still be computationally infeasible
• Can be approximated by H(K)/D for random ciphers (where, given c and k, D_k(c) is as likely to produce one cleartext as another)
• Unconditional security:
  – if H_C(K) never approaches 0, even for large N
Unicity distance (cont)
• The DES algorithm encrypts 64 bits at a time with a 56-bit key
  – H(K) = 56, and D = 3.2 for English
    ⇒ N_u = 56/3.2 ≈ 17.5 characters (140 bits > 2·64)
  – but it takes a lot of effort...
• Shift cipher, K = Z26. Then H(K) = 4.7, D = 3.2, and N_u ≈ 1.5 characters!
  – but D = 3.2 only for long messages
  – and the shift cipher is a poor approximation of a random cipher (see the sketch below)
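A quick recomputation of the two examples above (a sketch using the slide's estimate D ≈ 3.2 bits per character for English):

```python
from math import log2

D = 3.2                                   # redundancy of English, bits per character

def unicity_distance(key_entropy_bits):
    """Approximate unicity distance N_u ~ H(K)/D for a random cipher."""
    return key_entropy_bits / D

print(unicity_distance(56))         # DES, 56-bit key: ~17.5 characters of ciphertext
print(unicity_distance(log2(26)))   # shift cipher over Z26: ~1.5 characters
```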