Information Theory and Security
Lecture Motivation

Up to this point we have seen:
– Classical Crypto
– Symmetric Crypto
– Asymmetric Crypto

These systems have focused on issues of confidentiality: ensuring
that an adversary cannot infer the original plaintext message, and
cannot learn any information about the original plaintext from the
ciphertext.

But what does “information” mean?

In this lecture and the next we will put a more formal framework
around the notion of what information is, and use this to provide
a definition of security from an information-theoretic point of
view.
Lecture Outline

Probability Review: Conditional Probability and Bayes

Entropy:
– Desired properties and definition
– Chain Rule and conditioning

Coding and Information Theory
– Huffman codes
– General source coding results

Secrecy and Information Theory
– Probabilistic definitions of a cryptosystem
– Perfect Secrecy
The Basic Idea

Suppose we roll a 6-sided die.
– Let A be the event that the number of dots is odd.
– Let B be the event that the number of dots is at least 3.

A = {1, 3, 5}

B = {3, 4, 5, 6}

If I tell you that the roll belongs to both A and B, then you know
there are only two possibilities: {3, 5}

In this sense A ∩ B tells you more than just A or just B.
That is, there is less uncertainty in A ∩ B than in A or B.


Information is closely linked with this idea of uncertainty:
Information increases when uncertainty decreases.
Probability Review, pg. 1

A random variable (event) is a mapping from the outcomes of an
experiment to real numbers.

For our discussion we will deal with discrete-valued random
variables.

Probability: We denote p_X(x) = Pr(X = x).
For a subset A,

p(A) = \sum_{x \in A} p_X(x)
Joint Probability: Sometimes we want to consider more than
two events at the same time, in which case we lump them
together into a joint random variable, e.g. Z = (X, Y).

p_{X,Y}(x, y) = Pr(X = x, Y = y)

Independence: We say that two events are independent if

p_{X,Y}(x, y) = p_X(x) p_Y(y)
Probability Review, pg. 2

Conditional Probability: We will often ask questions about
the probability of events Y given that we have observed X=x.
In particular, we define the conditional probability of Y=y
given X=x by
p_Y(y \mid x) = \frac{p_{X,Y}(x, y)}{p_X(x)}

Independence: We immediately get p_Y(y \mid x) = p_Y(y)

Bayes's Theorem: If p_X(x) > 0 and p_Y(y) > 0, then

p_X(x \mid y) = \frac{p_X(x) p_Y(y \mid x)}{p_Y(y)}
Example

Example: Suppose we draw a card from a standard deck. Let
X be the random variable describing the suit (e.g. clubs,
diamonds, hearts, spades). Let Y be the value of the card (e.g.
two, three, …, ace). Then Z=(X,Y) gives the 52 possibilities
for the card.
P( (X,Y) = (x,y) ) = P(X=x, Y=y) = 1/52
P(X=“clubs”) = 13/52 = ¼
P(Y=“3”) = 4/52 = 1/13
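To connect the formulas above to this example, here is a small Python sketch, added for illustration (it is not part of the original lecture; the helper names are mine). It builds the 52-card joint distribution and checks the conditional-probability and Bayes formulas numerically.

```python
from itertools import product

suits = ["clubs", "diamonds", "hearts", "spades"]
values = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]

# Joint distribution of Z = (X, Y): each of the 52 cards is equally likely.
p_joint = {(s, v): 1 / 52 for s, v in product(suits, values)}

def marginal_X(x):
    """p_X(x): sum the joint probability over all card values y."""
    return sum(p for (s, v), p in p_joint.items() if s == x)

def marginal_Y(y):
    """p_Y(y): sum the joint probability over all suits x."""
    return sum(p for (s, v), p in p_joint.items() if v == y)

def cond_Y_given_X(y, x):
    """p_Y(y | x) = p_{X,Y}(x, y) / p_X(x)."""
    return p_joint[(x, y)] / marginal_X(x)

print(marginal_X("clubs"))           # 0.25      (13/52)
print(marginal_Y("3"))               # 0.0769... (4/52 = 1/13)
print(cond_Y_given_X("3", "clubs"))  # 1/13 again: X and Y are independent

# Bayes: p_X(x | y) = p_X(x) * p_Y(y | x) / p_Y(y)
x, y = "clubs", "3"
print(marginal_X(x) * cond_Y_given_X(y, x) / marginal_Y(y))  # 0.25, as expected
```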
Entropy and Uncertainty

We are concerned with how much uncertainty a random event
has, but how do we define or measure uncertainty?

We want our measure to have the following properties:
1. To each set of nonnegative numbers p = (p_1, p_2, \ldots, p_n) with
   p_1 + p_2 + \cdots + p_n = 1, we define the uncertainty by H(p).
2. H(p) should be a continuous function: a slight change in p
   should not drastically change H(p).
3. H(1/n, \ldots, 1/n) \le H(1/(n+1), \ldots, 1/(n+1)) for all n > 0. Uncertainty
   increases when there are more outcomes.
4. If 0 < q < 1, then
   H(p_1, \ldots, q p_j, (1-q) p_j, \ldots, p_n) = H(p_1, \ldots, p_n) + p_j H(q, 1-q)
Entropy, pg. 2

We define the entropy of a random variable by
HX     px  log 2 p( x )
x


Example: Consider a fair coin toss. There are two outcomes,
with probability ½ each. The entropy is
-\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1 bit
Example: Consider a non-fair coin toss X with probability p of
getting heads and 1-p of getting tails. The entropy is
HX  p log 2 p  1  plog 2 1  p
The entropy is maximum when p= ½.
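A minimal Python sketch of this definition, added here for illustration (not from the slides; the function name `entropy` is mine). It reproduces the 1-bit answer for the fair coin and shows that the binary entropy peaks at p = 1/2.

```python
import math

def entropy(probs):
    """H(X) = -sum_x p(x) log2 p(x); terms with p(x) = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: the fair coin
print(entropy([0.9, 0.1]))   # about 0.469 bits: a biased coin is less uncertain

# The binary entropy H(p, 1-p) is maximized at p = 1/2.
best_p = max((p / 100 for p in range(1, 100)), key=lambda p: entropy([p, 1 - p]))
print(best_p)                # 0.5
```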
Entropy, pg. 3

Entropy may be thought of as the average number of yes-no questions
needed to accurately determine the outcome of a random
event.

Example: Flip two coins, and let X be the number of heads.
The possibilities are {0,1,2} and the probabilities are {1/4,
1/2, 1/4}. The entropy is

-\frac{1}{4} \log_2 \frac{1}{4} - \frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{4} \log_2 \frac{1}{4} = \frac{3}{2} bits

So how can we relate this to questions?

First, ask “Is there exactly one head?” Half the time the answer is
yes, and you are done after a single question.

Next, ask “Are there two heads?” Either answer now settles the outcome.

Half the time you needed one question, half the time you needed two, so
on average you need 3/2 questions, which matches the entropy.
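A tiny simulation of this two-question strategy, added here as an illustration (not part of the slides), confirms the average of 3/2 questions:

```python
import random

def questions_needed(num_heads):
    """Count the yes/no questions used by the strategy described above."""
    if num_heads == 1:       # Question 1: "Is there exactly one head?"
        return 1
    return 2                 # Question 2: "Are there two heads?" settles the rest

trials = 100_000
total = 0
for _ in range(trials):
    num_heads = random.randint(0, 1) + random.randint(0, 1)  # two fair coin flips
    total += questions_needed(num_heads)

print(total / trials)  # roughly 1.5, matching H(X) = 3/2 bits
```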

Entropy, pg. 4

Suppose we have two random variables X and Y. The joint
entropy H(X, Y) is given by

H(X, Y) = -\sum_x \sum_y p_{XY}(x, y) \log_2 p_{XY}(x, y)

Conditional Entropy: In security, we ask questions of whether
an observation reduces the uncertainty in something else. In
particular, we want a notion of conditional entropy. Given that
we observe event X, how much uncertainty is left in Y?
HY | X    p X ( x )H(Y | X  x )
x


  p X ( x )  p Y ( y | x ) log 2 p Y ( y | x ) 
x
 y

  p XY ( x , y) log 2 p Y ( y | x )
x
y
Entropy, pg. 5

Chain Rule: The Chain Rule allows us to relate joint entropy to
conditional entropy via H(X,Y) = H(Y|X)+H(X).
HX, Y    p XY ( x, y) log 2 p XY ( x, y)
x
y
  p XY ( x, y) log 2 p X ( x )p Y ( y | x )
x
y
  p X ( x ) log 2 p X ( x )  H(Y | X)
x
 H(X)  H(Y | X)
(Remaining details will be provided on the white board)

Meaning: Uncertainty in (X,Y) is the uncertainty of X plus
whatever uncertainty remains in Y given we observe X.
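Here is a short numerical check of the chain rule, added as a sketch (not from the slides), using the two-coin setting from a few slides back with X = the first coin and Y = the total number of heads; the variable names are mine.

```python
import math
from collections import defaultdict

def entropy(probs):
    # H = -sum p log2 p, skipping zero-probability terms
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution: X = first coin (H or T), Y = number of heads in two fair flips.
p_xy = {("H", 2): 0.25, ("H", 1): 0.25, ("T", 1): 0.25, ("T", 0): 0.25}

# Marginal distribution of X
p_x = defaultdict(float)
for (x, _), p in p_xy.items():
    p_x[x] += p

# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y) / p(x)
h_y_given_x = -sum(p * math.log2(p / p_x[x]) for (x, _), p in p_xy.items())

h_xy = entropy(p_xy.values())   # joint entropy H(X,Y)
h_x = entropy(p_x.values())     # H(X)

print(h_xy)               # 2.0
print(h_x + h_y_given_x)  # 2.0  -- chain rule: H(X,Y) = H(X) + H(Y|X)
```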
Entropy, pg. 6

Main Theorem:
1. Entropy is non-negative: H(X) \ge 0.
2. H(X) \le \log_2 |\mathcal{X}|, where |\mathcal{X}| denotes the number of elements
   in the sample space of X.
3. H(X, Y) \le H(X) + H(Y).
4. (Conditioning reduces entropy) H(Y \mid X) \le H(Y),
   with equality if and only if X and Y are independent.
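As a concrete check, added here: the two-coin distribution from the chain-rule sketch above has H(Y) = 1.5 bits, H(Y \mid X) = 1 bit, and H(X, Y) = 2 bits. So H(Y \mid X) < H(Y), since X and Y are dependent (property 4); H(X, Y) = 2 \le H(X) + H(Y) = 2.5 (property 3); and H(Y) = 1.5 \le \log_2 3 \approx 1.585 (property 2).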
Entropy and Source Coding Theory

There is a close relationship between entropy and representing
information.

Entropy captures the notion of how many “Yes-No” questions
are needed to accurately identify a piece of information… that
is, how many bits are needed!

One of the main focus areas in the field of information theory is
on the issue of source-coding:
– How to efficiently represent (“compress”) information using as few
bits as possible.

We will talk about one such technique, Huffman Coding.

Huffman coding is for a simple scenario, where the source is a
stationary stochastic process with independence between
successive source symbols.
Huffman Coding, pg. 1

Suppose we have an alphabet with four letters A, B, C, D with
frequencies:

Letter:     A    B    C    D
Frequency: 0.5  0.3  0.1  0.1
We could represent this with A=00, B=01, C=10, D=11. This
would mean we use an average of 2 bits per letter.

On the other hand, we could use the following representation:
A=1, B=01, C=001, D=000. Then the average number of bits
per letter becomes
(0.5)*1+(0.3)*2+(0.1)*3+(0.1)*3 = 1.7
Hence, this representation, on average, is more efficient.
Huffman Coding, pg. 2


Huffman Coding is an algorithm
that produces a representation for
a source.

The Algorithm:
– List all outputs and their probabilities
– Assign a 1 and a 0 to the two smallest, and combine them to form an
  output with probability equal to the sum
– Sort the list according to probabilities and repeat the process
– The binary strings are then obtained by reading backwards through
  the procedure

[Huffman tree for A = 0.5, B = 0.3, C = 0.1, D = 0.1: C and D combine
into a node of probability 0.2, which combines with B into 0.5, which
combines with A into 1.0]
Symbol Representations
A: 1
B: 01
C: 001
D: 000
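Below is a compact Python sketch of this procedure, added for illustration (the slides give no code), using the standard-library heapq module. It produces an optimal prefix code with the same codeword lengths as the table above (1, 2, 3, 3 bits); the exact 0/1 labels depend on tie-breaking.

```python
import heapq

def huffman_code(weights):
    """Build a Huffman code for {symbol: weight}; returns {symbol: bit string}."""
    # Heap entries: (total weight, tie-breaker, {symbol: partial codeword}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w0, _, code0 = heapq.heappop(heap)  # smallest remaining weight
        w1, _, code1 = heapq.heappop(heap)  # next smallest
        merged = {s: "0" + c for s, c in code0.items()}        # prefix one group with 0
        merged.update({s: "1" + c for s, c in code1.items()})  # and the other with 1
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

weights = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}
code = huffman_code(weights)
print(code)  # an optimal prefix code with lengths 1, 2, 3, 3

avg_len = sum(weights[s] * len(code[s]) for s in weights)
print(avg_len)  # 1.7 bits per letter, matching the previous slide
```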
Huffman Coding, pg. 3

In the previous example, we used probabilities. We may directly
use event counts.

Example: Consider 8 symbols, and suppose we have counted
how many times they have occurred in an output sample.
Symbol: S1  S2  S3  S4  S5  S6  S7  S8
Count:  28  25  20  16  15   8   7   5

We may derive the Huffman Tree

The corresponding length vector is (2,2,3,3,3,4,5,5)

The average codelength is 2.83. If we had used a fully balanced
tree representation (i.e. the straightforward representation) we
would have had an average codelength of 3.
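Running the huffman_code sketch from the previous page on these counts (the weights need not sum to 1) should reproduce the stated length vector and average codelength:

```python
counts = {"S1": 28, "S2": 25, "S3": 20, "S4": 16, "S5": 15, "S6": 8, "S7": 7, "S8": 5}
code = huffman_code(counts)   # reuses the sketch from the previous page

total = sum(counts.values())  # 124 observed symbols
lengths = [len(code[s]) for s in sorted(counts)]
avg_len = sum(counts[s] * len(code[s]) for s in counts) / total

print(lengths)  # [2, 2, 3, 3, 3, 4, 5, 5]
print(avg_len)  # about 2.83, versus 3 for a balanced 8-leaf tree
```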
Huffman Coding, pg. 4

We would like to quantify the average number of bits needed in
terms of entropy.

Theorem: Let L be the average number of bits per output for a
Huffman encoding of a random variable X. Then

H(X) \le L < H(X) + 1, \qquad L = \sum_x p(x) l_x

Here, l_x is the length of the codeword assigned to symbol x.

Example: Let's look back at the 4-symbol example:

H(X) = -0.5 \log_2(0.5) - 0.3 \log_2(0.3) - 0.1 \log_2(0.1) - 0.1 \log_2(0.1) \approx 1.685

Our average codelength was 1.7 bits.
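As a check added here: for this 4-symbol source, H(X) \approx 1.685 and L = 1.7, so H(X) \le L < H(X) + 1 holds. For the 8-symbol example on the previous page, the empirical entropy works out to roughly 2.80 bits against an average codelength of about 2.83 bits, again within the bound.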
Next Time

We will look at how entropy is related to security
– Generalized definition of encryption
– Perfect Secrecy
– Manipulating entropy relationships

The next computer project will also be handed out