CS1260 Mathematics for Computer Science
Unit 5 More on Modular Arithmetic
The Euclidean Algorithm
In the last unit the greatest common divisor (gcd) was discussed and shown to be relevant for
a number of applications. For small numbers the gcd of two integers a and b can be found by
factorising a and b into prime factors and then multiplying those prime factors that divide
both a and b. However for large values of a and b, prime factorization is time-consuming and
inefficient. There is a much more efficient way of calculating the gcd known as the
Euclidean Algorithm which dates back more than 2000 years to the Greek mathematician
Euclid. The algorithm uses repeated remaindering. Here is a Java method that returns the
gcd of its two parameters:
public static int gcd(int a, int b) {
    int r;
    // first make sure a and b are positive
    // so that the returned gcd is never negative
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        r = a % b;  // find the remainder
        a = b;
        b = r;
    }
    // when the loop terminates a holds the required gcd
    return a;
}
As an example let's trace the algorithm when a = 3640 and b = 588. At the start of each
iteration of the while loop the values of the variables are
   a      b      r
3640    588    ???
 588    112    112
 112     28     28
  28      0      0
Thus the required greatest common divisor is 28 (the value of a when b becomes zero).
The Extended Euclidean Algorithm
By Euler's theorem a number b is invertible mod a if and only if gcd(b, a) = 1, and then
$b^{-1} \equiv b^{\phi(a)-1} \pmod{a}$
Thus the inverse of b mod a can be found by raising b to a suitable power. However this is
not a very efficient way of calculating the inverse of b if a is large; we must first calculate
$\phi(a)$ (which involves factorising a) and then calculate $b^{\phi(a)-1}$. In fact we can use an
extension of the Euclidean algorithm to find $b^{-1}$ much more efficiently. This extended form
of the Euclidean Algorithm enables us to find integers k and l such that
k a + l b = gcd(a, b)
© A Barnes 2006
1
CS1260/L5
Thus if a and b are coprime (that is if gcd(a, b) = 1), then k a + l b = 1 and thus:
$l \equiv b^{-1} \pmod{a}$
and also incidentally
$k \equiv a^{-1} \pmod{b}$
Here is a Java implementation of the extended algorithm to find the inverse of b mod a:
// returns the inverse of b mod a if it exists
// or zero if the inverse does not exist
public static int inverse(int b, int a) {
    int r, q, m, n;
    int k = 1;
    int k1 = 0;
    int l = 0;
    int l1 = 1;
    // first make sure a and b are positive
    // (so for a negative b the inverse of |b| is returned)
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    int A = a;  // remember the original (now positive) value of a
    while (b != 0) {
        // find the quotient & remainder
        q = a/b;
        r = a - q*b;  // more efficient than a%b once q is known
        a = b; b = r;
        // actually we don't really need to calculate k and k1
        // to find the inverse of b mod a. They are only needed
        // if we also want to calculate the inverse of a mod b
        m = k - q*k1;
        k = k1; k1 = m;
        n = l - q*l1;
        l = l1; l1 = n;
    }
    if (a == 1) {
        l = l % A;
        if (l < 0) l = l + A;
        return l;
    }
    else
        return 0;
}
If b is invertible mod a then inverse above returns the required inverse, whereas if b is not
invertible mod a, it returns zero (which, of course, can never be the inverse of any value). As
an example let's trace the algorithm when a = 1001 and b = 101. At the start of each iteration
of the while loop the values of the variables are
   a    q    r    b     m    k    k1      n     l     l1
1001   ??   ??  101    ??    1     0     ??     0      1
 101    9   92   92     1    0     1     -9     1     -9
  92    1    9    9    -1    1    -1     10    -9     10
   9   10    2    2    11   -1    11   -109    10   -109
   2    4    1    1   -45   11   -45    446  -109    446
   1    2    0    0   101  -45   101  -1001   446  -1001
Thus gcd(1001, 101) = 1 and the required values of k and l such that k a + l b = 1 are
k = -45 and l = 446 (Check: -45*1001 + 446*101 = -45045 + 45046 = 1).
Thus
$101^{-1} \equiv 446 \pmod{1001}$
and
$1001^{-1} \equiv 92^{-1} \equiv -45 \equiv 56 \pmod{101}$
Random Number Generators.
There are many applications where an element of randomness is needed, and modular
arithmetic is a good way of generating random numbers.
For a card game you will certainly need to shuffle the pack (i.e. select all the cards in a
random order). For a board game you will need to roll the dice (i.e. select at random two
integers in the range 1 to 6). For an action game, you may want some randomness to
represent external variations and prevent predictability (as in the Zuul transporter room
example in the JPF). To simulate the randomness in the real world you may need to generate
events with a certain probability. For example, in a traffic simulation vehicles enter the
system at certain random times.
Albert Einstein said "God does not play dice." However, whether God does or does not play
dice, if you really want truly random numbers, then the only way to obtain them is to build some
form of complex physical system and use its output. Of course, a Lottery machine is not
very portable, it only generates numbers in the range 1–49, and typing in the numbers as they
come up is not a very convenient interface. An alternative is to use radioactive decay as a
source of randomness; unfortunately, carrying around a large lump of uranium or some other
radioactive isotope has some rather unfortunate side effects.
What is needed is an algorithm that generates random numbers. Of course, this is impossible,
since an algorithm is deterministic (i.e. predictable). Instead, we use an algorithm that
generates pseudo-random numbers. Such an algorithm generates a sequence of numbers
that appear to be random under a wide range of statistical tests (even though they aren't truly
random). For example, if we generate numbers which are pseudo-randomly distributed in the
range 0 to 1 then in a sequence of 1000 such numbers, we would expect roughly 100 to be in
the range 0 to 0.1, another 100 or so in the range 0.1 to 0.2 and so on. We can test whether
the difference between the actual number in a given sequence and 100 is statistically
significant. In any sequence of truly random numbers, all patterns of numbers will occur
eventually. Note that it makes no sense to ask if a single number is random or not;
randomness is a property of a group of numbers. Because of general laziness, pseudo-random
number generators are usually known as random number generators.
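The bucket-counting check described above can be sketched as follows. This is only an illustration: the class and method names are made up for the example, and the numbers are drawn from the standard library class java.util.Random rather than from any generator defined in these notes.

```java
// Illustrative sketch: BucketTest and countBuckets are hypothetical names.
final class BucketTest {

    // Counts how many of 'samples' pseudo-random doubles in [0, 1) fall into
    // each of 'buckets' equal-width intervals.
    static int[] countBuckets(long seed, int samples, int buckets) {
        java.util.Random rng = new java.util.Random(seed);
        int[] counts = new int[buckets];
        for (int i = 0; i < samples; i++) {
            int index = (int) (rng.nextDouble() * buckets);
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countBuckets(42L, 1000, 10);
        for (int i = 0; i < counts.length; i++) {
            // For a good generator each count should be roughly 100.
            System.out.printf("%.1f to %.1f: %d%n", i / 10.0, (i + 1) / 10.0, counts[i]);
        }
    }
}
```

Deciding whether the observed deviations from 100 are statistically significant needs a proper test (for example a chi-squared test), which this sketch does not attempt.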
Most practical random number generators are based on simple multiplicative congruential
algorithms of the form
$I_{j+1} = a I_j \bmod m$
This algorithm generates a sequence of integers $I_1, I_2, I_3, \ldots$ If a and m are
chosen carefully, you can generate m–1 different integers. These are usually converted to
random numbers in the range 0 to 1 by dividing by m. If required, numbers in any specified
range can then be generated by scaling (and then rounding if only integer values are required).
The first number in the sequence is called the seed.
Let us try out a very simple example of this algorithm just to show that it works. We will
choose m = 5, a = 2 and the seed $I_0 = 1$. Then the sequence of numbers is
$I_1 = 2,\ I_2 = 4,\ I_3 = 3,\ I_4 = 1,\ I_5 = 2,\ I_6 = 4, \ldots$
Note how we get 4 distinct integers and that the sequence repeats after 4 numbers. We say
that the period of the generator is 4. If we want real numbers, we divide by 5 and get
$R_1 = 0.4,\ R_2 = 0.8,\ R_3 = 0.6,\ R_4 = 0.2, \ldots$
What happens if we choose a seed of 0?
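The small example can be checked mechanically. The sketch below is an illustration only: the names SmallGenerator, next and period are assumptions made for this example, and it assumes that a and m are small enough for the product a*I to fit in a long.

```java
// Illustrative sketch: SmallGenerator, next and period are hypothetical names.
final class SmallGenerator {

    // One step of the multiplicative congruential generator I -> a*I mod m.
    static long next(long a, long m, long current) {
        return (a * current) % m;
    }

    // Number of steps before the sequence first returns to the seed,
    // or -1 if it has not done so within m steps.
    static long period(long a, long m, long seed) {
        long value = seed;
        for (long steps = 1; steps <= m; steps++) {
            value = next(a, m, value);
            if (value == seed) return steps;
        }
        return -1;
    }

    public static void main(String[] args) {
        long value = 1;
        for (int j = 1; j <= 6; j++) {
            value = next(2, 5, value);
            System.out.println("I" + j + " = " + value + ", R" + j + " = " + value / 5.0);
        }
        System.out.println("period = " + period(2, 5, 1));
    }
}
```

Calling period with other values of a, m and the seed is a quick way to experiment with the question above.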
Of course in practice we will require a much larger period for our random number generator
and will need to choose much larger values of a and particularly m. It is important to realise
that the choice of a and m can have a big effect on the quality of your random number
generator (in particular gcd(a, m) should be 1) and is best left to experts in number theory.
That is, don't just make up some values, but look them up in a reliable text, and make sure
that you transcribe them into your program accurately (or use a random number generator
from a suitable computer package).
As a simple example of what can go wrong consider the values a = 2 and m = 2,147,483,647
= $2^{31} - 1$. If we start with a seed set to 1, then the first ten values will all be at most
$2^{10} = 1024$; as fractions they will all be less than $10^{-6}$ = 0.000001. Another way of saying
this is that small numbers are followed by several small numbers; this is a highly undesirable
correlation between numbers in the sequence. One recommended pair of values is
$a = 7^5 = 16807$
$m = 2^{31} - 1 = 2147483647$
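The effect of the multiplier can be seen by printing the first few values for both choices of a. This is a minimal sketch under stated assumptions: the names MultiplierDemo and firstValues are invented for the example, and the arithmetic is done directly with long values rather than with the Zn class from the last unit.

```java
// Illustrative sketch: MultiplierDemo and firstValues are hypothetical names.
final class MultiplierDemo {
    static final long M = 2147483647L; // 2^31 - 1

    // Returns the first 'count' values of I -> a*I mod M, starting from 'seed'.
    static long[] firstValues(long a, long seed, int count) {
        long[] values = new long[count];
        long current = seed;
        for (int i = 0; i < count; i++) {
            current = (a * current) % M; // no overflow: a and current are below 2^31
            values[i] = current;
        }
        return values;
    }

    public static void main(String[] args) {
        for (long a : new long[] { 2L, 16807L }) {
            System.out.println("a = " + a);
            for (long v : firstValues(a, 1L, 10)) {
                System.out.println("  " + v + "  ->  " + (double) v / M);
            }
        }
    }
}
```

With a = 2 the ten values are simply 2, 4, 8, up to 1024, whereas with a = 16807 the second value is already 282475249.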
This algorithm is easy to implement using the class Zn (with MODULUS = 2147483647) at
the end of the last unit. We must however be careful to use the second version in which the
arithmetic in the class is carried out with long integers so that we don't get problems with
overflow.
public class Random {
    private static final double MAX = Zn.MODULUS;
    private Zn SEED;
    private Zn a;

    public Random(int seed) {
        SEED = new Zn(seed);
        a = new Zn(16807);
    }

    // return a value in the range 1 to 2147483646 inclusive
    public int getNextRandom() {
        SEED = Zn.times(a, SEED);
        return SEED.val;
    }

    // return a value in the range 0.0 to 1.0
    public double getNextDouble() {
        return getNextRandom()/MAX;
    }

    // other methods returning integers in a specific range etc.
}
Suppose that we set the seed to a particular value (42, say) and then generate 10 numbers. If
we reset the seed to the same value again and generate 10 numbers, we will get the same 10
numbers. This might seem to be a nuisance, given that we want unpredictable behaviour, but
is actually very helpful. Suppose that you have a large system using a random number
generator and that midway through a run the program goes wrong (e.g. it crashes because of a
bug). To debug the problem, you will need to run the system again under identical conditions
to find out what has gone wrong and prove that you have removed the error. However, if you
don't know what the seed was when you first ran the system, you cannot run it again in an
identical way. This makes debugging more difficult. The solution is to run the system from
known seeds for debugging purposes.
In normal use different seeds will be used for each run. Since 0 is not a sensible value to use
for the seed, if it is used when a random number generator is initialised (or if no seed is
supplied by the user) some random number generators generate a non-zero seed themselves
using some sort of pseudo-random process. One common method is to read the system clock
and use its 31 least significant binary digits to set the seed.
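Both points, repeatable runs from a known seed and a clock-derived seed when none is given, can be sketched in a few lines. The class below is an illustration rather than the Random class above: the names SeededGenerator and seedFromClock are assumptions, the arithmetic uses long values instead of Zn, and mapping a zero result to 1 is simply one possible way of avoiding a zero seed.

```java
// Illustrative sketch: SeededGenerator and its members are hypothetical names.
final class SeededGenerator {
    private static final long M = 2147483647L; // 2^31 - 1
    private static final long A = 16807L;      // 7^5
    private long state;

    // Uses the given seed, or derives one from the clock if the seed is 0 mod M.
    SeededGenerator(long seed) {
        long s = Math.floorMod(seed, M);
        state = (s != 0) ? s : seedFromClock(System.currentTimeMillis());
    }

    // Takes the 31 least significant bits of a clock reading and maps the
    // result into the range 1 to M-1 so that it is never zero mod M.
    static long seedFromClock(long clockMillis) {
        long bits = clockMillis & 0x7FFFFFFFL;
        long s = bits % M;
        return (s == 0) ? 1 : s;
    }

    long next() {
        state = (A * state) % M;
        return state;
    }

    public static void main(String[] args) {
        SeededGenerator first = new SeededGenerator(42);
        SeededGenerator second = new SeededGenerator(42);
        for (int i = 0; i < 10; i++) {
            // Same seed, so both columns are identical.
            System.out.println(first.next() + "  " + second.next());
        }
    }
}
```

For debugging, the seed actually used should be logged so that a failing run can be repeated.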
Cryptography
The transmission of information that you don't want to be intercepted and read by any
unauthorised eavesdropper requires a method of encryption so that only the receiver can
decrypt it. Modular arithmetic is the basis of the most commonly used cryptographic system
on the Internet: the RSA algorithm.
Public Key Cryptography
Suppose A wants to send a message to B and does not want anyone else to intercept the
message and read it. The principle of a public-key cryptosystem is that it should be easy to
transmit information from A in a secure encrypted format by using B's public key, but special
knowledge (known as B's private key) is needed to decrypt (retrieve) the information. The
information is encrypted using B's public key, which is available to everyone, and decrypted
by using B's private key, which is known to B alone. Similarly when B wants to reply to A,
the message is encrypted using A's public key, which is known to everyone, and decrypted by A
using a private key known only to A.
The important notion is of a trapdoor or one-way function which is easy to compute but
whose inverse is hard to compute. Here encryption using the public key is easy, but
decryption is difficult (without knowledge of the private key). In this section we will briefly
describe the RSA system (named after its inventors Ronald Rivest, Adi Shamir, and Leonard
Adleman).
Normally we want to transmit text, made up of characters. However, the RSA system works
with natural numbers, so we must first convert characters into numbers: this is easily done
using (for example) the ASCII format which represents characters by 7-bit numbers. Then,
for example, by concatenating (that is joining together) the 7 bit codes of 8 characters we
would produce a 56-bit (7 byte) number.
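The concatenation of 7-bit codes can be sketched as below. The names AsciiPacking, pack and unpack are illustrative assumptions, as is the choice to put the first character in the most significant position.

```java
// Illustrative sketch: AsciiPacking, pack and unpack are hypothetical names.
final class AsciiPacking {

    // Concatenates the 7-bit ASCII codes of up to 8 characters into one number.
    static long pack(String text) {
        if (text.length() > 8) throw new IllegalArgumentException("at most 8 characters");
        long value = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) throw new IllegalArgumentException("not a 7-bit ASCII character");
            value = (value << 7) | c;
        }
        return value;
    }

    // Recovers 'length' characters from a number produced by pack.
    static String unpack(long value, int length) {
        char[] chars = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            chars[i] = (char) (value & 0x7F);
            value >>>= 7;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        long packed = pack("RSA code");
        System.out.println(packed + " -> " + unpack(packed, 8));
    }
}
```

For instance pack("AB") is 65*128 + 66 = 8386, since 'A' is 65 and 'B' is 66 in ASCII.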
The RSA Algorithm
If we wish to use the RSA system we start by choosing p and q, two distinct (i.e. different)
large prime numbers. It is relatively easy to decide if a number is prime and so it is quite easy
to generate two such primes. Then we calculate n = pq and also the Euler function of n,
namely $\phi(n) = n(1 - 1/p)(1 - 1/q) = pq(1 - 1/p)(1 - 1/q) = (p-1)(q-1)$. We then choose as our
private key any number k with $1 < k < \phi(n)$ which is coprime with $\phi(n)$, that is $\gcd(k, \phi(n)) = 1$.
Thus k is invertible mod $\phi(n)$ (by Euler's theorem) with inverse a (say). Thus there exists a
number a with $1 < a < \phi(n)$ such that
$a k \equiv 1 \pmod{\phi(n)}$
Furthermore, as we saw earlier in this unit, a can be efficiently calculated using the extended
Euclidean algorithm. The pair of numbers a and n forms our public key which is published
in a 'telephone directory' along with the public keys of other people (perhaps on the Internet).
Suppose that someone wishes to send us a secret message. They first split the message
into sections that can be transformed into an integer M in the range 0 to n - 1 (using some
suitable numerical encoding of characters such as ASCII or Unicode). To encrypt M they
calculate
$e \equiv M^a \pmod{n}$
This only requires knowledge of the public key (a, n). When we receive the encrypted
message e we can decrypt it to get the original message M by calculating:
$e^k \bmod n \equiv (M^a)^k \bmod n \equiv M^{ak} \equiv M^{1 - b\phi(n)} \equiv M$
Here we have written $a k = 1 - b\phi(n)$ for some integer b, which is possible because
$a k \equiv 1 \pmod{\phi(n)}$. In the last step we have used Euler's theorem to conclude
$M^{\phi(n)} \equiv 1 \pmod{n}$, and hence $M^{-b\phi(n)} \equiv 1 \pmod{n}$.
The security of the RSA scheme lies in the fact that factorising the number n in the public
key is very time-consuming when n is large. In practical use the two primes p and q are
chosen to each have at least 100 decimal digits so that n is at least 200 digits in length. Even
with the best factorisation algorithms and using the very fastest super-computers the
factorisation would take many thousands of years. Thus here the trapdoor function is
essentially just multiplication, with its one-way character provided by the difficulty of
factorisation. Note however that once the factorisation of n is known, $\phi(n)$ can easily be
calculated and then the private key k is easily calculated from the public key a using the
Euclidean algorithm.
Let us work through an example (see endnote i) involving moderately sized (two digit) primes.
Take p = 53 and q = 61, so that n = 3233 and we calculate $\phi(n) = (p-1)(q-1) = 52 \cdot 60 = 3120$.
Let us choose our private key k = 1013 which is actually prime. (Since $3120 = 2^4 \cdot 3 \cdot 5 \cdot 13$,
any number which does not have 2, 3, 5 or 13 as factors would do for the private key). Using
Euclid's algorithm with 3120 and 1013 we find that
$77 \cdot 1013 - 25 \cdot 3120 = 1$
Thus $1013^{-1} \equiv 77 \pmod{3120}$ and so our public key is (77, 3233). A number 0 < M < 3233
is encrypted as
$e \equiv M^{77} \pmod{3233}$
For example, if M = 10, $e \equiv 10^{77} \equiv 2560 \pmod{3233}$. We decrypt this by
computing $e^{1013} = 2560^{1013} \equiv 10 \pmod{3233}$.
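The worked example can be replayed in code. The sketch below is illustrative only: the class SmallRsa and its member names are assumptions made for this example, the inverse is computed with the extended Euclidean algorithm from earlier in the unit, and the powers are computed by the repeated squaring method described later in the unit. Plain long arithmetic is adequate only because n is tiny here.

```java
// Illustrative sketch: SmallRsa and its members are hypothetical names.
final class SmallRsa {
    static final long P = 53, Q = 61;
    static final long N = P * Q;                // 3233
    static final long PHI = (P - 1) * (Q - 1);  // 3120
    static final long PRIVATE_KEY = 1013;       // k in the text

    // Inverse of b mod a via the extended Euclidean algorithm, or 0 if none exists.
    static long inverse(long b, long a) {
        long modulus = a;
        long l = 0, l1 = 1;
        while (b != 0) {
            long q = a / b;
            long r = a - q * b;
            a = b; b = r;
            long n = l - q * l1; l = l1; l1 = n;
        }
        if (a != 1) return 0;
        return Math.floorMod(l, modulus);
    }

    // base^exponent mod modulus by repeated squaring (modulus must be small
    // enough that modulus*modulus fits in a long).
    static long modPow(long base, long exponent, long modulus) {
        long result = 1;
        long sq = base % modulus;
        while (exponent > 0) {
            if (exponent % 2 == 1) result = (result * sq) % modulus;
            sq = (sq * sq) % modulus;
            exponent /= 2;
        }
        return result;
    }

    public static void main(String[] args) {
        long publicKey = inverse(PRIVATE_KEY, PHI);  // a in the text
        long e = modPow(10, publicKey, N);           // encrypt M = 10
        long m = modPow(e, PRIVATE_KEY, N);          // decrypt
        // Prints: a = 77, e = 2560, M = 10
        System.out.println("a = " + publicKey + ", e = " + e + ", M = " + m);
    }
}
```

Keys of realistic size do not fit in a long; a practical implementation would need arbitrary-precision integers.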
If we wanted to transmit an arbitrary string of characters, we would convert it into a string of
binary digits 0 and 1 by using (say) the ASCII encoding. Then we would have to split it into
blocks of length 11 to ensure that every number was less than 3233 (since $2^{11} = 2048 < 3233$).
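The splitting into 11-bit blocks might look like the following sketch. The names BlockSplitter, toBits and toBlocks are invented for the example, and padding the final block with zeros on the right is an assumption, since the notes do not say how an incomplete block should be handled.

```java
// Illustrative sketch: BlockSplitter and its methods are hypothetical names.
final class BlockSplitter {

    // Writes each character as 7 binary digits (ASCII) and joins them up.
    static String toBits(String text) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 127) throw new IllegalArgumentException("not a 7-bit ASCII character");
            for (int shift = 6; shift >= 0; shift--) {
                bits.append((c >> shift) & 1);
            }
        }
        return bits.toString();
    }

    // Splits the bit string into blocks of 'blockLength' bits, padding the last
    // block on the right with zeros (an assumption, see the text above).
    static int[] toBlocks(String bits, int blockLength) {
        int count = (bits.length() + blockLength - 1) / blockLength;
        int[] blocks = new int[count];
        for (int i = 0; i < count; i++) {
            int value = 0;
            for (int j = 0; j < blockLength; j++) {
                int index = i * blockLength + j;
                int bit = (index < bits.length() && bits.charAt(index) == '1') ? 1 : 0;
                value = (value << 1) | bit;
            }
            blocks[i] = value;
        }
        return blocks;
    }

    public static void main(String[] args) {
        for (int block : toBlocks(toBits("RSA"), 11)) {
            System.out.println(block); // every block is below 2048, hence below 3233
        }
    }
}
```

Each block can then be encrypted separately as a number M in the permitted range.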
An Efficient Algorithm for Computing Powers
Although computing the relevant powers in modular arithmetic is simple in principle, it is
very time consuming to compute directly. For example, calculating $M^{77}$ by working out $M^2$,
$M^3$, $M^4$ and so on requires 76 multiplications modulo 3233. Similarly the decryption
involves calculating $e^{1013}$ and requires 1012 multiplications modulo 3233. Remember also
that in practical applications the public and private keys involved typically have between
100 and 200 digits and so the calculation of such high powers using naive repeated
multiplication becomes totally impracticable even with super-computers. Fortunately there is
a much faster algorithm for calculating powers that uses repeated squaring. Using this
algorithm to calculate $x^N$ takes at most $2 \log_2 N$ multiplications. Thus for 150 digit numbers
(putting $N = 10^{150}$ in the above expression) the repeated squaring algorithm takes only around
1000 multiplications.
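To get a feel for the saving, the two multiplication counts can be compared for the exponents used in the RSA example. This is a rough sketch with invented names (MultiplicationCount, naive, repeatedSquaring); it counts the operations of the repeated squaring scheme set out in the following paragraphs, including the first multiplication of the result by the square.

```java
// Illustrative sketch: MultiplicationCount and its methods are hypothetical names.
final class MultiplicationCount {

    // Multiplications used when x^n is built up as x^2, x^3, ..., x^n.
    static long naive(long n) {
        return n - 1;
    }

    // Multiplications used by repeated squaring: one for each binary digit
    // of n that is 1, plus one squaring for each digit after the first.
    static long repeatedSquaring(long n) {
        long count = 0;
        while (true) {
            if (n % 2 == 1) count++;   // multiply the result by the current square
            n = n / 2;
            if (n == 0) break;
            count++;                   // square once more
        }
        return count;
    }

    public static void main(String[] args) {
        for (long n : new long[] { 77, 1013 }) {
            System.out.println(n + ": naive " + naive(n)
                    + ", repeated squaring " + repeatedSquaring(n));
        }
    }
}
```

For 77 this gives 10 multiplications instead of 76, and for 1013 it gives 17 instead of 1012.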
To calculate $M^n$, we express n as a binary number (for example $n = 77 = 1001101_2$) and
use repeated squaring instead. We accumulate the required result in a variable result which
we initialise to 1 and we store the squares in a variable sq which we initialise to M. Starting
from the right hand end of n, for each binary digit,
    if the digit is a one we multiply result by sq;
    whether the digit is a one or a zero we square sq.
Here is a Java implementation of this algorithm to raise an integer M to the power n:
public long power(long M, int n) {
    // long rather than int: the example below (7 to the power 13) overflows an int
    long result = 1;
    long sq = M;
    while (true) {
        if (n%2 == 1)
            result = result * sq;
        n = n/2;
        if (n == 0) break;
        sq = sq * sq;
    }
    return result;
}
Note that repeatedly remaindering n by 2 and then dividing by 2 produces the binary digits of
n from right to left. The process stops when the repeated divisions by 2 produce a zero result.
Note it is slightly more efficient to test for loop exit before squaring sq as this avoids the
updating of the value of sq in the last iteration of the loop -- a value that will never be used.
Here is a trace of the algorithm used to calculate $7^{13}$ before the loop starts and then at the end
of each loop iteration. Note that $13 = 1101_2$.
                  M   result           n    sq
Before the loop   7   1                13   7
End of loop 1     7   7                6    49
End of loop 2     7   7                3    2,401
End of loop 3     7   16,807           1    5,764,801
After loop ends   7   96,889,010,407   0    5,764,801
It is easy to adapt the algorithm for modular arithmetic. We simply make M, result and sq
of type Zn where Zn is the class defined in the last unit (with MODULUS set to the required
value).
public Zn power(Zn M, int n) {
    Zn result = new Zn(1);
    Zn sq = M;
    while (true) {
        if (n%2 == 1)
            result = Zn.times(result, sq);
        n = n/2;
        if (n == 0) break;
        sq = Zn.times(sq, sq);
    }
    return result;
}
Error-Detecting & Error-Correcting Codes.
There are many situations in which a large amount of data has to be transmitted quickly and
reliably. Obvious examples are communication by telephone and the transmission of
television signals to and from satellites or other spacecraft. Often large amounts of data need
to be stored on various storage devices (magnetic disks, CDs, DVDs etc.). In
practice, noise (random fluctuations or external disturbances) may corrupt signals in
transmission. Similarly stored data may be slowly corrupted due to exposure to heat, stray
magnetic fields, UV radiation or even the cosmic ray background.
Error-detecting codes are used to encode values in digital form incorporating extra
information so that if the data is corrupted, the fact that corruption has occurred may be
detected. Thus the corrupted data can be discarded or (if practicable) the receiver can request
re-transmission of the corrupted portion of data.
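A single parity bit is perhaps the simplest error-detecting code and already uses arithmetic mod 2. The sketch below is an illustration chosen for this purpose, not one of the matrix-based codes these notes refer to; the names ParityCode, encode and looksValid are assumptions.

```java
// Illustrative sketch: ParityCode and its methods are hypothetical names.
final class ParityCode {

    // Adds an eighth bit to a 7-bit value so that the total number of 1s is even.
    static int encode(int sevenBits) {
        int parity = Integer.bitCount(sevenBits & 0x7F) % 2; // sum of the bits mod 2
        return (parity << 7) | (sevenBits & 0x7F);
    }

    // A received byte is accepted only if its eight bits sum to 0 mod 2.
    static boolean looksValid(int received) {
        return Integer.bitCount(received & 0xFF) % 2 == 0;
    }

    public static void main(String[] args) {
        int sent = encode('C');           // 'C' is 1000011 in ASCII
        int corrupted = sent ^ 0b0000100; // flip one bit in transit
        // Prints: true false
        System.out.println(looksValid(sent) + " " + looksValid(corrupted));
    }
}
```

A parity bit detects any single flipped bit but cannot say which bit was flipped, and two flipped bits go unnoticed.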
Of course it may not always be possible to request re-transmission of corrupted data (for
example in real time transmission of TV pictures). Similarly if data is corrupted on a storage
device, the original data cannot magically be re-stored and no back-up copy may be readily
available. In these circumstances error-correcting codes are often used. Error-correcting
codes are used to encode values in a digital form incorporating sufficient extra information so
that even if the data is corrupted, the original data values can be recovered.
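As a toy illustration of error correction, every bit can be sent three times and recovered by majority vote. This repetition code is far less efficient than the matrix-based codes used in practice and is shown only to make the idea concrete; the names RepetitionCode, encode and decode are assumptions.

```java
// Illustrative sketch: RepetitionCode and its methods are hypothetical names.
final class RepetitionCode {

    // Sends every bit three times.
    static int[] encode(int[] bits) {
        int[] sent = new int[bits.length * 3];
        for (int i = 0; i < bits.length; i++) {
            sent[3 * i] = sent[3 * i + 1] = sent[3 * i + 2] = bits[i];
        }
        return sent;
    }

    // Recovers each bit by majority vote over its three copies.
    static int[] decode(int[] received) {
        int[] bits = new int[received.length / 3];
        for (int i = 0; i < bits.length; i++) {
            int ones = received[3 * i] + received[3 * i + 1] + received[3 * i + 2];
            bits[i] = (ones >= 2) ? 1 : 0;
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] sent = encode(new int[] { 1, 0, 1, 1 });
        sent[4] ^= 1; // corrupt one copy of the second bit
        // Prints: [1, 0, 1, 1]
        System.out.println(java.util.Arrays.toString(decode(sent)));
    }
}
```

The original bits are recovered as long as at most one of the three copies of each bit is corrupted.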
Matrices with modular numbers as entries are the basis of most practical error-detecting and
error-correcting codes. We will consider matrices in a later lecture.
Appendix
Outline Proof of the Euclidean Algorithm (not required for exam. purposes)
Let the quotient and remainder at the jth stage of the algorithm be $q_j$ and $r_j$ respectively. For
convenience we also write $a = r_{-1}$ and $b = r_0$. Then we have
$a = r_{-1} = q_1 b + r_1 = q_1 r_0 + r_1$
$b = r_0 = q_2 r_1 + r_2$
$r_1 = q_3 r_2 + r_3$
$r_2 = q_4 r_3 + r_4$
............
$r_j = q_{j+2} r_{j+1} + r_{j+2}$
Eventually for some n, $r_{n+2} = 0$. This follows since $r_{j+2} < r_{j+1} < r_j < \ldots < r_1 < b$
(since each $r_i$ is the remainder of a number divided by $r_{i-1}$ and thus is less than $r_{i-1}$). Thus
$r_n = q_{n+2} r_{n+1}$
and thus $r_{n+1}$ divides $r_n$. But since
$r_{n-1} = q_{n+1} r_n + r_{n+1}$
$r_{n+1}$ divides $r_{n-1}$. Continuing in this way we can show $r_{n+1}$ divides all the $r_j$'s and also
divides a and b. Thus $r_{n+1}$ is a common divisor of a and b. We must show $r_{n+1}$ is the
greatest common divisor.
We claim that for all values of j there are integers $k_j$ and $l_j$ such that
$r_j = k_j a + l_j b$
This is clearly true for j = -1 and j = 0, taking
$k_{-1} = 1, \quad l_{-1} = 0$
$k_0 = 0, \quad l_0 = 1.$
Now if $r_j = k_j a + l_j b$ and $r_{j+1} = k_{j+1} a + l_{j+1} b$, then since $r_{j+2} = r_j - q_{j+2} r_{j+1}$ we can
show
$r_{j+2} = (k_j - q_{j+2} k_{j+1}) a + (l_j - q_{j+2} l_{j+1}) b = k_{j+2} a + l_{j+2} b$
where
$l_{j+2} = l_j - q_{j+2} l_{j+1}$
$k_{j+2} = k_j - q_{j+2} k_{j+1}$
Thus in particular
$r_{n+1} = k_{n+1} a + l_{n+1} b$
Now if d is a common divisor of a and b, then clearly d divides $r_{n+1}$ and thus $d \le r_{n+1}$. Thus
$r_{n+1}$ must be the greatest common divisor of a and b.
Note that in the extended Euclidean algorithm the variables l, l1, k, k1, m and n are used to
calculate successive values of $l_j$ and $k_j$.
i. J. Truss, Discrete Mathematics for Computer Scientists, Addison-Wesley (1999), Example 10.7