Math 5330
Spring 2013
Elementary factoring algorithms
The RSA cryptosystem is founded on the idea that, in general, factoring is hard. Whereas
with Fermat’s Little Theorem and some related ideas one can usually tell very quickly that
a composite number is, in fact, composite, actually producing a factorization of a composite
number is a very different thing. Currently, the only method at our disposal is trial division.
For small numbers, trial division is the method of choice. If you wish to factor a number
$n < 10^{10}$, you should probably use trial division.
But what if you want to factor a large number? Trial division still has a part to play.
If you have a number of size roughly $10^{30}$, then you would need to be very lucky to factor
it with trial division. If the number were the product of two nearly equal primes
(or if the number itself were prime), then you would have to perform trial division up to
about $10^{15}$ to see this. To put this in perspective, there are roughly 29,000,000,000,000
primes up to $10^{15}$, and even if we could perform $10^6$ multi-precision divisions a second, it
would take 29,000,000 seconds to try them all. That is, trial division could take about a year.
So what to do with a 30-digit or larger number? First, one usually uses trial division for
a while. After all, we know how to factor any even number. At some point, it is useful to
know that the number actually is composite, so after some trial division, if $m$ is the current
unfactored part, calculate $2^{m-1} \pmod{m}$. If it is not 1, then $m$ is composite. Usually one
does some more trial division (try, say, all primes $p < 10^6$). But after that, switch to some
other factoring method.
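The strategy just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `trial_division` and `is_certainly_composite` and the parameter `limit` are invented here, and the trial divisors are simply 2 and the odd numbers rather than a list of primes.

```python
def trial_division(n, limit):
    """Strip prime factors below `limit` from n.

    Returns (list of factors found, remaining unfactored part m).
    Trial divisors are 2 and the odd numbers; dividing out each factor as
    soon as it is found means a composite divisor can never succeed.
    """
    factors = []
    d = 2
    while d < limit and d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2
    return factors, n


def is_certainly_composite(m):
    """Base-2 Fermat test: True means m is definitely composite.

    False is inconclusive: m may be prime, or a base-2 pseudoprime.
    """
    return m > 2 and pow(2, m - 1, m) != 1


# Example: strip the small factors of 10^22 + 1, then test what is left.
factors, m = trial_division(10**22 + 1, 1000)
print(factors, m, is_certainly_composite(m))
```

The three-argument `pow` performs modular exponentiation, so $2^{m-1} \pmod{m}$ is cheap to compute even when $m$ has many digits.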
What other factoring methods are there? Here I will present several other fairly simple
factoring methods. The first dates back to Fermat, the rest are less than 50 years old.
Fermat’s Factoring Method
Our first method is based on the idea that if $n = x^2 - y^2$, then $n = (x - y)(x + y)$. That
is, we will try to represent $n$ as the difference of two squares, and use that representation to
factor $n$. To do this, we start with the number $x_0 = \lceil \sqrt{n} \rceil$, and calculate
\[ (x_0 + k)^2 - n \]
for $k = 0, 1, 2, \ldots$, stopping when a square is returned. There is a trick to speed up the
calculation of $(x_0 + k)^2 - n$: two successive values are related. That is,
\[ (x_0 + k + 1)^2 - n = [(x_0 + k)^2 - n] + 2(x_0 + k) + 1, \]
so we only have to calculate one square. For example, if $n = 3977$, then $x_0 = 64$, and we
need to calculate $x_0^2 - n = 64^2 - 3977 = 119$. To calculate $65^2 - 3977$ we don’t even have to
square 65; we just add $2 \times 64 + 1$ to 119 to get 248. Moreover, these numbers $2(x_0 + k) + 1$
grow by 2 each time, so we don’t even need to recalculate them; we just add 2 to the previous
value. Here is a table for these calculations.
k    x_0 + k    2(x_0 + k) + 1    (x_0 + k)^2 - n
0    64         129               119
1    65         131               248
2    66         133               379
3    67         135               512
4    68         137               647
5    69         139               784 = 28^2
What this tells us is that $69^2 - 3977 = 28^2$, which we rearrange as $3977 = 69^2 - 28^2 =
(69 - 28)(69 + 28) = 41 \times 97$. Each iteration in the table goes very fast on a computer; the
most difficult step is determining whether $(x_0 + k)^2 - n$ is a square.
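The table-driven procedure above can be sketched in Python as follows. The function name `fermat_factor` and the cutoff `max_steps` are assumptions made for this illustration, and the square test uses `math.isqrt`, which is one convenient choice rather than anything prescribed in these notes. The sketch assumes $n$ is odd.

```python
from math import isqrt


def fermat_factor(n, max_steps=10**6):
    """Try to write odd n as x^2 - y^2; return (x - y, x + y), or None."""
    x = isqrt(n)
    if x * x < n:
        x += 1                 # x0 = ceil(sqrt(n))
    r = x * x - n              # the only square we ever compute
    step = 2 * x + 1           # what must be added to r to move from x to x + 1
    for _ in range(max_steps):
        y = isqrt(r)
        if y * y == r:         # r = x^2 - n is a perfect square
            return x - y, x + y
        r += step              # (x + 1)^2 - n = (x^2 - n) + 2x + 1
        step += 2              # the increments themselves grow by 2
        x += 1
    return None


print(fermat_factor(3977))     # (41, 97), matching the table
```

Only additions appear inside the loop apart from the integer square root, which mirrors the remark that the square test is the most difficult step.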
Fermat’s factoring method works reasonably well for small numbers $n$ and for numbers
$n = pq$ where $p$ and $q$ are nearly equal. An example I’ve come across is in trying to factor
$n = 10^{22} + 1$. If you use trial division for a while, you find factors 89 and 101, leaving a
19-digit number, 1,112,470,797,641,561,909. If you try Fermat’s method on this number,
you fairly quickly find
\[ 1112470797641561909 = 1056689261 \times 1052788969. \]
How good is Fermat’s method? For small numbers, it is a reasonable thing to try. But in
fact, it is worse than trial division in general! The worst case of Fermat’s method is where
$n$ is prime. In this case, $n$ factors only as $n \cdot 1$, so we need $x + y = n$, $x - y = 1$. This means
$x = \frac{n+1}{2}$ and $y = \frac{n-1}{2}$. Now the $x$ here is $x_0 + k$, where $x_0$ is roughly $\sqrt{n}$. That is, we need
$\sqrt{n} + k = \frac{n+1}{2}$, so it takes $k \approx \frac{n+1}{2} - \sqrt{n}$ steps before concluding that $n$ is prime. To see what
this means, suppose we have an $n$ around $10^{10}$. This is a very small number, as factoring
goes. If $n$ is prime, it will take about $\sqrt{n}$ steps, or $10^5$ steps, to show this by trial division.
With Fermat’s method, it will take about $\frac{1}{2} 10^{10} - 10^5$ steps. Thus, trial division takes about 100,000
steps, while Fermat’s method takes 4,999,900,000 steps.
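To see the worst case on a number small enough to run instantly, here is a short Python sketch that counts how many values of $x$ Fermat’s method tries. The function name `fermat_step_count` and the choice of the prime 10007 are assumptions made for this illustration.

```python
from math import isqrt


def fermat_step_count(n):
    """Count the values x = x0, x0 + 1, ... tried until x^2 - n is a square."""
    x = isqrt(n)
    if x * x < n:
        x += 1                          # x0 = ceil(sqrt(n))
    steps = 1
    while isqrt(x * x - n) ** 2 != x * x - n:
        x += 1
        steps += 1
    return steps


# 10007 is prime, so the search only stops at x = (n + 1) / 2 = 5004.
print(fermat_step_count(10007))         # 4904 values of x, from 101 to 5004
print(isqrt(10007))                     # 100: trial division stops here
```

For this prime, the estimate $\frac{n+1}{2} - \sqrt{n}$ gives about 4904, against roughly 100 trial divisions.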
On average, one expects a composite number $n$ to have a prime divisor of size $n^{0.63}$
and a coprime part of size about $n^{0.37}$. If the coprime part is actually prime, then trial division
will find the factorization of $n$ in about $n^{0.37}$ steps. Fermat’s method will take something like
$\frac{1}{2} n^{0.63}$ steps, so again trial division wins. Thus, in general, one should never run Fermat’s
method to completion. You can try several million steps, maybe, hoping to get lucky, but
then switch to something else.
Before moving on to the next method, I should mention that many approaches can be
improved, or are more advantageous in some situations than in others. We already know, for
example, that if $n = 2^p - 1$ with $p$ prime, then the only possible divisors of $n$ are primes $q \equiv 1 \pmod{p}$, so we
can skip most numbers when using trial division on such numbers. With Fermat’s method,
there is another way to speed things up. Paradoxically, it is to try to factor a number larger
than $n$ rather than factoring $n$. Pick some appropriate number $m$, and try to factor $mn$
rather than $n$. The idea is that $mn$ might factor into two nearly equal parts. Here is a simple
example. If we wish to factor 1207 with Fermat’s method, then $x_0 = 35$ and after 10 steps
we get $x_0 + 9 = 44$, with $1207 = 44^2 - 27^2 = (44 + 27)(44 - 27) = 71 \times 17$. If, on the other
hand, we first multiply $n$ by 3 and use Fermat’s method on 3621, then $x_0 = 61$ and already
we have $61^2 - 3621 = 100 = 10^2$. Here, we have $3621 = 61^2 - 10^2 = 71 \times 51$, and looking
for the factor divisible by 3, we recover $1207 = 71 \times 17$. In general, one multiplies $n$ by some
number with lots of factors, like $315 = 3^2 \times 5 \times 7$, in the hope that some factors multiplying $p$
and others multiplying $q$ produce nearly equal numbers. For example, suppose
we wish to use Fermat’s method to factor 7421. This would require 25 steps with Fermat’s
method: $x_0 = 87$, $87^2 - 7421 = 148$, $88^2 - 7421 = 323$, \ldots, $(87 + 24)^2 - 7421 = 4900 = 70^2$.
If, instead, we multiply $n$ by 315 and try to factor 2337615, then four steps are required:
$x_0 = 1529$, $1529^2 - 2337615 = 226$, $1530 \to 3285$, $1531 \to 6346$, $1532 \to 9409 = 97^2$. The
reason: $7421 = 41 \times 181$, and these primes are far apart. However, multiplying by 315 gave
the factorization $315 \times 7421 = 1532^2 - 97^2 = 1629 \times 1435 = (9 \times 181)(35 \times 41)$. Multiplying by
a number $m$ CAN make Fermat’s method worse. I believe there is an algorithm for picking
a sequence of numbers $m$ to multiply by $n$. One tries Fermat’s method on each $mn$ for some
prescribed period of time, and in the end, you can factor $n$ in something under $\sqrt[3]{n}$ steps
rather than the $\sqrt{n}$ steps required by trial division. I do not know the details.
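The multiplier trick can be tried out with a short Python sketch. The names `fermat_steps` and `fermat_with_multiplier` are invented for this illustration, and taking $\gcd(x - y, n)$ to recover a factor of $n$ is one straightforward way to do what the text calls "looking for the factor divisible by 3"; it is not the multiplier-selection algorithm alluded to above.

```python
from math import gcd, isqrt


def fermat_steps(n, max_steps=10**6):
    """Fermat's method on n; return (x, y, steps) with x^2 - y^2 = n, or None."""
    x = isqrt(n)
    if x * x < n:
        x += 1                          # x0 = ceil(sqrt(n))
    for steps in range(1, max_steps + 1):
        r = x * x - n
        y = isqrt(r)
        if y * y == r:
            return x, y, steps
        x += 1
    return None


def fermat_with_multiplier(n, m):
    """Run Fermat's method on m*n, then pull a factor of n out with a gcd.

    Returns (d, n // d, steps). The divisor d may be trivial (1 or n) for an
    unlucky multiplier; this sketch does not retry in that case.
    """
    result = fermat_steps(m * n)
    if result is None:
        return None
    x, y, steps = result
    d = gcd(x - y, n)
    return d, n // d, steps


print(fermat_with_multiplier(7421, 1))    # (41, 181, 25)
print(fermat_with_multiplier(7421, 315))  # (41, 181, 4)
print(fermat_with_multiplier(1207, 3))    # (17, 71, 1)
```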
The next two methods were both devised by a mathematician by the name of John Pollard. They are both considerably better than trial division. However, before using them,
one should check that $2^n \not\equiv 2 \pmod{n}$, so one knows $n$ is composite.
Pollard’s rho method (1975)
This method uses an “iterated functions approach.” Let $f(x) = x^2 + 1$ (lots of other functions could be used instead of this one), and consider the sequence $f(1), f(f(1)), f(f(f(1))), \ldots \pmod{p}$.
This sequence will be eventually periodic. This means that after a while, a periodic pattern will present itself. For example, if $p = 23$, the sequence (starting from the seed 1) is 1, 2, 5, 3, 10, 9, 13,
9, 13, 9, \ldots. We call 1, 2, 5, 3, 10 the tail of this eventually periodic pattern. If we let
$f^m(x)$ represent the $m$-fold composition $f(f(\cdots f(x) \cdots))$, then for any prime $p$ there are
integers $k \ne m$ for which $f^k(1) \equiv f^m(1) \pmod{p}$. This is because there are only $p$ possible
remainders when a number is divided by $p$, but there are infinitely many $m$. Once we have
an $m$ and a $k$, then $f^{k+1}(1) \equiv f^{m+1}(1)$, $f^{k+2}(1) \equiv f^{m+2}(1)$, and so on. This means that if
$p$ is some unknown divisor of $n$, and if we could find the right $m$ and $k$, then we might be
able to find $p$ because $p$ would be a divisor of
\[ \gcd(f^m(1) - f^k(1), n). \]
How do we find $m$ and $k$ when we don’t even know $p$? We use a method called Floyd’s Cycle-Finding
Algorithm. The algorithm works like this: Suppose we have a sequence $a_0, a_1, a_2, \ldots$
which is eventually periodic. Then $a_m = a_{2m}$ for some integer $m \ge 1$. We can use this to form
a factoring algorithm: To factor $n$, for $k = 1, 2, 3, \ldots$, calculate $\gcd(f^{2k}(1) - f^k(1), n)$. In
fact, what we do is calculate the sequence $f^k(1) \pmod{n}$, to keep the numbers from getting
too large, and for even values of $k$, we calculate $\gcd(f^k(1) - f^{k/2}(1), n)$. As an example, let
$n = 1357$. In the table, $f^k$ abbreviates $f^k(1) \bmod 1357$. We have

k     f^k     f^{k/2}    difference    gcd
1     2
2     5       2          3             1
3     26
4     677     5          672           1
5     1021
6     266     26         240           1
7     193
8     611     677        -66           1
9     147
10    1255    1021       234           1
11    906
12    1209    266        943           23
and so, 23 is a divisor of 1357. The reason this works should be made clear if we just do
things modulo 23:
k     f^k    f^{k/2}    difference
1     2
2     5      2          3
3     3
4     10     5          5
5     9
6     13     3          10
7     9
8     13     10         3
9     9
10    13     9          4
11    9
12    13     13         0
That is, $f^{12}(1) - f^6(1)$ is divisible by 23, so it is at the stage $k = 12$ that the prime 23 is
discovered by Pollard’s rho algorithm.
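The rho computation tabulated above can be sketched in Python as follows. The function name `pollard_rho` and the cutoff `max_steps` are assumptions for this illustration. Two running values are kept, one advanced by a single application of $f$ and the other by two, so that step $j$ compares $f^{2j}(1)$ with $f^j(1)$.

```python
from math import gcd


def pollard_rho(n, max_steps=10**6):
    """Rho method with f(x) = x^2 + 1 and seed 1, using Floyd's cycle finding.

    Returns (d, k), where d = gcd(f^k(1) - f^(k/2)(1), n) is the first gcd
    larger than 1 and k is the (even) index at which it appeared, or None.
    Note that d may equal n itself, in which case nothing has been learned.
    """
    def f(x):
        return (x * x + 1) % n          # reduce mod n to keep numbers small

    slow = fast = 1
    for j in range(1, max_steps + 1):
        slow = f(slow)                  # f^j(1)
        fast = f(f(fast))               # f^(2j)(1)
        d = gcd(fast - slow, n)
        if d > 1:
            return d, 2 * j
    return None


print(pollard_rho(1357))                # (23, 12), as in the table
```

Keeping both values in step avoids storing the whole sequence, which is the point of Floyd’s algorithm.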
How fast is the rho method? Certainly it has to find a prime $p$ in at most $p$ steps. This
does not sound very good: trial division will find $p$ in exactly $p$ steps. However, there is
reason to believe the rho method finds $p$ in far fewer than $p$ steps. Suppose, instead of the
numbers $f^m(1)$, we just produced random numbers. How long would it take before two
of our random numbers agreed modulo $p$? This is a variation of the birthday problem in
probability: If you pick $k$ things (with replacement) from $n$ types of things, what is the
probability of getting two of the same thing? The probability that they are all different is
\[ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} = 1 \left(1 - \frac{1}{n}\right) \left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{k-1}{n}\right). \]
Let’s ask a different question: When is the probability of finding a match $\frac{1}{2}$? To approximate
the probability, take the logarithm. We want
\[ -\ln 2 = \sum_{j=1}^{k-1} \ln\left(1 - \frac{j}{n}\right). \]
Using the approximation $\ln(1 - x) \approx -x$, we want
\[ \ln 2 \approx \frac{1}{n} + \frac{2}{n} + \cdots + \frac{k-1}{n} = \frac{k(k-1)}{2n} \approx \frac{k^2}{2n}. \]
This means we want $k \approx \sqrt{2n \ln(2)} \approx 1.177 \sqrt{n}$. For example, with the birthday problem
(how many people do you need in a room to have a 50-50 chance that two have a birthday
in common?), this says you would need about $1.177 \sqrt{365} \approx 22.5$ people.
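The approximation can be checked numerically against the exact product. The following Python sketch is only a sanity check; the function names `prob_all_different` and `half_point_estimate` are invented for this illustration.

```python
from math import log, sqrt


def prob_all_different(k, n):
    """Exact probability that k draws, with replacement, from n types are distinct."""
    p = 1.0
    for j in range(1, k):
        p *= 1 - j / n                  # the factor (1 - j/n) from the product
    return p


def half_point_estimate(n):
    """The estimate k ~ sqrt(2 n ln 2) for a 50-50 chance of a match."""
    return sqrt(2 * n * log(2))


print(half_point_estimate(365))         # roughly 22.5
print(prob_all_different(22, 365))      # still above 1/2
print(prob_all_different(23, 365))      # just below 1/2
```

With 365 types the exact probability of no match crosses $\frac{1}{2}$ between $k = 22$ and $k = 23$, in line with the estimate of about 22.5.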
What this means for the rho method: If the numbers $f^m(1)$ “act” random enough, then
we expect to find a prime $p$ not in $p$ steps, but in more like $1.177 \sqrt{p}$ steps. Numerical evidence
supports this, so for simplicity, we say the rho method probably finds a factor $p$ in $\sqrt{p} \le n^{1/4}$
steps (taking $p$ to be the smallest prime factor of $n$, so that $p \le \sqrt{n}$). More is known. If we used
a simpler function for $f(x)$, say $f(x) = ax + b$, a linear
function rather than a quadratic, then the iterates do not seem random enough, and we get
something more like $p$ steps again. But using most quadratic or higher degree polynomials,
the iterates do appear to act like random numbers.
Pollard’s p ´ 1 method (1974)
Recall Fermat’s Little Theorem yet again: For any prime $p$ and any number $a$ with $p \nmid a$,
we have $a^{p-1} \equiv 1 \pmod{p}$. In particular, if $p > 2$, then $2^{p-1} \equiv 1 \pmod{p}$. If $m$ is a multiple of
$p - 1$, say $m = k(p - 1)$, then $2^m = (2^{p-1})^k \equiv 1^k \equiv 1 \pmod{p}$. This means that $p \mid 2^m - 1$
for any $m$ where $(p - 1) \mid m$. For example, if $p = 7$, then $p - 1 = 6$, so $7 \mid 2^m - 1$ for any $m$
divisible by 6. For instance, $2^{12} - 1 = 4095 = 7 \times 585$.
We can turn this into a factoring algorithm as follows: take a sequence of $m$’s with lots
of small factors (we will use the sequence $m_k = k!$, but other sequences would work as well).
For each term in the sequence, we calculate $\gcd(n, 2^{m_k} - 1)$, and stop when the gcd returns
a number larger than 1. This method will find a prime divisor $p$ of $n$ if $(p - 1) \mid m_k$. This
method works very well if $p - 1$ has only small prime divisors.
The Maple command “ifactor(n, easy)” does the following: It uses trial division up to
some limit, and then uses some fixed number of iterations of the $p - 1$ method. For example,
ifactor(10^37 - 1, easy) returns
\[ (3)^2 \; c28 \; (247629013). \]
What this means is that it found 9 and 247,629,013 as factors of $10^{37} - 1$, leaving a 28-digit
number that it knew to be composite (the meaning of the c). The factor 247629013 was
found by the $p - 1$ method. It was successful because
\[ p - 1 = 2^2 \times 3 \times 37 \times 41 \times 61 \times 223 \]
has only small prime divisors. In particular, it did NOT find the smaller prime divisor $q = 2028119$,
because $q - 1 = 2 \times 37 \times 27407$, and it did not do enough iterations to reach an $m$ with $27407 \mid m$.
As a simple example of the $p - 1$ method, let’s factor $n = 3811$. As with the rho method,
we form a table:

k    2^{k!} (mod 3811)    gcd(2^{k!} - 1, 3811)
2    4                    1
3    64                   1
4    1194                 1
5    2172                 1
6    3257                 37

and $3811 = 37 \times 103$. We found 37 at step $k = 6$ because $37 - 1 = 36$ is a divisor of $6!$. Some
notes on this table: We did not calculate $2^{k!}$, but $2^{k!} \pmod{n}$. Also, one can calculate $2^{(k+1)!}$
by using the formula $2^{(k+1)!} = (2^{k!})^{k+1}$, using the binary squaring algorithm. That is, once
we know $2^{5!} \equiv 2172 \pmod{3811}$, we calculate $2^{6!} \pmod{3811}$ by calculating instead $2172^6
\pmod{3811}$.
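The table above can be reproduced with a short Python sketch. The function name `pollard_p_minus_1` and the cutoff `max_k` are assumptions made for this illustration; the built-in three-argument `pow` does the modular exponentiation by repeated squaring.

```python
from math import gcd


def pollard_p_minus_1(n, max_k=10**5):
    """p - 1 method with base 2 and the sequence m_k = k!.

    Returns (d, k) for the first k with d = gcd(2^(k!) - 1, n) > 1, or None.
    Note that d may equal n itself, in which case nothing has been learned.
    """
    a = 2                               # 2^(1!) mod n
    for k in range(2, max_k + 1):
        a = pow(a, k, n)                # 2^(k!) = (2^((k-1)!))^k, reduced mod n
        d = gcd(a - 1, n)
        if d > 1:
            return d, k
    return None


print(pollard_p_minus_1(3811))          # (37, 6), as in the table
```

Each pass raises the previous value to the $k$-th power, so $2^{k!}$ itself is never formed.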
In real life, back in the late 70’s, the $p - 1$ method was used to show that $10^{53} - 1$ is
divisible by $p = 1325815267337711173$. In fact, this prime was found fairly quickly because
\[ p - 1 = 2^2 \times 3^2 \times 11 \times 53 \times 1279 \times 1553 \times 3557 \times 8941, \]
which has all of its prime divisors less than 10,000.