Aalto University
School of Science
Continued fraction factorization
Number theory project
December 14, 2016
Heikki Muhli
Sakari Saarenpää
1 Background
Continued fractions provide powerful tools for solving problems in number theory. For
example, continued fractions can be used to write primes as a sum of two squares. This
method is fast, and even hundred-digit primes can be written this way. Continued
fractions also provide an efficient way to recognize a rational number even when only the
first few digits of its decimal expansion are given.
A continued fraction is an expression of the form [1]
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} \tag{1}$$
For simplicity it is usually written as
$$[a_0, a_1, a_2, \ldots, a_n]. \tag{2}$$
A continued fraction is simple if all the $a_i$ are positive integers. A finite continued fraction
has the following expression
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_n}}}, \tag{3}$$
where each $a_m$ is a real number and $a_m > 0$ for all $m \ge 1$.
Let us suppose that $c_n = [a_0, a_1, a_2, \ldots, a_n]$ is defined for all $n$. We call $c_n$ the $n$th
convergent of the continued fraction. If the limit $\lim_{n\to\infty} c_n$ exists, then we say that the
infinite continued fraction
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} \tag{4}$$
converges. For infinite continued fractions it can be proven [1] that if $a_0, a_1, \ldots$ is any
infinite sequence of positive integers, then the sequence $c_n = [a_0, a_1, \ldots, a_n]$ converges. More
generally, if $(a_n)$ is an arbitrary sequence of positive reals such that $\sum_{n=0}^{\infty} a_n$ diverges, then
$(c_n)$ converges.
Let us now prove that $\lim_{n\to\infty} c_n$ exists for a sequence of positive integers $a_i$. For any
$m \ge n$, the number $c_n$ is a partial convergent of $[a_0, \ldots, a_m]$. The even convergents $c_{2n}$
form a strictly increasing sequence and the odd convergents $c_{2n+1}$ form a strictly decreasing
sequence. The even convergents are all less than or equal to $c_1$ and the odd convergents are
all greater than or equal to $c_0$. Therefore the limits $\beta_0 = \lim_{n\to\infty} c_{2n}$ and
$\beta_1 = \lim_{n\to\infty} c_{2n+1}$ exist and $\beta_0 \le \beta_1$. From Stein's Proposition 5.2.7 [1] we can write
$$|c_{2n} - c_{2n-1}| \le \frac{1}{2n(2n-1)} \to 0, \tag{5}$$
so $\beta_0 = \beta_1$ and therefore $\lim_{n\to\infty} c_n$ exists.
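The alternating behaviour used in the proof can be observed numerically. The following is a minimal sketch, assuming the first partial quotients $[3, 7, 15, 1, 292]$ of π and the usual recurrences for the numerators and denominators of the convergents; the variable names are illustrative only.

```matlab
% Convergents c_j = A_j / B_j of [3, 7, 15, 1, 292] (first partial quotients of pi).
a = [3 7 15 1 292];
Aprev = 1;  Bprev = 0;        % A_{-1} and B_{-1}
A = a(1);   B = 1;            % A_0 and B_0
c = A / B;                    % c(j+1) stores the convergent c_j
for j = 2:numel(a)
    Anew = a(j) * A + Aprev;  % numerator recurrence
    Bnew = a(j) * B + Bprev;  % denominator recurrence (same form)
    Aprev = A;  Bprev = B;
    A = Anew;   B = Bnew;
    c(end+1) = A / B;
end
evenc = c(1:2:end);           % c_0, c_2, c_4: increasing
oddc  = c(2:2:end);           % c_1, c_3: decreasing
fprintf('%.10f\n', c);
```

The convergents are 3, 22/7, 333/106, 355/113 and 103993/33102: the even-indexed ones increase, the odd-indexed ones decrease, and every even convergent lies below every odd one.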
As an example, we have the continued fractions $e = [2,1,2,1,1,4,1,1,6,1,1,8,\ldots]$
and $\pi = [3,7,15,1,292,1,1,1,2,1,3,1,\ldots]$. The following procedure can be used for calculating
continued fractions. Consider a real number $r$. Let $i$ be the integer part of $r$ and
$s = r - i$ the fractional part of $r$. The continued fraction representation of $r$ is then
$[i, a_1, a_2, \ldots]$, where $[a_1, a_2, \ldots]$ is the continued fraction representation of $1/s$. To calculate
the representation of $r$, we first write down the integer part of $r$ and subtract it from $r$.
If the difference is 0, the algorithm stops. Otherwise we take the reciprocal of the difference
and repeat. This process can be implemented on a computer; if the number is rational, it can be
carried out with the Euclidean algorithm.
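For a rational number the procedure above reduces to repeated division with remainder. The sketch below assumes the illustrative input 415/93, which is not taken from the report; for irrational inputs the same loop would need exact or high-precision arithmetic.

```matlab
% Continued fraction of the rational number p/q via the Euclidean algorithm.
% p and q are illustrative values, not taken from the report.
p = 415;
q = 93;
a = [];                       % partial quotients a_0, a_1, ...
while q ~= 0
    a(end+1) = floor(p / q);  % integer part of the current quotient
    r = mod(p, q);            % fractional part is r/q
    p = q;                    % taking the reciprocal swaps numerator
    q = r;                    % and denominator
end
disp(a)                       % displays 4 2 6 7
```

The loop terminates because the remainders strictly decrease, so a rational number always has a finite continued fraction.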
The continued fraction factorization method (CFRAC) is a general-purpose integer
factorization algorithm. It was first presented by D. H. Lehmer and R. E. Powers [3] in
1931 and later developed as a computer algorithm by Michael A. Morrison and John
Brillhart in 1975. It is based on a similar idea as Dixon's factorization method [2].
2 The method using the P's
The first method to factorize a number using continued fractions is the method of P's [3].
If we expand the square root of $N$ in a continued fraction, the $n$th complete quotient has
the general form
$$x_n = \frac{P_n + \sqrt{N}}{Q_n}, \qquad x_0 = \sqrt{N}, \qquad \lfloor x_n \rfloor = a_n. \tag{6}$$
By using the relation $P_n^2 + Q_n Q_{n-1} = N$ we have
$$-Q_n Q_{n-1} \equiv P_n^2 \pmod{N}. \tag{7}$$
If we write $(-1)^n Q_n = Q_n^*$, we then obtain
$$Q_n^* (P_{n-1} P_{n-3} P_{n-5} \cdots P_r)^2 \equiv (P_n P_{n-2} P_{n-4} \cdots P_s)^2 \pmod{N}, \tag{8}$$
where $r = 1, s = 2$ or $r = 2, s = 1$, depending on whether $n$ is even or odd, respectively. To
prove this, we assume that the above relation is true for $n - 1$, that is
$$Q_{n-1}^* (P_{n-2} P_{n-4} P_{n-6} \cdots P_s)^2 \equiv (P_{n-1} P_{n-3} P_{n-5} \cdots P_r)^2 \pmod{N}, \tag{9}$$
and then show that it is true for $n$. Multiplying (7) by
$$(P_{n-1} P_{n-3} \cdots P_r)^2 \cdot (P_{n-2} P_{n-4} \cdots P_s)^2 \tag{10}$$
and dividing by (9), we get (8). Since (8) can be verified directly for $n = 1, 2$, it is true
in general by induction.
Two $Q^*$'s are said to be equivalent if their product is a square. That is, $Q_i^*$ is equivalent to
$Q_j^*$ if $x^2 Q_i^* = y^2 Q_j^*$ for some integers $x$ and $y$. Substituting $n = i$ and $n = j$ in (8),
and noting that $i$ and $j$ are of the same parity, we obtain from this equation
$$(x P_{i+1} P_{i+3} \cdots P_{j-1})^2 - (y P_{i+2} P_{i+4} \cdots P_j)^2 \equiv 0 \pmod{N}. \tag{11}$$
Unless $N$ divides one of the numbers $(x P_{i+1} P_{i+3} \cdots P_{j-1}) \pm (y P_{i+2} P_{i+4} \cdots P_j)$, we obtain a
factor of $N$ by finding the greatest common divisor of $N$ and one of these numbers. If the two
equivalent $Q^*$'s are near each other in the series of denominators, the factors of $N$ will be
disclosed with a minimum of effort.
This method can also be extended, for example to a case in which the product of
more than two $Q^*$'s is a square. This involves a straightforward application of (8), and
the ease with which the method may be applied depends again on the relative position
of the $Q^*$'s and the parities of their subscripts. It is unnecessary to compute the actual
products of the P's involved, since these products can always be reduced modulo $N$.
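The method of the P's can be traced on a small number. The sketch below assumes the illustrative composite $N = 9073$ (not from the report) and the standard integer recurrences $P_{n+1} = a_n Q_n - P_n$, $Q_{n+1} = (N - P_{n+1}^2)/Q_n$ with $P_0 = 0$, $Q_0 = 1$, which are not written out in this section; the multipliers $x = 3$, $y = 4$ were found by inspection.

```matlab
N  = 9073;                          % illustrative composite (43 * 211)
a0 = floor(sqrt(N));                % integer part of sqrt(N)
nmax = 5;
P = zeros(1, nmax + 1);             % P(n+1) stores P_n
Q = zeros(1, nmax + 1);             % Q(n+1) stores Q_n
Q(1) = 1;                           % x_0 = sqrt(N): P_0 = 0, Q_0 = 1
for n = 1:nmax
    an = floor((a0 + P(n)) / Q(n)); % a_{n-1} = floor(x_{n-1})
    P(n+1) = an * Q(n) - P(n);      % assumed standard recurrence
    Q(n+1) = (N - P(n+1)^2) / Q(n);
end
% Relation behind (7): P_n^2 + Q_n Q_{n-1} = N
assert(all(P(2:end).^2 + Q(2:end) .* Q(1:end-1) == N));
Qstar = (-1).^(0:nmax) .* Q;        % Q*_n = (-1)^n Q_n
% Q*_1 = -48 and Q*_5 = -27 are equivalent: 3^2 * Q*_1 = 4^2 * Q*_5
x = 3;  y = 4;  i = 1;  j = 5;
assert(x^2 * Qstar(i+1) == y^2 * Qstar(j+1));
u = mod(x * prod(P((i+1:2:j-1) + 1)), N);   % x P_{i+1} P_{i+3} ... P_{j-1}
v = mod(y * prod(P((i+2:2:j) + 1)), N);     % y P_{i+2} P_{i+4} ... P_j
f1 = gcd(abs(u - v), N);            % 43
f2 = gcd(u + v, N);                 % 211
```

Here $u = 4451$ and $v = 2301$, so $u - v = 2150$ and $u + v = 6752$ share the factors 43 and 211 with $N$, respectively.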
3 The method using the A's
In addition to the method using the P's introduced in the previous section, there is an
alternative method for continued fraction factorization. This method focuses on the $n$th
convergent of the continued fraction of a real number $x$. The $n$th convergent is given by [1]
$$c_n = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_n}}} = [a_0, a_1, a_2, \ldots, a_n] = \frac{A_n}{B_n}, \tag{12}$$
where $A_n$ and $B_n$ are integers that are guaranteed to exist because clearly $c_n \in \mathbb{Q}$. The
$n$th convergent has the property
$$\lim_{n\to\infty} c_n = x, \tag{13}$$
as introduced in the first section.
The idea of this alternative method is to use the integers $A_n$ in the numerator of
the $n$th convergent to factorize a large number $N$ into its prime factors. The continued
fraction is calculated for the square root of $N$, just like in the method using the P's
introduced in the previous section. We already saw that it is possible to write the $n$th
complete quotient of $\sqrt{N}$ as
$$x_n = \frac{P_n + \sqrt{N}}{Q_n}, \tag{14}$$
where $P_n$ and $Q_n \ge 1$ are integers and $Q_j \mid (N - P_j^2)$ [3; 4].
We will also need another equality. It can be shown that
$$A_{j-1}^2 - N B_{j-1}^2 = (-1)^j Q_j =: Q_j^*, \tag{15}$$
where $A_{j-1}$, $B_{j-1}$ and $Q_j$ are the integers defined in equations (12) and (14) [3; 4]. This
equation can be written as
$$Q_j^* \equiv A_{j-1}^2 \pmod{N}. \tag{16}$$
If for some $k$ and $\ell$ we have $x^2 Q_k^* = y^2 Q_\ell^*$ with integers $x$ and $y$, we can write
$$(x A_{k-1})^2 - (y A_{\ell-1})^2 \equiv 0 \pmod{N}. \tag{17}$$
From this equation it is possible to find a factor of $N$ by the greatest common
divisor process unless $N$ divides $x A_{k-1} + y A_{\ell-1}$ or $x A_{k-1} - y A_{\ell-1}$ (because
$(x A_{k-1})^2 - (y A_{\ell-1})^2 = (x A_{k-1} + y A_{\ell-1})(x A_{k-1} - y A_{\ell-1})$) [3]. More than two $Q_j^*$'s can be
used: take $x^2 Q_k^* Q_m^* = y^2 Q_\ell^*$ and we get the equation
$$(x A_{k-1} A_{m-1})^2 - (y A_{\ell-1})^2 \equiv 0 \pmod{N} \tag{18}$$
instead. The $A_j$'s follow the recursion relation for the numerator of a continued fraction [1; 3]
$$A_j = a_j A_{j-1} + A_{j-2}, \tag{19}$$
where $A_{-1} = 1$ and $A_{-2} = 0$ and $a_j$ is the last partial quotient of the $j$th convergent of the
continued fraction (12). The $A_j$'s can be reduced modulo $N$ if necessary [3].
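The steps of the method of the A's can be sketched on a small number. The example assumes the illustrative composite $N = 9073$ (not from the report), whose square root has the partial quotients $[95, 3, 1, 26, 2, \ldots]$, and recovers the signed $Q_j^*$ from (16) as least absolute residues, which is valid here because $Q_j < 2\sqrt{N} < N/2$. The multipliers $x = 3$, $y = 4$ were found by inspection.

```matlab
N = 9073;                       % illustrative composite (43 * 211)
a = [95 3 1 26 2];              % partial quotients a_0..a_4 of sqrt(N)
A = zeros(1, numel(a));         % A(j) stores A_{j-1} mod N
Qstar = zeros(1, numel(a));     % Qstar(j) stores Q*_j
Aprev2 = 0;  Aprev1 = 1;        % A_{-2} and A_{-1} from (19)
for j = 1:numel(a)
    A(j) = mod(a(j) * Aprev1 + Aprev2, N);   % recursion (19), reduced mod N
    Aprev2 = Aprev1;  Aprev1 = A(j);
    r = mod(A(j)^2, N);         % A_{j-1}^2 mod N, in [0, N)
    if r > N / 2                % least absolute residue restores the sign
        r = r - N;
    end
    Qstar(j) = r;               % equation (16)
end
% Q*_1 = -48 and Q*_5 = -27 satisfy 3^2 * Q*_1 = 4^2 * Q*_5
x = 3;  y = 4;  k = 1;  l = 5;
assert(x^2 * Qstar(k) == y^2 * Qstar(l));
u = mod(x * A(k), N);           % x A_{k-1}
v = mod(y * A(l), N);           % y A_{l-1}
f1 = gcd(abs(u - v), N);        % 43
f2 = gcd(u + v, N);             % 211
```

Reducing each $A_j$ modulo $N$ keeps all intermediate values well inside the range where double-precision integers are exact.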
4 Comparison of methods
In general, the method of the A's is more useful, since the calculations are simpler. If
two equivalent $Q^*$'s appear near each other, the method of the P's is more successful. Let us
now show that ease of application is indeed the deciding factor in choosing between the two
methods. We use the following lemma proven in [3]: if $n$ is any integer, then
$$P_n + (-1)^n A_{n-1} A_{n-2} \equiv 0 \pmod{N}. \tag{20}$$
What is interesting is that the success of one method in a particular instance implies the
success of the other. For simplicity, let us prove this for the case of only two equivalent
$Q^*$'s; the result can easily be generalized. Let $Q_i^*$ and $Q_j^*$ be equivalent so that
$$x^2 Q_i^* = y^2 Q_j^*. \tag{21}$$
Let us suppose that the A method succeeds and that the P method fails. Failure of the P method
means that $N$ divides one of the numbers
$$(x P_{i+1} P_{i+3} P_{i+5} \cdots P_{j-1}) \pm y (P_{i+2} P_{i+4} P_{i+6} \cdots P_j). \tag{22}$$
Substituting for each $P$ its value in terms of the A's, as given by the lemma above, and
simplifying, we get
$$x A_{i-1} \pm y A_{j-1} \equiv 0 \pmod{N}. \tag{23}$$
This implies a failure of the A method, which contradicts the hypothesis.
Therefore the P method must also be successful. By reversing the argument, it can also
be shown that the success of the P method implies the success of the A method. The only
instance [3] of the success of one method and the failure of the other is the case in which
the A method succeeds, the P method fails, and a factor of $N$ appears among the P's
and Q's.
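The lemma (20) that links the two methods can be checked numerically. The sketch below assumes the illustrative value $N = 9073$ (not from the report) and generates the $P_n$ with the standard integer recurrence $P_{n+1} = a_n Q_n - P_n$, $Q_{n+1} = (N - P_{n+1}^2)/Q_n$, which is an assumption not spelled out in the text.

```matlab
N  = 9073;                          % illustrative composite (43 * 211)
a0 = floor(sqrt(N));
nmax = 5;
P = 0;  Q = 1;                      % P_0 and Q_0
Am2 = 0;  Am1 = 1;                  % A_{-2} and A_{-1}
lhs = zeros(1, nmax);               % left-hand side of (20), reduced mod N
for n = 1:nmax
    a = floor((a0 + P) / Q);        % partial quotient a_{n-1}
    Anew = mod(a * Am1 + Am2, N);   % A_{n-1} from recursion (19)
    Am2 = Am1;  Am1 = Anew;         % now Am1 = A_{n-1}, Am2 = A_{n-2}
    P = a * Q - P;                  % P_n
    Q = (N - P^2) / Q;              % Q_n
    lhs(n) = mod(P + (-1)^n * Am1 * Am2, N);
end
disp(lhs)                           % all entries are 0
```

For example, for $n = 2$ the check reads $49 + 286 \cdot 95 = 27219 = 3 \cdot 9073$.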
5 Computer implementation of a continued fraction factorization method with MATLAB
We have implemented the previously introduced continued fraction factorization method
using the A's in MATLAB. The implemented function takes as its arguments
the number to be factorized, $N$, and an integer $n$ that tells the function to calculate the
convergents of the continued fraction up to index $n - 1$, so that we get the integers $A_j$ with
$j \in \{0, \ldots, n-1\}$ as seen in equation (12).
With the integers $A_{j-1}$ we can solve the integers $Q_j$ from equation (16). With the
$Q_j$ solved, we search for a combination of $Q_j$ such that
$$Q_i Q_j = X^2 \tag{24}$$
for some $i, j \in \{1, \ldots, n\}$, $X \in \mathbb{Z}$, or
$$Q_i Q_j Q_k = X^2 \tag{25}$$
with $i, j, k \in \{1, \ldots, n\}$. The reason why we do not search for squares of other forms (for
example $Q_i Q_j Q_k Q_m = X^2$ etc.) is that going through every possible combination of
$Q_j$ takes a long time, and our intention is merely to demonstrate the process for a few
examples where the solution is found with just two or three different $Q_j$'s.
Of course, an integer can only be a square if every prime in its factorization appears
with an even exponent. This is why we first factorize the $Q_j$'s into primes, which is a
computationally much easier task than directly factorizing $N$. We attempt the factorization by
trying to divide $Q_j$ with the primes on the interval $[2, 2\sqrt{N}]$. If some of the factors of $Q_j$ are
outside of this interval, we simply ignore that $Q_j$ to keep the prime factorization relatively
simple. A $Q_j$ with a very large prime factor is less likely to form a square with other
$Q_i$'s anyway, so we do not lose much by ignoring these $Q_j$'s.
Once we have factorized the $Q_j$, we have the exponents $e_{i,j}$ of the prime
factorization
$$Q_j = \ell_1^{e_{1,j}} \ell_2^{e_{2,j}} \cdots \ell_m^{e_{m,j}}, \tag{26}$$
where $\ell_1, \ldots, \ell_m \in [2, 2\sqrt{N}]$ are all the prime numbers on the interval. Then
we search for $i$ and $j$ such that [4]
$$(e_{1,i}, e_{2,i}, \ldots, e_{m,i}) + (e_{1,j}, e_{2,j}, \ldots, e_{m,j}) \equiv (0, 0, \ldots, 0) \pmod{2} \tag{27}$$
or
$$(e_{1,i}, e_{2,i}, \ldots, e_{m,i}) + (e_{1,j}, e_{2,j}, \ldots, e_{m,j}) + (e_{1,k}, e_{2,k}, \ldots, e_{m,k}) \equiv (0, 0, \ldots, 0) \pmod{2} \tag{28}$$
and if we find such a combination, it means that we have found $Q_j$'s such that either
equation (24) or (25) is satisfied for some $X \in \mathbb{Z}$.
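The search for exponent vectors that sum to zero modulo 2 can be sketched as follows. The values $Q_1, \ldots, Q_5$ are those of the illustrative number $N = 9073$ (an assumption, not the report's example), and the additional requirement that $i + j$ is even is also an assumption of this sketch: it accounts for the sign $(-1)^j$ in $Q_j^*$, which equation (27) does not track.

```matlab
N = 9073;                               % illustrative composite (43 * 211)
Q = [48 139 7 87 27];                   % Q_1..Q_5 for sqrt(N)
pr = primes(floor(2 * sqrt(N)));        % primes l_1..l_m on [2, 2*sqrt(N)]
E = zeros(numel(Q), numel(pr));         % E(j, m) is the exponent e_{m,j}
for j = 1:numel(Q)
    q = Q(j);
    for m = 1:numel(pr)
        while mod(q, pr(m)) == 0        % trial division by l_m
            q = q / pr(m);
            E(j, m) = E(j, m) + 1;
        end
    end
end
pairs = zeros(0, 2);                    % index pairs whose product is a square
for i = 1:numel(Q)
    for j = i+1:numel(Q)
        evenExponents = ~any(mod(E(i, :) + E(j, :), 2));   % condition (27)
        sameSign = mod(i + j, 2) == 0;  % extra check: Q*_i Q*_j > 0
        if evenExponents && sameSign
            pairs(end+1, :) = [i j];
        end
    end
end
disp(pairs)                             % displays 1 5
```

Only the pair $Q_1 = 2^4 \cdot 3$ and $Q_5 = 3^3$ qualifies, since their product $1296 = 36^2$ is a square.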
Next we take the $A_{j-1}$ corresponding to $Q_j^*$ in equation (16) and calculate an integer
$Y$ from the equation
$$A_{i-1}^2 A_{j-1}^2 = Y^2 \tag{29}$$
or, in the case of equation (25), from
$$A_{i-1}^2 A_{j-1}^2 A_{k-1}^2 = Y^2. \tag{30}$$
We now have integers $X$ and $Y$ either from equations (24) and (29) or from (25) and (30). Next we
reduce these integers modulo $N$. If the algorithm was successful, we find a factor
of $N$ by computing the greatest common divisor of $X - Y$ (modulo $N$) and $N$. This
can be done by using the greatest common divisor process mentioned earlier
or just by using the built-in gcd function of MATLAB.
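The final step can be illustrated with concrete numbers. The sketch assumes the illustrative number $N = 9073$ (not the report's example) together with the pair $Q_1 = 48$, $Q_5 = 27$ and the corresponding numerators $A_0 = 95$, $A_4 = 2619$ of its continued fraction expansion.

```matlab
N  = 9073;                     % illustrative composite (43 * 211)
Qi = 48;    Qj = 27;           % Q_1 and Q_5, whose product is a square
Ai = 95;    Aj = 2619;         % A_0 and A_4 (reduced modulo N)
X = round(sqrt(Qi * Qj));      % equation (24): X = 36
assert(X^2 == Qi * Qj);        % confirm the product really is a square
Y = mod(Ai * Aj, N);           % equation (29), reduced modulo N: Y = 3834
f1 = gcd(abs(X - Y), N);       % 211
f2 = gcd(X + Y, N);            % 43
```

Both $X - Y$ and $X + Y$ are worth testing, since either one may carry the nontrivial factor.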
The function we have implemented uses mostly just very basic programming
structures such as for loops, while loops and if statements, and basic mathematical functions
such as the square root, so it could quite easily be converted to most other programming languages.
The downside of this is that the code is quite slow. What makes it even slower is that
standard double-precision floating point in MATLAB is only accurate to a relative precision of
approximately $10^{-16}$. This becomes a problem because small decimals become significant
in the calculation of a continued fraction: with this precision the 14th convergents are often
already wrong, and the error accumulates quickly from that point onwards. Because we
want to use convergents up to at least $n = 25$, we use the variable-precision arithmetic that
MATLAB provides through the function vpa. By default, vpa calculates values
to 32 significant digits [6], which is enough for calculating the first 25 convergents; using more
significant digits makes the calculations significantly slower. Because MATLAB is meant
for quick numerical calculations, it is probably not the best platform for demonstrating
this factorization method.
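The loss of precision can be made visible by comparing the double-precision iteration with an exact one. The sketch below is an alternative that is not used in the report's implementation: it assumes the standard integer recurrence for the complete quotients of $\sqrt{N}$ (with $P_1 = a_0$, $Q_1 = N - a_0^2$), which needs no high-precision square root. The variable names and the choice of 60 terms are illustrative.

```matlab
N = 13290059;                   % the number factorized in the report
n = 60;                         % number of partial quotients to compare
a0 = floor(sqrt(N));
% Double-precision iteration t -> 1/t - floor(1/t)
t = sqrt(N) - a0;
af = zeros(1, n);               % af(k) approximates a_k
for k = 1:n
    af(k) = floor(1 / t);
    t = 1 / t - af(k);
end
% Exact integer recurrence (assumed, see lead-in)
P = a0;  Q = N - a0^2;          % P_1 and Q_1
ae = zeros(1, n);               % ae(k) stores the exact a_k
for k = 1:n
    ae(k) = floor((a0 + P) / Q);
    Pnext = ae(k) * Q - P;
    Q = (N - Pnext^2) / Q;      % exact: Q divides N - Pnext^2
    P = Pnext;
end
firstBad = find(af ~= ae, 1);   % first index where doubles go wrong, if any
disp(firstBad)
```

All quantities in the integer recurrence stay below $N$, so they are represented exactly in double precision and no rounding error can accumulate.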
We demonstrate the function we have implemented by calculating the prime
factorization of $N = 13290059$. The function gives a factor 4261. It can be checked that
this indeed divides $N$ and that it is a prime number. Dividing $N$ by 4261 gives the other
factor 3119, which is also a prime number. Our function has thus successfully factorized $N$.
We also tried to factorize this number with the standard floating point double-precision.
In this case the function calculated an incorrect continued fraction from a certain point
onwards and no factorization was found. The complete MATLAB code for the function
is provided as an attachment at the end of this report.
6 Discussion and conclusions
Continued fractions are an interesting and beautiful way of expressing numbers. They
can be used to factorize integers in an efficient way. The CFRAC method was first
programmed on a computer in 1975. It can be divided into two different methods, the
methods of the A's and the P's. Interestingly, as shown in this project, ease of application
is the only deciding factor between the two methods.
A computer program for the method using the A's was implemented in
MATLAB. The program managed to successfully factorize several relatively large numbers,
of which we have provided one example in this report. MATLAB was probably not the
optimal software for this algorithm because of the default floating-point precision
it uses for quick numerical calculations.
Nevertheless, we were able to produce a demonstrative example of a computational
implementation of this factorization algorithm.
RSA is a widely used public-key cryptosystem for secure data transmission, and it is
based on the practical difficulty of factoring the product of two large prime numbers.
This makes integer factorization algorithms extremely interesting, and it also shows the
limits of the CFRAC method: factoring products of primes as large as the ones used by RSA
is practically impossible with the CFRAC algorithm. In the future, a working
quantum computer would offer a solution for this integer factorization problem. By using
Shor's algorithm [5], a quantum computer would be able to solve these problems in
polynomial time and therefore outperform CFRAC and other classical algorithms.
References
[1] Stein, W. Elementary Number Theory: Primes, Congruences, and Secrets. November
16, 2011.
[2] Dixon, J. D. (1981). "Asymptotically fast factorization of integers". Math. Comp. 36
(153): 255–260.
[3] Lehmer, D. H.; Powers, R. E. (1931). "On Factoring Large Numbers". Bulletin of the
American Mathematical Society. 37 (10): 770–776.
[4] Steuding, J. Factoring with continued fractions, the Pell equation, and
weighted mediants. http://siauliaims.su.lt/pdfai/2003/stesle-03.pdf, cited
on December 14, 2016.
[5] Shor, P. W. (1997). "Polynomial-Time Algorithms for Prime Factorization and
Discrete Logarithms on a Quantum Computer". SIAM J. Comput. 26 (5): 1484–1509.
[6] MathWorks: variable-precision arithmetic.
https://se.mathworks.com/help/symbolic/vpa.html, cited on December 14, 2016.
CFRAC code for MATLAB

function [possiblefactor, a] = cfrac(N, n)
% N is the number to be factorized
% n-1 is the index of the final convergent that will be calculated (because
% the index starts from 0 but Matlab only accepts indices larger than 0).
%
% Example: [pf, a] = cfrac(13290059, 25)

x = sqrt(vpa(N));   % nth complete quotient. Note that there will be an error
                    % from machine precision when calculating square root with
                    % Matlab so there can be some unexpected behavior.
a = floor(x);       % Integers appearing in the continued fraction
t = x - a;
AA = a;             % Numerators of partial convergents
BB = 1;             % Denominators of partial convergents
pp = mod(AA^2, N);
c = a;              % nth convergents

for i = 1:n-1
    a = [a floor(1/t)];
    t = 1/t - floor(1/t);
    if a(end) == 0
        break;
    end
    A = a(end);
    B = 1;
    for j = 1:i
        Aprev = A;
        A = A*a(end-j) + B;
        B = Aprev;
    end
    p = mod(A^2, N);
    pp = [pp p];
    c = [c A/B];
    AA = [AA A];
    BB = [BB B];
end

% Qmod(j) is (-1)^j * A_{j-1}^2 reduced modulo N, that is, Q_j
Qmod = [];
for j = 1:length(AA)
    Qmod = [Qmod mod((-1)^(j)*AA(j)^2, N)];
end

% Factorize Qmod into primes
primevalues = primes(2*N^(1/2));
elist = zeros(length(Qmod), length(primevalues));
i = 1;
for Q = Qmod
    j = 1;
    while (Q ~= 1)
        if j > length(primevalues)
            break;
        end
        e = 1;
        while (Q/primevalues(j)^e == floor(Q/primevalues(j)^e))
            e = e+1;
        end
        Q = Q/primevalues(j)^(e-1);
        elist(i,j) = e-1;
        j = j+1;
    end
    if (Q ~= 1)
        elist(i,:) = zeros(1, length(primevalues));
    end
    i = i+1;
end

% Now the prime factorization of Qmod(j) is given by
% primevalues.^elist(j,:) if the factorization is nice enough. If the
% factorization has factors larger than 2*N^(1/2), we will simply omit it
% to make the computation times more reasonable.
% We then search for such Qmod's that their product is a square, that is,
% all their prime factors are squares. So, sum of the rows of elist should
% be equivalent to 0 mod 2.
% We will only try to search for sums with up to three rows because
% searching for more possible permutations takes much longer.
possiblefactor = [];   % returned empty if no suitable combination is found
X2 = 0;
s = 1;
for i = 1:length(Qmod)
    for j = (i+1):length(Qmod)
        if (~any(mod(elist(i,:) + elist(j,:), 2)) && any(elist(i,:)) && any(elist(j,:)))
            X2 = Qmod(i)*Qmod(j);
            A2 = AA(i)^2*AA(j)^2;
            Yfull = sqrt(A2);
            Y = mod(Yfull, N);
            X = mod(sqrt(X2), N);
            possiblefactor(s) = gcd(X-Y, N);
            s = s+1;
        else
            for k = (j+1):length(Qmod)
                if (~any(mod(elist(i,:) + elist(j,:) + elist(k,:), 2)) && any(elist(i,:)) && any(elist(j,:)) && any(elist(k,:)))
                    X2 = Qmod(i)*Qmod(j)*Qmod(k);
                    A2 = AA(i)^2*AA(j)^2*AA(k)^2;
                    Yfull = sqrt(A2);
                    Y = mod(Yfull, N);
                    X = sqrt(X2);
                    possiblefactor(s) = gcd(X-Y, N);
                    s = s+1;
                    break;
                end
            end
        end
    end
end
end