
Aalto University School of Science

Continued fraction factorization

Number theory project
December 14, 2016

Heikki Muhli
Sakari Saarenpää

1 Background

Continued fractions provide powerful tools for solving problems in number theory. For example, continued fractions can be used to write primes congruent to 1 modulo 4 as a sum of two squares. This method is fast, and even hundred-digit primes can be written this way. Continued fractions also provide an efficient way to recognize a rational number even when only the first few digits of its decimal expansion are given.

A continued fraction is an expression of the form [1]

\[ a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} \tag{1} \]

For simplicity it is usually written as

\[ [a_0, a_1, a_2, \ldots, a_n]. \tag{2} \]

A continued fraction is simple if all the $a_i$ are positive integers. A finite continued fraction has the following expression

\[ a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_m}}}, \tag{3} \]

where each $a_m$ is a real number and $a_m > 0$ for all $m \ge 1$.

Let us suppose that $c_n = [a_0, a_1, a_2, \ldots, a_n]$ is defined for all $n$. We call $c_n$ the $n$th convergent of the continued fraction. If the limit $\lim_{n\to\infty} c_n$ exists, then we say that the infinite continued fraction

\[ a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} \tag{4} \]

converges. For infinite continued fractions it can be proven [1] that if $a_0, a_1, \ldots$ is any infinite sequence of positive integers, then the sequence $c_n = [a_0, a_1, \ldots, a_n]$ converges. More generally, if $(a_n)$ is an arbitrary sequence of positive reals such that $\sum_{n=0}^{\infty} a_n$ diverges, then $(c_n)$ converges.

Let us now prove that $\lim_{n\to\infty} c_n$ exists. For any $m \ge n$, the number $c_n$ is a partial convergent of $[a_0, \ldots, a_m]$. The even convergents $c_{2n}$ form a strictly increasing sequence and the odd convergents $c_{2n+1}$ form a strictly decreasing sequence. The even convergents are all less than or equal to $c_1$ and the odd convergents are all greater than or equal to $c_0$. Therefore the limits $\beta_0 = \lim_{n\to\infty} c_{2n}$ and $\beta_1 = \lim_{n\to\infty} c_{2n+1}$ exist and $\beta_0 \le \beta_1$.
Moreover, by Stein's Proposition 5.2.7 [1] we can write

\[ |c_{2n} - c_{2n-1}| \le \frac{1}{2n(2n-1)} \to 0, \tag{5} \]

so $\beta_0 = \beta_1$ and therefore $\lim_{n\to\infty} c_n$ exists. As an example we have the following continued fractions: $e = [2,1,2,1,1,4,1,1,6,1,1,8,\ldots]$ and $\pi = [3,7,15,1,292,1,1,1,2,1,3,1,\ldots]$.

The following procedure can be used for calculating continued fractions. Let us consider a real number $r$. Let $i$ be the integer part of $r$ and $s = r - i$ the fractional part of $r$. The continued fraction representation of $r$ is then $[i, a_1, a_2, \ldots]$, where $[a_1, a_2, \ldots]$ is the continued fraction representation of $1/s$. To calculate the representation of $r$, we first write down the integer part of $r$. Then we subtract the integer part from $r$. If the difference is 0, the algorithm stops. Otherwise we take the reciprocal of the difference and repeat. This process can be implemented on a computer, for example with the Euclidean algorithm if the number is rational.

The continued fraction factorization method (CFRAC) is a very useful integer factorization algorithm. It was first presented by D. H. Lehmer and R. E. Powers [3] in 1931 and later developed into a computer algorithm by Michael A. Morrison and John Brillhart in 1975. It is based on an idea similar to that of Dixon's factorization method [2].

2 The method using the P's

The first method to factorize a number using continued fractions is the method of P's [3]. If we expand the square root of $N$ in a continued fraction, the general form of the $n$th complete quotient is

\[ x_n = \frac{P_n + N^{1/2}}{Q_n}, \qquad x_0 = N^{1/2}, \qquad \lfloor x_n \rfloor = a_n. \tag{6} \]

By using the relation $P_n^2 + Q_n Q_{n-1} = N$ we have

\[ -Q_n Q_{n-1} \equiv P_n^2 \pmod{N}. \tag{7} \]

If we write $(-1)^n Q_n = Q_n^*$, we then obtain

\[ Q_n^* (P_{n-1} P_{n-3} P_{n-5} \cdots P_r)^2 \equiv (P_n P_{n-2} P_{n-4} \cdots P_s)^2 \pmod{N}, \tag{8} \]

where $r = 1$, $s = 2$ or $r = 2$, $s = 1$, depending on whether $n$ is even or odd, respectively.
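Relations (7) and (8) can be checked numerically before looking at the proof. The following Python sketch is an illustration added here and is not part of the original report. It assumes the standard integer recurrences $P_{n+1} = a_n Q_n - P_n$ and $Q_{n+1} = (N - P_{n+1}^2)/Q_n$ with $P_0 = 0$, $Q_0 = 1$, which the text above does not write out. The function names `sqrt_cf` and `check_relation_8` are hypothetical.

```python
from math import isqrt


def sqrt_cf(N, steps):
    """Expand sqrt(N) using integers only (assumed standard recurrences).

    Returns lists P, Q, a with x_n = (P[n] + sqrt(N)) / Q[n] and
    a[n] = floor(x_n) for n = 0, ..., steps. N must not be a perfect square.
    """
    root = isqrt(N)
    if root * root == N:
        raise ValueError("N must not be a perfect square")
    P, Q, a = [0], [1], [root]                  # x_0 = sqrt(N)
    for n in range(steps):
        P_next = a[n] * Q[n] - P[n]
        Q_next = (N - P_next * P_next) // Q[n]  # the division is exact
        P.append(P_next)
        Q.append(Q_next)
        a.append((P_next + root) // Q_next)     # floor((P + sqrt(N)) / Q)
    return P, Q, a


def check_relation_8(N, P, Q, n):
    """Return True if relation (8) holds for index n."""
    left = 1
    for k in range(n - 1, 0, -2):   # P_{n-1} P_{n-3} ... down to P_1 or P_2
        left *= P[k]
    right = 1
    for k in range(n, 0, -2):       # P_n P_{n-2} ... down to P_1 or P_2
        right *= P[k]
    return ((-1) ** n * Q[n] * left ** 2 - right ** 2) % N == 0


if __name__ == "__main__":
    N = 13
    P, Q, a = sqrt_cf(N, 5)
    print(a)  # sqrt(13) = [3, 1, 1, 1, 1, 6, ...]
    for n in range(1, len(P)):
        assert P[n] ** 2 + Q[n] * Q[n - 1] == N          # relation behind (7)
        assert check_relation_8(N, P, Q, n)              # relation (8)
```

Working with integers avoids the rounding problems that appear when the square root is expanded in floating point.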
To prove (8), we assume that it holds for $n - 1$, that is,

\[ Q_{n-1}^* (P_{n-2} P_{n-4} P_{n-6} \cdots P_s)^2 \equiv (P_{n-1} P_{n-3} P_{n-5} \cdots P_r)^2 \pmod{N}, \tag{9} \]

and then show that it holds for $n$. Multiplying (7) by

\[ (P_{n-1} P_{n-3} \cdots P_r)^2 \cdot (P_{n-2} P_{n-4} \cdots P_s)^2 \tag{10} \]

and dividing by (9), we get (8). Relation (8) holds for $n = 1, 2$ directly from (7) with $Q_0 = 1$, so by induction it is true in general.

Two $Q^*$'s are said to be equivalent if their product is a square. $Q_i^*$ is equivalent to $Q_j^*$ if $x^2 Q_i^* = y^2 Q_j^*$. Substituting $n = i$ and $n = j$ in (8), and noting that $i$ and $j$ are of the same parity, we obtain from this equation

\[ (x P_{i+1} P_{i+3} \cdots P_{j-1})^2 - (y P_{i+2} P_{i+4} \cdots P_j)^2 \equiv 0 \pmod{N}. \tag{11} \]

Unless $N$ divides one of the numbers $(x P_{i+1} P_{i+3} \cdots P_{j-1}) \pm (y P_{i+2} P_{i+4} \cdots P_j)$, we obtain a factor of $N$ by finding the greatest common divisor of $N$ and one of these numbers. If the two equivalent $Q^*$'s are near each other in the series of denominators, the factors of $N$ will be disclosed with a minimum of effort.

This method can also be extended, for example to the case in which the product of more than two $Q^*$'s is a square. This involves a straightforward application of (8), and the ease with which the method may be applied depends again on the relative position of the $Q^*$'s and the parities of their subscripts. It is unnecessary to compute the actual products of the P's involved, since these products can always be reduced modulo $N$.

3 The method using the A's

In addition to the method using the P's introduced in the previous section, there is an alternate method for continued fraction factorization. This method focuses on the $n$th convergent of the continued fraction of a real number $x$. The $n$th convergent is given by [1]

\[ c_n = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_n}}} = [a_0, a_1, a_2, \ldots, a_n] = \frac{A_n}{B_n}, \tag{12} \]

where $A_n$ and $B_n$ are integers that are guaranteed to exist because clearly $c_n \in \mathbb{Q}$. The $n$th convergent has the property

\[ \lim_{n\to\infty} c_n = x, \tag{13} \]

as introduced in the first section.
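Equation (12) can be evaluated directly with exact rational arithmetic. The Python sketch below is added for illustration and is not part of the original report; the helper names `cf_expand` and `convergent` are hypothetical. It expands a rational number with the Euclidean algorithm, as described in the first section, and evaluates a convergent from the innermost term outwards.

```python
from fractions import Fraction


def cf_expand(r):
    """Partial quotients [a0, a1, ...] of a rational r via the Euclidean algorithm."""
    r = Fraction(r)
    num, den = r.numerator, r.denominator
    quotients = []
    while den != 0:
        q, rem = divmod(num, den)   # integer part; continue with the reciprocal
        quotients.append(q)
        num, den = den, rem
    return quotients


def convergent(a, n):
    """Evaluate c_n = [a[0], ..., a[n]] exactly; the result is A_n / B_n."""
    value = Fraction(a[n])
    for term in reversed(a[:n]):
        value = term + 1 / value
    return value


if __name__ == "__main__":
    a = [3, 7, 15, 1]                        # first partial quotients of pi
    for n in range(len(a)):
        c = convergent(a, n)
        print(n, c.numerator, c.denominator)
    print(cf_expand(Fraction(355, 113)))
```

For the partial quotients of $\pi$ this gives the convergents 3, 22/7, 333/106 and 355/113: the even-indexed ones increase and the odd-indexed ones decrease, as stated in the first section. Expanding 355/113 gives [3, 7, 16], which represents the same number as [3, 7, 15, 1].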
The idea of this alternate method is to use the integers $A_n$ in the numerator of the $n$th convergent to factorize a large number $N$ into its prime factors. The continued fraction is calculated for the square root of $N$, just like in the method using the P's introduced in the previous section. We already saw that it is possible to write the $n$th complete quotient of $\sqrt{N}$ as

\[ x_n = \frac{P_n + \sqrt{N}}{Q_n}, \tag{14} \]

where $P_n$ and $Q_n \ge 1$ are integers and $Q_j \mid (N - P_j^2)$ [3; 4]. We will also need another equality. It can be shown that

\[ A_{j-1}^2 - N B_{j-1}^2 = (-1)^j Q_j =: Q_j^*, \tag{15} \]

where $A_{j-1}$, $B_{j-1}$ and $Q_j$ are the integers defined in equations (12) and (14) [3; 4]. This equation can be written as

\[ Q_j^* \equiv A_{j-1}^2 \pmod{N}. \tag{16} \]

If for some $k$ and $\ell$ we have $x^2 Q_k^* = y^2 Q_\ell^*$ with integers $x$ and $y$, we can write

\[ (x A_{k-1})^2 - (y A_{\ell-1})^2 \equiv 0 \pmod{N}. \tag{17} \]

From this equation it is possible to find a factorization of $N$ by the greatest common divisor process unless $N$ divides $x A_{k-1} + y A_{\ell-1}$ or $x A_{k-1} - y A_{\ell-1}$ (because $(x A_{k-1})^2 - (y A_{\ell-1})^2 = (x A_{k-1} + y A_{\ell-1})(x A_{k-1} - y A_{\ell-1})$) [3]. More than two $Q_j^*$'s can be used: take $x^2 Q_k^* Q_m^* = y^2 Q_\ell^*$ and we get the equation

\[ (x A_{k-1} A_{m-1})^2 - (y A_{\ell-1})^2 \equiv 0 \pmod{N} \tag{18} \]

instead. The $A_j$'s follow the recursion relation for the numerator of a continued fraction [1; 3]

\[ A_j = a_j A_{j-1} + A_{j-2}, \tag{19} \]

where $A_{-1} = 1$, $A_{-2} = 0$ and $a_j$ is the last partial quotient of the $j$th convergent in (12). The $A_j$'s can be reduced modulo $N$ if necessary [3].

4 Comparison of methods

In general, the method of the A's is more useful, since the calculations are simpler. If two equivalent $Q^*$'s appear near each other, the method of P's is more successful. Let us now show that the ease of application is indeed the deciding factor in choosing between the two methods. We use the following lemma, proven in [3]. If $n$ is any integer, then

\[ P_n + (-1)^n A_{n-1} A_{n-2} \equiv 0 \pmod{N}. \tag{20} \]

What is interesting is that the success of one method in a particular instance implies the success of the other.
Let us prove this, for simplicity, for the case of only two equivalent $Q^*$'s. The result can easily be generalized. Let $Q_i^*$ and $Q_j^*$ be equivalent so that

\[ x^2 Q_i^* = y^2 Q_j^*. \tag{21} \]

Let us suppose that the A method succeeds and that the P method fails. This means that $N$ divides one of the following numbers:

\[ (x P_{i+1} P_{i+3} P_{i+5} \cdots P_{j-1}) \pm y (P_{i+2} P_{i+4} P_{i+6} \cdots P_j). \tag{22} \]

Substituting for each P its value in terms of the A's, as given by the lemma above, and simplifying, we get

\[ x A_{i-1} \pm y A_{j-1} \equiv 0 \pmod{N}. \tag{23} \]

This implies a failure of the A method, which contradicts the hypothesis. Therefore the P method must also be successful. By reversing the argument, it can also be shown that the success of the P method implies the success of the A method. The only instance [3] of the success of one method and the failure of the other is the case in which the A method succeeds, the P method fails, and a factor of $N$ appears among the P's and Q's.

5 Computer implementation of a continued fraction factorization method with MATLAB

We have implemented the previously introduced continued fraction factorization method using the A's with MATLAB software. The implemented function takes as its arguments the number to be factorized, $N$, and an integer $n$ that tells the function to calculate the convergents of the continued fraction up to $n - 1$, such that we get the integers $A_j$ with $j \in \{0, \ldots, n-1\}$ as seen in equation (12). With the integers $A_{j-1}$ we can solve the integers $Q_j$ from equation (16). With the $Q_j$ solved, we search for a combination of $Q_j$ such that

\[ Q_i Q_j = X^2 \tag{24} \]

for some $i, j \in \{1, \ldots, n\}$, $X \in \mathbb{Z}$, or

\[ Q_i Q_j Q_k = X^2 \tag{25} \]

with $i, j, k \in \{1, \ldots, n\}$. The reason why we do not search for products of other forms (for example $Q_i Q_j Q_k Q_m = X^2$ etc.) is that going through every possible combination of $Q_j$ takes a long time, and our intention is merely to demonstrate the process for a few examples where the solution is found with just two or three different $Q_j$'s.
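The search for pairs satisfying (24) can be sketched with exact integers. This Python illustration is not the report's MATLAB function. It assumes the integer recurrences $P_{j+1} = a_j Q_j - P_j$ and $Q_{j+1} = (N - P_{j+1}^2)/Q_j$ for $\sqrt{N}$, keeps the sign by working with $Q_j^* = (-1)^j Q_j$ so that congruence (16) holds as written, and tests squareness with `math.isqrt` instead of a prime factorization. The names `relations` and `square_pairs` are hypothetical, and $N = 2041$ is a small example chosen here, not one from the report.

```python
from math import isqrt


def relations(N, steps):
    """Return [(A_{j-1} mod N, Q*_j)] for j = 1..steps, with A_{j-1}^2 = Q*_j (mod N)."""
    root = isqrt(N)
    if root * root == N:
        raise ValueError("N must not be a perfect square")
    P, Q, a = 0, 1, root                 # P_0, Q_0, a_0
    A_prev, A = 1, root % N              # A_{-1} = 1, A_0 = a_0
    out = []
    for j in range(1, steps + 1):
        P = a * Q - P                    # assumed standard recurrences
        Q = (N - P * P) // Q
        out.append((A, (-1) ** j * Q))   # A currently holds A_{j-1}
        a = (P + root) // Q
        A_prev, A = A, (a * A + A_prev) % N   # recursion (19), reduced mod N
    return out


def square_pairs(rels):
    """Yield (i, j, X) with Q*_i * Q*_j = X^2, using 1-based indices as in the text."""
    for i in range(len(rels)):
        for j in range(i + 1, len(rels)):
            prod = rels[i][1] * rels[j][1]
            if prod > 0 and isqrt(prod) ** 2 == prod:
                yield i + 1, j + 1, isqrt(prod)


if __name__ == "__main__":
    N = 2041
    rels = relations(N, 6)
    for A, Q_star in rels:
        assert (A * A - Q_star) % N == 0      # congruence (16)
    print([q for _, q in rels])   # [-16, 51, -35, 48, -25, 64]
    print(list(square_pairs(rels)))  # [(1, 5, 20)]
```

Here $Q_1^* Q_5^* = (-16)(-25) = 20^2$, so the pair $(1, 5)$ is a candidate for the greatest common divisor step. Requiring a positive product is the sign condition that makes $X^2 \equiv (A_{i-1} A_{j-1})^2 \pmod{N}$ hold.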
Of course, an integer can only be a square if every prime in its factorization appears with an even exponent. This is why we first factorize the $Q_j$'s into primes, which is a computationally much easier task than directly factorizing $N$. We attempt the factorization by trying to divide $Q_j$ by the primes in the interval $[2, 2\sqrt{N}]$. If some of the factors of $Q_j$ are outside of this interval, we simply ignore that $Q_j$ to keep the prime factorization relatively simple. A $Q_j$ with a very large prime factor is less likely to form a square with other $Q_i$'s anyway, so we do not actually lose much by ignoring these $Q_j$'s. Once we have factorized the $Q_j$, we have the exponents $e_{i,j}$ of the prime factorization

\[ Q_j = \ell_1^{e_{1,j}} \ell_2^{e_{2,j}} \cdots \ell_m^{e_{m,j}}, \tag{26} \]

where $\ell_1, \ldots, \ell_m \in [2, 2\sqrt{N}]$ are all the prime numbers in the interval. Then we search for $i$ and $j$ such that [4]

\[ (e_{1,i}, e_{2,i}, \ldots, e_{m,i}) + (e_{1,j}, e_{2,j}, \ldots, e_{m,j}) \equiv (0, 0, \ldots, 0) \pmod{2} \tag{27} \]

or

\[ (e_{1,i}, e_{2,i}, \ldots, e_{m,i}) + (e_{1,j}, e_{2,j}, \ldots, e_{m,j}) + (e_{1,k}, e_{2,k}, \ldots, e_{m,k}) \equiv (0, 0, \ldots, 0) \pmod{2} \tag{28} \]

and if we find such a combination, it means that we have found $Q_j$'s such that either equation (24) or (25) is satisfied for some $X \in \mathbb{Z}$. Next we take the $A_{j-1}$ corresponding to $Q_j^*$ in equation (16) and calculate an integer $Y$ from the equation

\[ A_{i-1}^2 A_{j-1}^2 = Y^2 \tag{29} \]

or, in the case of equation (25), from

\[ A_{i-1}^2 A_{j-1}^2 A_{k-1}^2 = Y^2. \tag{30} \]

We have integers $X$ and $Y$ either from equations (24) and (29) or from (25) and (30). Next we reduce these integers modulo $N$. If the algorithm was successful, we find a factor or factors of $N$ by finding the greatest common divisor of $X - Y$ (modulo $N$) and $N$. This can be done by using the greatest common divisor process as mentioned in the previous section or just by using the built-in gcd function of MATLAB.
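The steps from (26) to the final greatest common divisor can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the report's MATLAB implementation: it handles pairs only (equation (27)), uses a small hand-picked list of primes instead of all primes up to $2\sqrt{N}$, and adds one extra parity component for the sign of $Q_j^*$, which the report does not do. The names `relations`, `parity_vector` and `factor_from_pairs` are hypothetical, and $N = 2041$ is an example chosen here.

```python
from math import gcd, isqrt


def relations(N, steps):
    """Return [(A_{j-1} mod N, Q*_j)] for j = 1..steps (assumed standard recurrences)."""
    root = isqrt(N)
    if root * root == N:
        raise ValueError("N must not be a perfect square")
    P, Q, a = 0, 1, root
    A_prev, A = 1, root % N
    out = []
    for j in range(1, steps + 1):
        P = a * Q - P
        Q = (N - P * P) // Q
        out.append((A, (-1) ** j * Q))
        a = (P + root) // Q
        A_prev, A = A, (a * A + A_prev) % N
    return out


def parity_vector(q, primes):
    """Exponents of q modulo 2 over (-1, primes...), or None if q does not factor fully."""
    vec = [1 if q < 0 else 0]          # extra component for the sign (an assumption)
    q = abs(q)
    for p in primes:
        e = 0
        while q % p == 0:
            q //= p
            e += 1
        vec.append(e % 2)
    return tuple(vec) if q == 1 else None


def factor_from_pairs(N, steps, primes):
    """Collect nontrivial factors of N from pairs whose parity vectors sum to 0 mod 2."""
    rels = relations(N, steps)
    vecs = [parity_vector(q, primes) for _, q in rels]
    found = set()
    for i in range(len(rels)):
        for j in range(i + 1, len(rels)):
            if vecs[i] is None or vecs[i] != vecs[j]:
                continue                          # condition (27) not met
            X = isqrt(rels[i][1] * rels[j][1])    # product is a perfect square
            Y = rels[i][0] * rels[j][0] % N
            g = gcd(X - Y, N)
            if 1 < g < N:
                found.add(g)
    return found


if __name__ == "__main__":
    print(factor_from_pairs(2041, 6, [2, 3, 5, 7, 11, 13, 17]))  # {157}
```

For $N = 2041$ the pair $Q_1^* = -16$, $Q_5^* = -25$ gives $X = 20$ and $Y = 45 \cdot 768 \bmod 2041 = 1904$, and $\gcd(20 - 1904, 2041) = 157$, so $2041 = 13 \cdot 157$. Whether pairs alone suffice for a given $N$ depends on the number of steps and on the primes used.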
The function we have implemented uses mostly just very basic programming structures like for loops, while loops and if statements, and basic mathematical functions like the square root, so it could quite easily be converted to most other programming languages. The downside of this is that the code is quite slow. What makes it even slower is that standard double-precision floating point in MATLAB is only accurate to a relative precision of approximately $10^{-16}$. This becomes a problem because small decimals become significant in the calculation of a continued fraction: with this precision the 14th convergents are often already wrong, and the error accumulates quickly from that point onwards. Because we want to use convergents up to at least $n = 25$, we use the variable-precision arithmetic that MATLAB provides in the form of the function vpa. By default, vpa calculates values to 32 significant digits [6], which is enough for calculating the first 25 convergents, but using more significant digits makes the calculations significantly slower. Because MATLAB is meant for quick numerical calculations, it is probably not the best platform for demonstrating this factorization method.

We demonstrate the function we have implemented by calculating the prime factorization of $N = 13290059$. The function gives a factor 4261. It can be checked that this indeed divides $N$ and that it is a prime number. Dividing $N$ by 4261 we get another factor, 3119, which is also a prime number. Our function has thus successfully factorized $N$. We also tried to factorize this number with standard double-precision floating point. In this case the function calculated an incorrect continued fraction from a certain point onwards and no factorization was found.

The complete MATLAB code for the function is provided as an attachment at the end of this report.

6 Discussion and conclusions

Continued fractions are an interesting and beautiful way of expressing numbers.
They can be used to factorize integers in an efficient way. This CFRAC method was first programmed on a computer in 1975. It can be divided into two different methods, the methods of A's and P's. Interestingly, as shown in this project, the ease of application is the only deciding factor between the two methods.

A computer program for the method using the A's was implemented with MATLAB. The program managed to successfully factorize several relatively large numbers, of which we have provided one example in this report. MATLAB was probably not the best-suited software for this algorithm because of the default floating point precision it uses for quick numerical calculations. Nevertheless, we were able to produce a demonstrative example of a computational implementation of this factorization algorithm.

As for the limitations of the CFRAC method: RSA, a widely used public-key cryptosystem for secure data transmission, is based on the practical difficulty of factoring the product of two large prime numbers. This makes integer factorization algorithms extremely interesting. With the CFRAC algorithm, factoring products of primes as large as the ones used by RSA is practically impossible. In the future, a working quantum computer would offer a solution for this integer factorization problem. By using Shor's algorithm [5], a quantum computer would be able to solve these problems in polynomial time and therefore outperform CFRAC and other algorithms.

References

[1] Stein, W. Elementary Number Theory: Primes, Congruences, and Secrets. November 16, 2011.
[2] Dixon, J. D. (1981). "Asymptotically fast factorization of integers". Math. Comp. 36 (153): 255–260.
[3] Lehmer, D. H.; Powers, R. E. (1931). "On Factoring Large Numbers". Bulletin of the American Mathematical Society. 37 (10): 770–776.
[4] Steuding, J. Factoring with continued fractions, the Pell equation, and weighted mediants.
http://siauliaims.su.lt/pdfai/2003/stesle-03.pdf, cited on December 14, 2016.
[5] Shor, Peter W. (1997). "Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer". SIAM J. Comput. 26 (5): 1484–1509.
[6] MathWorks: variable-precision arithmetic. https://se.mathworks.com/help/symbolic/vpa.html, cited on December 14, 2016.

CFRAC code for MATLAB

function [possiblefactor, a] = cfrac(N, n)
% N is the number to be factorized
% n-1 is the index of the final convergent that will be calculated (because
% the index starts from 0 but Matlab only accepts indices larger than 0).
%
% Example: [pf, a] = cfrac(13290059, 25)

possiblefactor = [];  % stays empty if no square product is found

x = sqrt(vpa(N)); % nth complete quotient. Note that there will be an error
                  % from machine precision when calculating square root with
                  % Matlab so there can be some unexpected behavior.
a = floor(x);     % Integers appearing in the continued fraction
t = x - a;
AA = a;           % Numerators of partial convergents
BB = 1;           % Denominators of partial convergents
pp = mod(AA^2, N);
c = a;            % nth convergents

for i = 1:n-1
    a = [a floor(1/t)];
    t = 1/t - floor(1/t);
    if a(end) == 0
        break;
    end
    % Evaluate the convergent [a(1), ..., a(end)] from the last term backwards
    A = a(end);
    B = 1;
    for j = 1:i
        Aprev = A;
        A = A*a(end-j) + B;
        B = Aprev;
    end
    p = mod(A^2, N);
    pp = [pp p];
    c = [c A/B];
    AA = [AA A];
    BB = [BB B];
end

% Qmod(j) = (-1)^j * AA(j)^2 mod N, where AA(j) holds A_{j-1}
Qmod = [];
for j = 1:length(AA)
    Qmod = [Qmod mod((-1)^(j)*AA(j)^2, N)];
end

% Factorize Qmod into primes
primevalues = primes(2*N^(1/2));
elist = zeros(length(Qmod), length(primevalues));
i = 1;
for Q = Qmod
    j = 1;
    while (Q ~= 1)
        if j > length(primevalues)
            break;
        end
        e = 1;
        while (Q/primevalues(j)^e == floor(Q/primevalues(j)^e))
            e = e+1;
        end
        Q = Q/primevalues(j)^(e-1);
        elist(i,j) = e-1;
        j = j+1;
    end
    if (Q ~= 1)
        elist(i,:) = zeros(1, length(primevalues));
    end
    i = i+1;
end
% Now the prime factorization of Qmod(j) is given by
% primevalues.^elist(j,:) if the factorization is nice enough. If the
% factorization has factors larger than 2*N^(1/2), we will simply omit it
% to make the computation times more reasonable.

% We then search for such Qmod's that their product is a square, that is,
% all their prime factors have even exponents. So, the sum of the rows of
% elist should be equivalent to 0 mod 2.
% We will only try to search for sums with up to three rows because
% searching for more possible combinations takes much longer.
% Note: Qmod(i) is congruent to (-1)^i * AA(i)^2, so X^2 and Y^2 below agree
% modulo N only when the signs cancel; otherwise the gcd is usually trivial.
X2 = 0;
s = 1;
for i = 1:length(Qmod)
    for j = (i+1):length(Qmod)
        if (~any(mod(elist(i,:)+elist(j,:),2)) && any(elist(i,:)) && any(elist(j,:)))
            X2 = Qmod(i)*Qmod(j);
            A2 = AA(i)^2*AA(j)^2;
            Yfull = sqrt(A2);
            Y = mod(Yfull, N);
            X = mod(sqrt(X2), N);
            possiblefactor(s) = gcd(X-Y, N);
            s = s+1;
        else
            for k = (j+1):length(Qmod)
                if (~any(mod(elist(i,:)+elist(j,:)+elist(k,:),2)) && any(elist(i,:)) && any(elist(j,:)) && any(elist(k,:)))
                    X2 = Qmod(i)*Qmod(j)*Qmod(k);
                    A2 = AA(i)^2*AA(j)^2*AA(k)^2;
                    Yfull = sqrt(A2);
                    Y = mod(Yfull, N);
                    X = mod(sqrt(X2), N);
                    possiblefactor(s) = gcd(X-Y, N);
                    s = s+1;
                    break;
                end
            end
        end
    end
end
end
