Reverse Factorization and Comparison of Factorization Algorithms in Attack to RSA

Sadi Evren SEKER
Dept. of Business Administration, Istanbul Medeniyet University
[email protected]

Cihan MERT
Electrical Engineering Dept., The University of Texas at Dallas
[email protected]

ABSTRACT
Factorization algorithms play a major role in computer security and cryptography. Many widely used cryptographic algorithms, such as RSA, are built on the mathematical difficulty of factoring the product of large prime numbers. This research proposes a new approach to factorization that uses two new enhancements. The new approach is compared with six different factorization algorithms, and its performance is evaluated in a big data environment. The algorithms covered are the elliptic curve method, the quadratic sieve, Fermat's method, trial division, the sieve of Eratosthenes and the Pollard rho method. Success rates are compared over a million integers of different difficulties. We have implemented our own algorithm for random number generation, which is also explained in the paper. We also show empirically that the new approach has an advantage in the factorization attack on RSA.

Keywords
Factorization, Cryptography, Benchmarking

Acknowledgement
The work of Sadi Evren SEKER is supported by Istanbul University, research projects department, under project number YADOP-27254.

1. INTRODUCTION
In this paper, the problem is defined and an overview of it is given in the problem statement section. The related work section covers a brief literature review of contemporary studies on factorization algorithms. The background section briefly describes the details of the factorization algorithms we have implemented in this study. The experiments section goes into the details of the large set of generated integers, their properties, and the evaluation of the algorithms.

2. PROBLEM STATEMENT
A stepwise view of the study is given in Figure 1. The study consists of three major steps. In the first step, we generate big integers with a new generation approach. After the generation, the factorization algorithms, including the new approach, are executed. In the last step, the performance of the algorithms is evaluated.

Figure 1. Overview of Study

In order to simulate the RSA factorization problem, we concentrate only on semi-prime numbers. The random generator is designed to generate semi-prime numbers. In order to make the time performance more explicit, we have generated a huge number of semi-prime numbers and stored them in a database. After storing, the factorization algorithms are executed on those numbers. Finally, each algorithm is evaluated on its time performance.

3. BACKGROUND
The factorization of composite numbers has been an area of study since early times, and related algorithms go back as far as the sieve of Eratosthenes (Eratosthenes, 276–194 BC). With the spreading use of modern cryptographic systems, some of which, like RSA [1], are built on the difficulty of factoring, the factorization problem has remained an active research area. Factoring initially meant dividing a number by larger and larger primes until the factorization was complete. This trial division was not improved until Fermat's method, which uses the factorization of a difference of two squares. While Fermat's method can be much faster than trial division, in real-world factoring, for example of an RSA modulus several hundred digits long, the purely iterative Fermat's method is too slow. This led to the development of several other methods, such as the pair of probabilistic methods by Pollard in the mid 1970s, the p − 1 method and the ρ method, and the Elliptic Curve Method, published by H. Lenstra in 1987.
However, the fastest algorithms, such as the Number Field Sieve (and its variants), the Quadratic Sieve (and its variants), and the Continued Fraction Method, utilize the same trick as Fermat. The remainder of this paper briefly discusses some of the above methods and focuses on the reverse factorization method, a new approach.

3.1. Factorization by Trial Division
Trial division is a brute-force method of finding a divisor of an integer $N$ by simply testing whether $N$ is divisible by 2, 3, 5, 7, 11, 13, 17, …, i.e., by all primes less than or equal to $\sqrt{N}$ in succession, until a divisor is reached. Trial division is a simple and effective way to factor $N$ partially or completely, and it is reasonable to use it when $N$ is not too large.

3.2. Fermat Factorization
Fermat's factorization method [2] looks for a representation of an odd integer $N$ as a difference of two squares, $N = a^2 - b^2$. Then $N = (a - b)(a + b)$ and $N$ is factored. To factor a number $N$, first calculate $\sqrt{N}$. Then compute $a^2 - N$, starting with the smallest integer $a \ge \sqrt{N}$ and increasing $a$ by one at each step, until $a^2 - N$ is a perfect square $b^2$. Since $a^2 - N = b^2$, we have $N = a^2 - b^2$, so $N$ is factorized as $N = (a - b)(a + b)$. If the only factors found are $N$ and 1, then $N$ is a prime number. If $N$ is not prime, apply the same algorithm to each factor. Fermat's method works well when the number is the product of two factors of approximately equal size. It works poorly when the factors are of very different sizes.

3.3. Quadratic Sieve
To factorize a number $n$, the quadratic sieve method [5] attempts to find two numbers $x$ and $y$ such that $x \not\equiv \pm y \pmod{n}$ and $x^2 \equiv y^2 \pmod{n}$. If two such numbers are found, then $(x - y)(x + y) \equiv 0 \pmod{n}$, so $x - y$ must share a non-trivial factor with $n$. A common strategy for finding such $x$ and $y$ is the following. Choose a smoothness bound $B$.
The number $\pi(B)$, which denotes the number of primes less than $B$, controls both the number of vectors needed and the length of the vectors. Then use sieving to locate $\pi(B) + 1$ numbers $x_i$ such that $y_i \equiv x_i^2 \pmod{n}$ is $B$-smooth. Factor the $y_i$ and generate an exponent vector mod 2 for each one. Find a subset of these vectors that adds to the zero vector. Multiply the corresponding $x_i$ together, calling the result mod $n$ $x$, and multiply the corresponding $y_i$ together, which yields a $B$-smooth square $y^2$. The resulting congruence $x^2 \equiv y^2 \pmod{n}$ gives two square roots of $x^2 \bmod n$: one is $y$, obtained by taking the square root of $y^2$ in the integers, and the other is the $x$ computed in the previous step. With the desired identity $(x - y)(x + y) \equiv 0 \pmod{n}$ in hand, compute $\gcd(x - y, n)$. This gives a factor. If the factor is trivial, try again with a different set of $x_i$ or a different linear dependency.

3.4. Pollard Rho
Pollard's rho method [3] combines two ideas, Floyd's cycle-finding algorithm and the birthday paradox, that are also useful for various other factoring methods. Let $N$ be a number that is neither a perfect power nor a prime, and let $p$ be the smallest prime factor of $N$. Generate a sequence of numbers $x_0, x_1, x_2, \ldots$ from $\mathbb{Z}_N$ uniformly and independently at random. Then, after at most $p + 1$ picks, there are for the first time two numbers $x_i$ and $x_s$ with $i < s$ such that $x_i \equiv x_s \pmod{p}$. Since $N$ is not a perfect power, there is another prime factor $q > p$ of $N$. Since $x_i$ and $x_s$ are chosen at random from $\mathbb{Z}_N$, by the Chinese remainder theorem, $x_i \not\equiv x_s \pmod{q}$ with probability $1 - 1/q$, even under the condition that $x_i \equiv x_s \pmod{p}$. Therefore $\gcd(x_i - x_s, N)$ is a non-trivial factor of $N$ with probability at least $1 - 1/q$. Since the $x_i \bmod p$ behave more or less as random integers in $\{0, 1, \ldots, p - 1\}$, computing $\gcd(x_i - x_j, N)$ for $i \ne j$ can be expected to yield a factorization of $N$ after about $c\sqrt{p}$ elements of the sequence, for some small constant $c$.
This suggests that approximately $(c\sqrt{p})^2/2$ pairs $x_i, x_j$ have to be considered. However, this can easily be avoided by computing only $\gcd(x_i - x_{2i}, N)$ for $i = 1, 2, \ldots$, i.e., by generating two copies of the sequence, one at regular speed and one at double speed. This can be expected to result in a factorization of $N$ after approximately $2\sqrt{p}$ gcd computations. If this gcd ever equals $N$, the algorithm terminates with failure, since this means $x_i \equiv x_{2i} \pmod{N}$ and therefore, by Floyd's cycle-finding algorithm, the sequence has cycled and continuing any further would only repeat previous work.

4. Semi-prime Factorization in RSA
This study focuses on fast and efficient factorization of semi-prime numbers. A semi-prime number is the product of two prime numbers, say p and q. For this reason, some sources also call semi-primes pq numbers. What makes semi-prime factorization in the RSA cryptosystem special is that the two prime factors should have an equal, or almost equal, number of digits. The reason is that if one prime factor has considerably fewer digits than the other, the system has a weakness, which can be explained as follows. The RSA system is built on the time complexity of factorizing the semi-prime number into its two factors, and this time complexity increases with the number of digits. For example, the time required to factorize a 20-digit number is much higher than the time required to factorize a 19-digit number. But suppose one of the factors of a many-digit number is very small; as an extreme case, take a one-digit prime such as 2, 3, 5 or 7. Then factorizing the number is much easier, and once one factor is found, the second follows by a single division. So, in most cases, RSA uses two primes with an equal number of digits to generate a semi-prime number.
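As a rough illustration of why the relative size of p and q matters, the following Python sketch implements the Fermat search described in Section 3.2 and counts how many values of a it tries. The function name `fermat_factor` and the step counter are illustrative assumptions, not part of the implementation used in this study.

```python
from math import isqrt


def fermat_factor(n):
    """Fermat's method for an odd integer n > 1.

    Returns (a - b, a + b, steps), where n = a^2 - b^2 and steps is the
    number of candidate values of a that were examined.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("Fermat's method expects an odd integer greater than 1")
    a = isqrt(n)
    if a * a < n:
        a += 1  # smallest a with a^2 >= n
    steps = 0
    while True:
        steps += 1
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:  # a^2 - n is a perfect square
            return a - b, a + b, steps
        a += 1


# Balanced factors: 2491 = 47 * 53 is found at the first candidate a = 50.
print(fermat_factor(2491))
# Unbalanced factors: 2991 = 3 * 997 needs every a from 55 up to 500.
print(fermat_factor(2991))
```

With the balanced semi-prime 2491 = 47 × 53 the search stops at the first candidate, whereas the unbalanced 2991 = 3 × 997 takes 446 candidates, which is the behaviour Section 3.2 describes for factors of similar and of very different sizes.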
The novel approach proposed in this study treats the choice of equal-length primes as a vulnerability in its own right, and shows that using primes with the same number of digits to generate a semi-prime also makes the factorization easier with the method explained in this paper.

5. A Novel Approach to Semi-prime Factorization
In the new approach, we treat the problem as a search problem in which the smaller of the factors p and q of a semi-prime number sp is at most $\sqrt{sp}$. We propose a sieving approach that speeds up the search by eliminating some of the candidates at each check. In addition, we propose to keep a factor tree for fast elimination of the alternatives. Sieving approaches such as the sieve of Eratosthenes [6], the sieve of Atkin [7] or the rational sieve eliminate alternatives starting from the smallest prime number, and the number examined increases at each step. This iteration from small to larger primes has a certain advantage when finding the prime factors of a general composite number. For semi-primes generated specifically for the RSA cryptosystem, however, starting from the small primes is a disadvantage, since we know that the prime being searched for is much closer to the square root of the semi-prime, $\sqrt{sp}$. Our approach is given in Algorithm 1. By definition, any composite number cn can be written as in equation (1).

$$cn = p \prod_{i=1}^{m} c_i \qquad (1)$$

where the number of prime factors of cn is taken to be $m + 1$. For the given cn, equation (2) follows.

$$(n \in N \wedge n \mid f_i) \Leftrightarrow n_i \mid \left\{ \mathbb{Z} \mid n = \left( \bigcap_{i=1}^{m} \mathbb{Z} \mid f_i \right) \right\} \qquad (2)$$

where $N$ is the search domain for the prime numbers, i.e., the numbers from 2 to $\sqrt{cn}$. If a number $n \in N$ is itself a composite number with m factors, then testing whether $n$ divides cn covers all m factors of $n$ at the same time. Since we are running a search algorithm, the search finishes as soon as the sought factor is found.
If $n$ is not a factor of cn, the search can be reduced further by also eliminating the factors of $n$ from the search space.

5.1. Sample Run
To present the new approach, we demonstrate a sample run on the semi-prime number 47 × 53 = 2491. The search space is the numbers from 1 to 49, since $\lfloor \sqrt{2491} \rfloor = 49$. The search algorithm starts with a sieve and tests the first candidate, 49, from the end of the sieve. Since 49 does not divide 2491, we can remove the factors of the composite number 49 from the search space.

Table 1. Search space after removing the factors of the composite number 49 in the first iteration (eliminated numbers are marked with *)

  1    2    3    4    5    6    7*
  8    9   10   11   12   13   14*
 15   16   17   18   19   20   21*
 22   23   24   25   26   27   28*
 29   30   31   32   33   34   35*
 36   37   38   39   40   41   42*
 43   44   45   46   47   48   49*

In the second iteration, the second number from the end of the search space, 48, is considered. Since 48 does not divide 2491, all the factors of the composite number 48 can be removed from the search space. The prime factors of 48 are 2 and 3, and these primes together with the composite numbers that can be generated from them are eliminated, as shown in Table 2.

Table 2. Search space after eliminating the factors of 48, and the composite numbers generated from those factors, in the second iteration (eliminated numbers are marked with *)

  1    2*   3*   4*   5    6*   7*
  8*   9*  10*  11   12*  13   14*
 15*  16*  17   18*  19   20*  21*
 22*  23   24*  25   26*  27*  28*
 29   30*  31   32*  33*  34*  35*
 36*  37   38*  39*  40*  41   42*
 43   44*  45*  46*  47   48*  49*

From the sieve in Table 2, the next number in the search space is 47. Since 47 divides 2491, the search is finished. If 47 had not divided 2491, the search would have continued, and since all the numbers between 47 and 43 are eliminated, the next number in the search would be 43. In Table 2, the search space is reduced to only 14 candidates, from the initial 49 numbers. In the sample run, we found the factor in 3 steps. A sieving approach that ascends from the smallest prime would find the factor only after trying all the prime numbers up to 47. This is a clear performance gain in this example.
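The sample run can be reproduced with a short Python sketch of the downward search. This is only one reading of Algorithm 1, under stated assumptions: the names `reverse_sieve_factor` and `prime_factors` are hypothetical, the factor tree is replaced by a plain set of prime factors, and the divisibility test is replaced by `gcd(i, sp)`. The last substitution is deliberate: knowing only that i does not divide sp does not rule out that a prime factor of i divides sp, whereas `gcd(i, sp) == 1` does, so striking the multiples of those primes is then safe.

```python
from math import gcd, isqrt


def prime_factors(n):
    """Distinct prime factors of n, found by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def reverse_sieve_factor(sp):
    """Search downward from floor(sqrt(sp)) for a factor of sp.

    Returns (factor, cofactor, candidates_tested), or None when no factor
    exists in 2..floor(sqrt(sp)), i.e. when sp is prime.
    Assumption: gcd is used instead of a plain divisibility test.
    """
    limit = isqrt(sp)
    alive = [True] * (limit + 1)  # alive[i]: i is still in the search space
    tested = 0
    for i in range(limit, 1, -1):
        if not alive[i]:
            continue  # already eliminated by an earlier candidate
        tested += 1
        g = gcd(i, sp)
        if g > 1:
            return g, sp // g, tested
        # gcd is 1, so no prime factor of i divides sp:
        # strike those primes and all of their multiples.
        for p in prime_factors(i):
            for m in range(p, limit + 1, p):
                alive[m] = False
    return None


print(reverse_sieve_factor(2491))
```

For sp = 2491 the sketch returns (47, 53, 3), matching the three steps of the sample run: 49 and 48 are tested and the multiples of 7, 2 and 3 are struck from the sieve, then 47 is found.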
During the elimination of the factors of a composite number, a factor tree can be used.

Figure 2. Factorization tree for composite number 48

Figure 2 shows the factor tree holding the factors of 48. The tree is ambiguous, since the same tree can be redrawn as in Figure 3.

Figure 3. Ambiguous alternative factorization tree for composite number 48

Any drawing of the tree can be useful in the elimination of the search space. The deepest factor tree of a composite number n has at most the depth given in equation (3).

$$\text{Max depth of factor tree} = \frac{\log_2 n}{2} - 1 \qquad (3)$$

Recall that the smallest prime number is 2, and that the maximum number of internal nodes in a binary tree is half of the total number of nodes, minus one.

Algorithm 1: A Novel Factorization for RSA Semi-Prime

    Let SP be a semi-prime with high factors
    for i ← ⌊√SP⌋ down to 2
    begin
        if i divides SP
            return i as factor
        else
        begin
            create a factor tree of i;
            eliminate all factors in sieve;
            decrease i;
        end else;
    end for;

The algorithm above demonstrates the execution of the novel approach. The iterator value i starts from $\sqrt{SP}$ and iterates down to the smallest prime number. In fact, we are aware that one of the factors of an RSA semi-prime can never be 2 because of the vulnerability, but the algorithm is designed in this manner for the worst case analysis.

6. EVALUATION
The results of the executions of the various algorithms are given in Table 3.

Table 3. Execution Performance of the Factorization Algorithms

Method             Average Execution
Pollard Rho        398 mins
ECM                3443 mins
Fermat             30 mins
Quadratic Sieve    326 mins
Eratosthenes       1267052 mins
Trial Division     5510739 mins
New Approach       5 mins

The results in Table 3 are gathered from executions on thousands of random numbers with 8 digits. In order to visualize the growth of the time spent by the algorithms, the execution times of 6 of the methods are plotted in Figure 4.

Figure 4. Performance evaluation of methods while the number of digits is increasing.

The numbers of digits in Figure 4 are quite low, and plotting is stopped at 5 digits, where the algorithms are still close to each other. As the number of digits increases, some of the algorithms consume more time than the rest.

Figure 5. Performance evaluation of methods while the number of digits is increasing further.

Depending on the setup time and the difficulty of the numbers, some algorithms yield worse results than the rest. From the analytical perspective, the time complexities of the algorithms are known to be as in Table 4.

Table 4. Time Complexity of the Methods

Method             Time Complexity
Pollard Rho        $O(B \times \log B \times \log^2 n)$
ECM                $O(L(p)\, M(\log n))$
Fermat             $O(d)$
Quadratic Sieve    $O(\log B \log\log B)$
Eratosthenes       $O(\sqrt{n} + p)$
Trial Division     $O(\sqrt{n})$
New Approach       $O(dp)$

Here $B$ is the bound and $n$ is the composite number; $M(\log n)$ is the complexity of multiplication mod $n$, and $L(p) = e^{c (\log p)^{\alpha} (\log\log p)^{1-\alpha}}$; $d$ is the distance between the two factors of the composite number; $p$ is the number of primes below $\sqrt{n}$; and $dp$ is the number of primes between the two factors of the composite number.

7. CONCLUSION
This study presents a new approach to semi-prime factorization that is very similar to Fermat's factorization algorithm. The biggest impact of semi-prime factorization is the attack against cryptosystems like RSA. During the study, we evaluated the new approach and compared its success against the most significant factorization algorithms. The success rate of the new approach seems quite convincing, in addition to the encouraging analytical performance of the algorithm. We would also like to test the new approach on bigger integers, of 50 or more digits, and parallelization would be a challenging piece of future work.

REFERENCES
[1] Rivest, R., Shamir, A., Adleman, L. "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems". Communications of the ACM 21 (2), 120–126, 1978. doi:10.1145/359340.359342.
[2] McKee, J. "Speeding Fermat's Factoring Method". Math. Comput. 68, 1729–1738, 1999.
[3] Pollard, J. M. "A Monte Carlo method for factorization". BIT 15, 331–334, 1975.
[4] Lenstra, H. W. Jr. "Factoring Integers with Elliptic Curves". Ann. Math. 126, 649–673, 1987.
[5] Gerver, J. "Factoring Large Numbers with a Quadratic Sieve". Math. Comput. 41, 287–294, 1983.
[6] Horsley, Rev. Samuel, F. R. S. "Κόσκινον Ερατοσθένους or, The Sieve of Eratosthenes. Being an account of his method of finding all the Prime Numbers". Philosophical Transactions (1683–1775), Vol. 62, 327–347, 1772.
[7] Atkin, A. O. L., Bernstein, D. J. "Prime sieves using binary quadratic forms". Math. Comp. 73, 1023–1030, 2004.