Reverse Factorization and Comparison of
Factorization Algorithms in attack to RSA
Sadi Evren SEKER
Dept. of Business Administration.
Istanbul Medeniyet University
[email protected]
ABSTRACT
Factorization algorithms have a major role in computer
security and cryptography. Most of the widely used cryptographic algorithms, like RSA, are built on the mathematical
difficulty of factoring the product of two large prime numbers. This research proposes a new approach to factorization using
two new enhancements. The new approach is compared
with six different factorization algorithms and its
performance is evaluated in a big data environment. The algorithms covered are the elliptic curve method, quadratic sieve, Fermat's
method, trial division and Pollard rho methods. Success rates
are compared over a million integers of varying
difficulty. We have implemented our own algorithm for random number generation, which is also explained in the paper.
We also show empirically that the new approach has an advantage in the factorization attack on RSA.
Cihan MERT
Electrical Engineering Dept.
The University of Texas at Dallas
[email protected]
Keywords
Factorization, Cryptography, Benchmarking
Acknowledgement
Work of Sadi Evren SEKER is supported by Istanbul University,
research projects department under project number YADOP-27254
1. INTRODUCTION
This study can be viewed as three major steps. In the first
step, we have generated big integers with a new approach to
the generation. After the generation, the factorization algorithms, including the new approach, are executed. Finally, in
the last step, the performance of the algorithms is evaluated.
In this paper, the problem is defined and an overview of
the problem is given in the problem statement
section. The related work section covers a brief literature
review of contemporary studies on factorization
algorithms. The background chapter briefly describes the
details of the factorization algorithms we have implemented
in this study. The experiments section goes into the details
of the generated integers, their properties after generation,
and the evaluation of the algorithms.
2. PROBLEM STATEMENT
A stepwise view of the study is given in Figure 1.
Figure 1. Overview of Study
In order to simulate the RSA prime number factorization
problem, we have concentrated only on semi-prime numbers. The random generator is designed to generate semi-prime numbers. In order to make the time performance more
explicit, we have generated a huge number of semi-prime numbers
and stored them in a database. After storing, the factorization algorithms are executed on those numbers. Finally,
each algorithm is evaluated on its time performance.
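The generation step can be illustrated with a minimal sketch. It is not the generator used in this study, whose details are not given in this section; it only assumes that a semi-prime is built from two random primes with the same number of decimal digits. The names is_prime, random_prime and random_semiprime, and the fixed seed, are hypothetical.

```python
import random

def is_prime(n):
    """Deterministic primality test by trial division (adequate for small n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def random_prime(digits, rng):
    """Draw a random prime with exactly `digits` decimal digits."""
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    while True:
        candidate = rng.randint(low, high)
        if is_prime(candidate):
            return candidate

def random_semiprime(digits_per_factor, rng):
    """Return (p, q, p*q) where p and q are primes of equal decimal length."""
    p = random_prime(digits_per_factor, rng)
    q = random_prime(digits_per_factor, rng)
    return p, q, p * q

rng = random.Random(2491)            # fixed seed (assumption) so runs are repeatable
p, q, sp = random_semiprime(4, rng)  # two 4-digit primes give a 7- or 8-digit semi-prime
```

Trial division is used for the primality check only because the factors here are small; for RSA-sized primes a probabilistic test would be the usual choice.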
3. BACKGROUND
From the early times, the factorization of composite numbers
has been an interesting area of study, and some early
algorithms, like the Sieve of Eratosthenes (276–194 BC),
are still in use.
With the spread of modern cryptographic systems, some of which are built on the difficulty of factoring, like
RSA [1], the factorization problem has remained an active area of study.
Factoring initially meant dividing a number by larger
and larger primes until the factorization was complete. This trial
division was not improved until Fermat's method, which uses
the factorization of the difference of two squares.
While Fermat's method can be much faster than trial division,
in real-world factoring, for example of an
RSA modulus several hundred digits long, the
purely iterative Fermat's method is too slow. This led to the development of several other methods, such as a pair of probabilistic methods by Pollard in the mid 1970s, the p − 1 method
and the ρ method, and the Elliptic Curve Method discovered by
H. Lenstra in 1987 [4]. However, the fastest algorithms, such as
the Number Field Sieve (and its variants), the Quadratic Sieve
(and its variants), and the Continued Fraction Method, utilize the
same trick as Fermat. The remainder of this paper briefly
discusses some of the above methods and focuses on the reverse factorization method, a new approach.
3.1. Factorization by Trial Division
Trial division is a brute-force method of finding a divisor of
an integer $N$ by simply testing whether $N$ is divisible by
2, 3, 5, 7, 11, 13, 17, …, i.e., by all primes less than or
equal to $\sqrt{N}$ in succession, until a divisor is reached.
Trial division is a simple and effective method to partially or completely factor $N$. It is reasonable to use it
as a factoring method when $N$ is not too large.
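A minimal sketch of trial division is given below. As a simplification (an assumption of this sketch, not a detail from the study), it tries 2 and then every odd number instead of a precomputed list of primes; composite candidates can never divide the remaining value because their prime factors have already been divided out.

```python
def trial_division(n):
    """Return the prime factorization of n as an ascending list."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:          # divide out every copy of d
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2    # after 2, try odd candidates only
    if n > 1:                      # whatever remains is a prime factor
        factors.append(n)
    return factors
```

For the semi-prime used later in the sample run, `trial_division(2491)` has to walk through every odd candidate up to 47 before the first factor is found.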
3.2. Fermat Factorization
Fermat's factorization method [2] looks for a representation
of an odd integer $N$ as the difference of two squares,
$N = a^2 - b^2$. Then
$N = (a - b)(a + b)$
and $N$ is factored.
To factor a number $N$, first calculate $\sqrt{N}$. Then compute
$a^2 - N$ starting with $a = \lceil\sqrt{N}\rceil$, the smallest integer not less than $\sqrt{N}$, and
keep increasing $a$ until a square $b^2$ is reached. Since $a^2 - N = b^2$,
we have $N = a^2 - b^2$, so $N$ is factorized into $N = (a - b)(a + b)$. If
the only factors found are $N$ and 1, then $N$ is a prime number.
If $N$ is not prime, apply the same algorithm to each factor.
Fermat's method works well when the number factors
into two terms of approximately equal size. It works poorly
when the factors are of very different sizes.
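The procedure can be sketched as follows. The function name fermat_factor and the returned step counter are illustrative additions; the counter only makes the sensitivity to the distance between the factors visible.

```python
from math import isqrt

def fermat_factor(n):
    """Split an odd integer n > 1 as (a - b, a + b) with a^2 - b^2 = n.

    Returns (small_factor, large_factor, steps), where steps counts how many
    times a had to be increased beyond ceil(sqrt(n)).
    """
    if n % 2 == 0:
        raise ValueError("Fermat's method expects an odd integer")
    a = isqrt(n)
    if a * a < n:
        a += 1                      # a = ceil(sqrt(n))
    steps = 0
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:             # a^2 - n is a perfect square
            return a - b, a + b, steps
        a += 1
        steps += 1
```

With balanced factors, as in 2491 = 47 × 53, the very first value of a succeeds; with unbalanced factors, as in 993 = 3 × 331, more than a hundred increments are needed.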
3.3. Quadratic Sieve
To factorize a number $n$, the quadratic sieve method [5] attempts
to find two numbers $x$ and $y$ such that $x \not\equiv \pm y \pmod{n}$ and
$x^2 \equiv y^2 \pmod{n}$. If two such numbers are found, this
implies that $(x - y)(x + y) \equiv 0 \pmod{n}$. Then $x - y$ must
have a non-trivial factor in common with $n$.
A common strategy for finding such $x$ and $y$ is the following.
Choose a smoothness bound $B$. The number $\pi(B)$, which
denotes the number of prime numbers less than $B$, will control
both the number of vectors needed and the length of the
vectors. Then use sieving to locate $\pi(B) + 1$ numbers $x_i$ such
that $y_i \equiv x_i^2 \pmod{n}$ is $B$-smooth. Factor the $y_i$ and
generate an exponent vector mod 2 for each one. Find a subset
of these vectors which adds to the zero vector. Multiply the
corresponding $x_i$ together, naming the result mod $n$ as $x$, and
multiply the corresponding $y_i$ together, which yields a $B$-smooth square $y^2$. The
resulting equality $x^2 \equiv y^2 \pmod{n}$ gives two square roots of
$x^2 \bmod n$: one is $y$, obtained by taking the square root of $y^2$ in the integers,
and the other is the $x$ computed in the previous
step. Having the desired identity $(x - y)(x + y) \equiv 0 \pmod{n}$,
compute $\gcd(x - y, n)$. This gives a factor. If the factor
is trivial, try again with a different linear dependency or different $x_i$.
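The sketch below illustrates only the final combination step on a toy example, not the sieving itself. The modulus 1649 and the two hand-picked relations 41 and 43 are assumptions chosen for illustration: 41^2 mod 1649 = 32 = 2^5 and 43^2 mod 1649 = 200 = 2^3 · 5^2 are both 5-smooth, and their exponent vectors mod 2 add to the zero vector.

```python
from math import gcd, isqrt

def factor_from_squares(x, y, n):
    """Given x^2 ≡ y^2 (mod n), return a non-trivial factor of n, or None."""
    if (x * x - y * y) % n != 0:
        raise ValueError("x^2 and y^2 are not congruent mod n")
    g = gcd(x - y, n)
    return g if 1 < g < n else None   # g in {1, n} means a trivial split

n = 1649
relations = [41, 43]                  # hand-picked x_i with smooth x_i^2 mod n

x = 1
y_squared = 1
for xi in relations:
    x = (x * xi) % n                  # product of the x_i, reduced mod n
    y_squared *= (xi * xi) % n        # product of the smooth residues y_i

y = isqrt(y_squared)                  # integer square root of the smooth square
factor = factor_from_squares(x, y, n)
```

Here x = 114 and y = 80, and gcd(114 − 80, 1649) = 17 splits 1649 = 17 × 97.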
3.4. Pollard Rho
Pollard's rho method [3] is based on a combination of two
ideas, Floyd's cycle-finding algorithm and the birthday
paradox, that are also useful for various other factoring
methods.
Let $N$ be a number that is neither a perfect power nor a prime,
and let $p$ be the smallest prime factor of $N$.
Generate a sequence of numbers $x_0, x_1, x_2, \ldots$ from $\mathbb{Z}_N$
uniformly and independently at random. After at most $p + 1$
such pickings there are, for the first time, two numbers $x_i$ and
$x_s$ with $i < s$ such that $x_i \equiv x_s \pmod{p}$. Since $N$ is
not a perfect power, there is another prime factor $q > p$ of $N$.
Since the numbers $x_i$ and $x_s$ are randomly chosen from $\mathbb{Z}_N$,
by the Chinese remainder theorem, $x_i \not\equiv x_s \pmod{q}$
with probability $1 - 1/q$, even under the condition that
$x_i \equiv x_s \pmod{p}$. Therefore, $\gcd(x_i - x_s, N)$ is a nontrivial
factor of $N$ with probability at least $1 - 1/q$.
Since the $x_i \bmod p$ behave more or less as random integers
in $\{0, 1, \ldots, p - 1\}$, the factorization of $N$ can be computed
after about $c\sqrt{p}$ elements of the sequence, for some small
constant $c$, by computing $\gcd(x_i - x_j, N)$ for $i \neq j$.
This suggests that approximately $(c\sqrt{p})^2/2$ pairs $x_i, x_j$ have
to be considered. However, this can easily be avoided by only
computing $\gcd(x_i - x_{2i}, N)$ for $i = 1, 2, \ldots$, i.e., by
generating two copies of the sequence, one at regular
speed and one at double speed. This can be expected to
result in a factorization of $N$ after approximately $2\sqrt{p}$ gcd
computations. If this gcd ever equals $N$, the
algorithm terminates with failure, since this means
$x_i \equiv x_{2i} \pmod{N}$ and therefore, by Floyd's cycle-finding algorithm, the
sequence has cycled and continuing any further would only
repeat previous work.
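A minimal sketch of the two-speed scheme is given below. The text does not state how the sequence is produced; the sketch assumes the commonly used pseudo-random iteration x ← x² + c (mod N), and the parameters c and x0 are illustrative defaults.

```python
from math import gcd

def pollard_rho(n, c=1, x0=2):
    """Pollard rho with Floyd cycle detection and f(x) = x^2 + c mod n.

    Returns a non-trivial factor of n, or None when the gcd reaches n
    (the failure case, where the sequence has cycled).
    """
    if n % 2 == 0:
        return 2

    def f(x):
        return (x * x + c) % n

    tortoise, hare = x0, x0
    while True:
        tortoise = f(tortoise)             # x_i, regular speed
        hare = f(f(hare))                  # x_{2i}, double speed
        d = gcd(abs(tortoise - hare), n)
        if d == n:
            return None                    # failure: retry with another c or x0
        if d > 1:
            return d
```

On failure, the usual remedy is to rerun with a different constant c or starting value x0.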
4. Semi-prime Factorization in RSA
This study focuses on the fast and efficient factorization of
semi-prime numbers. A semi-prime number is
the product of two prime numbers, say p
and q. In some sources semi-prime numbers are also
called pq numbers for this reason.
The advantage in factorizing the semi-prime numbers of the RSA
cryptosystem is that the two prime factors
should have an equal, or almost equal, number of digits. The reason
is that if one prime factor of the semi-prime
number had fewer digits than the other, the system would have a
weakness.
The weakness can be explained as follows. The RSA system is
built on the time complexity of factorizing a semi-prime
number into its two factors. The time complexity increases with
the number of digits. For example, the time required to
factorize a 20-digit number is much higher than the time
required to factorize a 19-digit number. But if one of the
factors of the larger number is very small, as in the
extreme case of a one-digit prime like 2, 3, 5 or 7,
then factorizing the number becomes much easier,
and once any factor is found the second factor follows
immediately by division. So, in most cases, RSA
uses two prime numbers with an equal number of digits to generate a
semi-prime number.
The novel approach proposed in this study treats this as
a vulnerability and shows that using primes with the same number of digits
to generate a semi-prime also makes factorization easier
with the novel method explained in this paper.
5. A Novel Approach to Semi-prime Factorization
In the new approach, we treat factorization as a search problem
in which the smaller of the two factors p and q of a semi-prime number sp is
at most $\sqrt{sp}$. We propose a
sieving approach, which speeds up the search by
eliminating some of the candidates at each check. In
addition, we propose to keep a factor tree for fast
elimination of the alternatives.
Sieving approaches like the Sieve of Eratosthenes [6], the Sieve
of Atkin [7] or the Rational Sieve eliminate alternatives
starting from the smallest prime number, and the number
examined increases at each step.
This iterative approach from small to bigger prime numbers
has a certain advantage when finding the prime factors of a
general composite number. But when factoring semi-prime numbers that are specially generated for the RSA
cryptosystem, starting from small prime numbers is a
disadvantage, since we know that the searched prime
number is much closer to the square root of the semi-prime
number ($\sqrt{sp}$).
Our approach is given in Algorithm 1. By definition, any
composite number cn can be rewritten as in equation (1).
$$cn = p \prod_{i=1}^{m} c_i \qquad (1)$$
where the number of prime factors of cn is taken to be
m + 1.
For the given cn, equation (2) can be concluded.
$$(n \in N \wedge n \mid f_i) \Leftrightarrow n_i \mid \{\mathbb{Z} \mid n = (\bigcap_{i=1}^{m} \mathbb{Z} \mid f_i)\} \qquad (2)$$
where N is the search domain for the prime numbers,
i.e. the numbers from 2 to $\sqrt{cn}$.
If a number $n \in N$ is itself a composite number with m
factors, then testing whether n divides cn also covers the tests for all
m factors of n. Since
we are running a search algorithm, the search finishes as soon as the searched factor
is found. If n is not a factor of cn,
the search can be reduced by also eliminating the factors
of n from the search space.
5.1. Sample Run
To present the new approach, we
demonstrate a sample run on the semi-prime number
47 × 53 = 2491.
The search space is the numbers from 1 to 49, since
$\lfloor\sqrt{2491}\rfloor = 49$. The search algorithm starts with a sieve and tests
the first candidate, 49, from the end of the sieve. Since
49 does not divide 2491, we can remove all the factors of the composite
number 49 from the search space.
Table 1. Search space after the first iteration: the factor 7 of the
composite number 49 and its multiples are removed (shown as --)
 1  2  3  4  5  6 --
 8  9 10 11 12 13 --
15 16 17 18 19 20 --
22 23 24 25 26 27 --
29 30 31 32 33 34 --
36 37 38 39 40 41 --
43 44 45 46 47 48 --
In the second iteration, the second number from the end of
the search space, 48, is considered. Since 48 does not divide 2491,
all the factors of the composite number 48 can be
removed from the search space. The prime factors of 48 are 2 and
3, and these factors together with the composite numbers that can be generated from
them are eliminated, as shown in Table 2.
Table 2. Search space after the second iteration: the factors 2 and 3 of
the composite number 48 and their multiples are also removed (shown as --)
 1 -- -- --  5 -- --
-- -- -- 11 -- 13 --
-- -- 17 -- 19 -- --
-- 23 -- 25 -- -- --
29 -- 31 -- -- -- --
-- 37 -- -- -- 41 --
43 -- -- -- 47 -- --
From the sieve in Table 2, the next number in the search space
is 47. Since 47 divides 2491, the search is
finished.
If 47 had not divided 2491, the search
would have continued, and since all the numbers between 43 and 47 are
eliminated, the next number in the search iteration would be
43. In Table 2, the search space is
reduced to only 14 possibilities from the initial 49 numbers.
In the sample run, we have found the factor in 3 steps. A
sieving approach that starts from the smallest prime would find the factor only after trying all the
prime numbers up to 47. This is an obvious performance
gain.
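The sample run can be reproduced with the sketch below. It is an illustrative reading of the approach, not the implementation evaluated in this study: the names reverse_factor and prime_factors are hypothetical, and a plain set of prime factors stands in for the factor tree. The elimination step is a heuristic that relies on balanced factors. A prime p is removed only when a multiple of p lies between p and $\sqrt{sp}$, so the smaller factor is always reached when it exceeds $\sqrt{sp}/2$, which holds for two primes of equal length; for unbalanced inputs the sketch may return no factor.

```python
from math import isqrt

def prime_factors(n):
    """Distinct prime factors of n by trial division (stands in for the factor tree)."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def reverse_factor(sp):
    """Search downward from floor(sqrt(sp)); return (factor or None, candidates tested)."""
    limit = isqrt(sp)
    alive = [True] * (limit + 1)        # alive[i]: i is still in the search space
    tests = 0
    for i in range(limit, 1, -1):
        if not alive[i]:
            continue                    # already eliminated by an earlier candidate
        tests += 1
        if sp % i == 0:
            return i, tests
        for f in prime_factors(i):      # drop each prime factor of i and its multiples
            for multiple in range(f, limit + 1, f):
                alive[multiple] = False
    return None, tests
```

For 2491 the candidates tested are 49, 48 and 47, matching the three steps of the sample run. For an unbalanced semi-prime such as 7 × 353 = 2471, testing 49 first eliminates 7 and the search ends without a factor.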
During the elimination of factors of any composite number a
factor tree can be implemented.
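A factor tree can be sketched as nested tuples. The splitting rule used here, always splitting off the smallest prime factor, is an assumption of this sketch; other splits give different drawings of the same factorization. The names factor_tree and leaves are hypothetical.

```python
def factor_tree(n):
    """Binary factor tree: (n, left, right) for a composite n, n itself for a leaf.

    Each composite node is split into its smallest prime factor (left)
    and the tree of the remaining cofactor (right).
    """
    d = 2
    while d * d <= n:
        if n % d == 0:
            return (n, d, factor_tree(n // d))
        d += 1
    return n                            # n is prime: a leaf

def leaves(tree):
    """Collect the prime leaves of a factor tree from left to right."""
    if isinstance(tree, tuple):
        _, left, right = tree
        return leaves(left) + leaves(right)
    return [tree]
```

For 48 this yields the chain 48 → 2 · 24 → 2 · 12 → 2 · 6 → 2 · 3, whose leaves 2, 2, 2, 2, 3 give the primes 2 and 3 to eliminate from the sieve.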
Figure 2. Factorization tree for composite number 48
Figure 2 shows the factor tree holding the factors of 48.
The tree is ambiguous, since the same factorization
can be redrawn as in Figure 3.
Figure 3. Ambiguous alternative factorization tree for
composite number 48
Any drawing of the tree can be used to eliminate candidates from
the search space. The maximum depth of the factor tree of a composite number
n is given in equation (3).
$$\text{Max depth of factor tree} = \log_2 n / 2 - 1 \qquad (3)$$
Recall that the smallest prime number is 2 and that the
maximum internal node count of a binary tree can be one less than half of the total
number of nodes.
Algorithm 1: A Novel Factorization for RSA Semi-Prime
1. Let SP be a semi-prime with high factors,
2. for i ← ⌊√SP⌋ down to 2 begin
3.   if i divides SP return i as factor
4.   else begin
5.     create a factor tree of i;
6.     eliminate all factors in sieve;
7.     decrease i;
8.   end else;
9. end for;
The algorithm above demonstrates the execution of the novel
approach. The iterator value i starts from $\sqrt{SP}$ and iterates
down to the smallest prime number. In fact, we know that
one of the factors of an RSA semi-prime can never be 2
because of the vulnerability, but the algorithm is designed in this
manner for the worst-case analysis.
6. EVALUATION
The results of executing the various algorithms are
shown in Table 3.
Table 3. Execution Performance of the Factorization Algorithms
Method            Average Execution
Pollard Rho       398 mins
ECM               3443 mins
Fermat            30 mins
Quadratic Sieve   326 mins
Eratosthenes      1267052 mins
Trial Division    5510739 mins
New Approach      5 mins
The results in Table 3 are gathered from executions on
thousands of random numbers with 8 digits.
In addition, in order to visualize the growth of the time spent by the
algorithms, the execution times for 6 of the
methods are plotted in Figure 4.
Figure 4. Performance evaluation of the methods as the
number of digits increases.
The number of digits in Figure 4 is quite low; plotting is stopped at 5
digits, where the algorithms are still close to each other. As the
number of digits increases, some of the algorithms consume
more time than the rest.
Figure 5. Performance evaluation of the methods as the
number of digits increases further.
Depending on the setup time and the difficulty of the numbers, some
algorithms yield worse results than the rest.
From the analytical perspective, the time complexities
of the algorithms are known to be as in Table 4.
Table 4. Time Complexity of the Methods
Method            Time Complexity
Pollard Rho       $O(B \log B \log^2 n)$
ECM               $O(L(p)\, M(\log n))$
Fermat            $O(d)$
Quadratic Sieve   $O(\log B \log\log B)$
Eratosthenes      $O(\sqrt{n} + p)$
Trial Division    $O(\sqrt{n})$
New Approach      $O(dp)$
Pollard Rho: $B$ is the bound and $n$ is the composite number.
ECM: $M(\log n)$ is the complexity of multiplication mod $n$, and $L(p) = e^{c (\log p)^{\alpha} (\log\log p)^{1-\alpha}}$.
Fermat: $d$ is the distance between the two factors of the composite number.
Quadratic Sieve: $B$ is the bound.
Eratosthenes: $p$ is the number of primes below $\sqrt{n}$.
New Approach: $dp$ is the number of primes between the two factors of the composite number.
7. CONCLUSION
This study presents a new approach to semi-prime number
factorization that is very similar to Fermat's factorization algorithm.
The biggest impact of semi-prime number factorization is the attack
against cryptosystems like RSA. During the study, we have
evaluated the new approach and compared its success against the most
significant factorization algorithms. The success rate of the new
approach seems quite convincing, in addition to the encouraging analytical
performance of the algorithm. As future work, we would like to test
the new approach on bigger integers, such as those with 50+ digits;
parallelization would also be a challenging direction.
REFERENCES
[1] Rivest, R.; Shamir, A.; Adleman, L. (1978). "A Method for
Obtaining Digital Signatures and Public-Key Cryptosystems".
Communications of the ACM 21 (2): 120–126.
doi:10.1145/359340.359342.
[2] McKee, J. Speeding Fermat's Factoring Method, Math.
Comput. 68, 1729–1738, 1999.
[3] Pollard, J. M. A Monte Carlo method for factorization, BIT,
Vol. 15 (1975), pp. 331–334.
[4] Lenstra, H. W. Jr. "Factoring Integers with Elliptic
Curves." Ann. Math. 126, 649–673, 1987.
[5] Gerver, J. Factoring Large Numbers with a Quadratic
Sieve, Math. Comput. 41, 287–294, 1983.
[6] Horsley, Rev. Samuel, F. R. S., "Κόσκινον Ερατοσθένους or,
The Sieve of Eratosthenes. Being an account of his method of
finding all the Prime Numbers," Philosophical Transactions
(1683–1775), Vol. 62 (1772), pp. 327–347.
[7] Atkin, A. O. L.; Bernstein, D. J. Prime sieves using binary
quadratic forms, Math. Comp. 73 (2004), 1023–1030.