Ceng 491
Computer Engineering Design I
Initial Design Report
Group: QuadCode
Project Name: LockSmith
Members:
Alaettin Zubaroglu
Suleyman Kagan Samurkas
Aydin Goze Polat
Ubeyde Rizaoglu
Table of Contents
1. Introduction
1.1. Purpose
1.2. Scope
1.3. Definitions and Acronyms
2. References
3. Decomposition Description
3.1. Module Decomposition
3.1.1. Prime List Creator
3.1.2. Primality Tester
3.1.3. Perfect Square List Creator
3.1.4. Perfect Squareness Tester
3.1.5. Factorization Module
3.1.6. Matrix Processing Module
3.2. Concurrent Process Decomposition
3.2.1. Processes in the Prime List Creator Module
3.2.2. Processes in the Primality Tester Module
3.2.3. Processes in the Perfect Square List Creator Module
3.2.4. Processes in the Perfect Squareness Tester Module
3.2.5. Processes in the Factorization Module
3.2.6. Processes in the Matrix Processing Module
3.3. Data Decomposition
4. Dependency
4.1. Intermodule Dependency
4.2. Interprocess Dependency
4.3. Data Dependency
5. Interface Description
5.1. Module Interface
5.2. Process Interface
6. Libraries and Protocols Will Be Used
6.1. MPI
6.2. OpenMP
6.3. GMP
7. Timetable
1. Introduction
1.1. Purpose
The aim of this project is to test the security level of the keys used in the RSA encryption
algorithm by trying to crack them. Parallel programming techniques will be used for this
purpose. The product will run on a cluster and will be developed on 'nar', the high-performance
computer in our department. By identifying weak keys, the project aims to encourage security
protocols to use stronger keys. In addition, the user will be able to use this product to test
efficiently whether a big number is prime or a perfect square.
1.2. Scope
The project will include different modules working for different aims. In addition to the main
purpose of the product, the user will also be able to use any module of the product that serves
his/her needs.
The modules will perform jobs such as creating the prime list, testing the primality of a big
number using the prime list, creating the perfect square list, testing whether a big number is a
perfect square, factorizing numbers using the prime list to create a matrix that is needed by
another module, and processing this matrix to produce the final answer.
The main purpose of this project is to find two factors of a very big number n. However, the
user will be able to use some of the modules individually for special purposes. For example,
the user can use the primality testing module to test the primality of a big number, or the
perfect squareness testing module to test whether a big number is a perfect square.
These modules use the prime list and the perfect square list, so the lists must be created before
these modules or the whole product can be used. Moreover, the user will be able to access
these lists (with read-only permission) and examine them by hand.
If the given number has nontrivial factors, the product finds two of them. However, these
factors are not guaranteed to be prime; they may be composite numbers. Note that the keys used
in the RSA encryption algorithm are products of exactly two prime numbers. Thus, when an
RSA key is given, the output of the product will be two prime numbers.
1.3. Definitions and Acronyms
Definitions:
composite number: A positive integer greater than 1 that is not prime.
congruent: Two integers are congruent modulo n if they leave the same remainder when
divided by n, that is, if their difference is divisible by n.
factor: A factor of a whole number is a whole number that can be multiplied by another whole
number to produce the first whole number.
factorization: Resolution of an integer or polynomial into factors that, when multiplied
together, give the integer or polynomial.
perfect square: An integer that is the square of an integer.
primality: The property of being a prime number.
prime number: A natural number greater than 1 that can be divided without remainder only by
itself and 1.
de facto: De facto is a Latin expression that means "concerning the fact" or "in practice but not
ordained by law". It is commonly used in contrast to "de jure" (which means "by law") when
referring to matters of law, governance, or technique (such as standards) that are found in the
common experience as created or developed without or contrary to a regulation.
Acronyms:
RSA: Rivest, Shamir, and Adleman. The algorithm was publicly described in 1977 by Ron
Rivest, Adi Shamir, and Leonard Adleman at MIT; the letters RSA are the initials of their
surnames.
MPI: Message Passing Interface.
OpenMP: Open Multi-Processing.
GMP: GNU Multiple Precision.
NUMA: Non-Uniform Memory Access or Non-Uniform Memory Architecture.
OSI: Open Systems Interconnection.
TCP: Transmission Control Protocol.
API: Application Programming Interface.
LIS: Language Independent Specifications.
PVM: Parallel Virtual Machine.
CPU: Central Processing Unit.
2. References
http://en.wikipedia.org/wiki/Message_Passing_Interface
http://en.wikipedia.org/wiki/OpenMP
http://en.wikipedia.org/wiki/GNU_Multi-Precision_Library
http://gmplib.org/
http://en.wikipedia.org/wiki/De_facto
3. Decomposition Description
3.1. Module Decomposition
3.1.1. Prime List Creator
The product needs a prime list to work. Prime lists can be found on the Internet or in other
resources, but they may not be reliable, so we will develop a module for this purpose. This
module will create and save a prime list, which other modules will use when they need it. The
prime list will be created only once and saved to disk for future use.
How to create the prime list:
The module will first create a small prime list before starting to work in parallel. When the
prime list includes enough primes (say 100 primes), the main program will create threads that
work in parallel and update the prime list.
We can try two alternative ways to distribute the numbers to threads:
In the first way, if the greatest prime in the list is k when the threads are created and the
number of threads is t, the i-th thread (i = 1, ..., t) will check the numbers of the form
$k + 2i + 2ta$, where $a = 0, 1, 2, 3, \dots$ Together, the t threads cover every odd number
greater than k.
To check the primality of a number, the thread will try to divide it by all primes in the prime
list up to the square root of the number. As soon as a factor is found, the thread moves on to
the next number. If no factor is found, the number being checked is prime, and it will be added
to the prime list. These threads will be terminated when the size of the prime list reaches a
predetermined number (the optimum value for this number is around 500,000).
In the second way, each thread will be given a block of adjacent odd numbers to check. When a
thread finishes its block, it will be given the next unchecked block, and the prime list will be
updated in this way.
3.1.2. Primality Tester
This module will test the primality of a given number using the prime list. If the user only
wants to check the primality of a number, he/she will be able to use this module independently
of the main product.
If the given number is prime, the module will indicate that. Otherwise, it will return a prime
factor of the number.
Note that, because this module uses the prime list, the numbers it can test are limited by the
square of the greatest prime in the list.
3.1.3. Perfect Square List Creator
The product can either use a perfect square list or be implemented in a different way that does
not use any perfect square list. In the method without the list, squares are produced on demand
by a predefined formula. However, we prefer the first method, so we need a perfect square list
and will implement this module to create it.
How to create the perfect square list:
Perfect squares are: 0, 1, 4, 9, 16, 25, 36, 49, ...
The differences between consecutive perfect squares are: 1, 3, 5, 7, 9, 11, 13, ...
$(n + 1)^2 - n^2 = n^2 + 2n + 1 - n^2 = 2n + 1$
So, an efficient recurrence that calculates perfect squares is:
$n_0 = 0$
$d_0 = 1$
$n_{k+1} = n_k + d_k$
$d_{k+1} = d_k + 2$
This recurrence yields the following algorithm:
    square = 0
    difference = 1
    repeat LIMIT times:
        add_to_list(square)
        square += difference
        difference += 2
To parallelize this module, we will run a number of processes, each given a different block of
adjacent numbers. A process will first calculate the initial square and difference values for the
first number of its block: for a block starting at s, these are $s^2$ and $2s + 1$. Then it will use
the algorithm above to generate the rest of the block efficiently.
3.1.4. Perfect Squareness Tester
This module will test whether a given number is a perfect square using the perfect square list.
If the user only wants to check whether a number is a perfect square, he/she will be able to use
this module independently of the main product.
If the given number is a perfect square, the module will return its square root. Otherwise, it
will indicate that the given number is not a perfect square.
Note that, because this module uses the perfect square list, the numbers it can test are limited
by the greatest perfect square in the list.
3.1.5. Factorization Module
This module will take numbers from the perfect square list (say $a_i$) and then calculate $b_i$
such that $a_i \equiv b_i \pmod{n}$ and $0 \le b_i < n$. Then it will try to factorize $b_i$ using
the primes in the prime list.
Each $b_i$ will be factorized on a single core, but each core will work on a different block of
$a_i$ values. Thus, this job will also be done in parallel.
If all factors of $b_i$ are in our prime list, a new row representing the factors of $b_i$ (the
exponent of each prime modulo 2) will be added to the matrix. Otherwise, $b_i$ will be
discarded and $b_{i+1}$ will be tried.
When the perfect square list is exhausted, if a sufficient number of rows has not been reached,
the perfect square list will be expanded until enough rows are found. The number of rows must
be at least the number of primes in the prime list plus one; this guarantees a linear dependency
among the rows, since more than t vectors of length t are always linearly dependent.
3.1.6. Matrix Processing Module
By the time this module starts to run, there will be a binary matrix consisting of t columns and
t+1 rows, where t is the number of primes in the prime list.
A linear dependency among the rows of this matrix (over GF(2), where addition is XOR) is
guaranteed because the number of rows is greater than the number of columns. To find that
dependency, this module will use the Gaussian Elimination Method. There are various methods
for parallelizing the Gaussian Elimination Method.
3.2. Concurrent Process Decomposition
3.2.1. Processes in the Prime List Creator Module
There will be a number of threads or processes in this module that work together to create the
prime list. Each thread or process will work only on the set of numbers reserved for it and will
update the shared prime list. When the prime list reaches the desired size, these threads and
processes will terminate.
This module will run only once, and the prime list will be saved on disk. When other modules
use this list, they will access it from disk with read-only permission.
3.2.2. Processes in the Primality Tester Module
To check the primality of a given number, we test its divisibility by the prime numbers that are
less than or equal to its square root.
This module will first calculate the integer square root of the number and the number of
primes that are not greater than this square root. Then it will distribute a nearly equal number of
primes to each thread or process and wait for their answers. If any thread or process finds a
factor, the main program will return this factor, terminate the other processes, and exit.
Otherwise, if none of the threads or processes finds a factor, the program will indicate that the
given number is prime.
3.2.3. Processes in the Perfect Square List Creator Module
There will be a number of threads or processes in this module that work together to create the
perfect square list. Each thread or process will work only on the set of numbers reserved for it
and will update the shared perfect square list. When the perfect square list reaches the desired
size, these threads and processes will terminate.
This module will run only once, and the perfect square list will be saved on disk. When other
modules use this list, they will access it from disk. However, if this list is not sufficient, the
Factorization Module will call this module to expand the perfect square list.
3.2.4. Processes in the Perfect Squareness Tester Module
This module will test whether a given number is a perfect square using the perfect square list.
To do that, the module will search for the given number in the perfect square list. If the number
is found in the list, it is the square of its position in the list (counting from 0). If the number is
not in the list, it is not a perfect square.
There will be a number of threads or processes in this module that work together. The module
will first distribute the perfect square list among the threads and processes, and each thread or
process will work only on the set of numbers reserved for it. If the given number is not between
the boundaries of its set, the thread or process will return immediately with an unsuccessful
result. If any thread or process returns a successful result, the main program will terminate the
other threads and processes and indicate that the given number is a perfect square. Otherwise,
the given number is not in the list, which means it is not a perfect square. Note that, because
this module uses the perfect square list, it is limited by the greatest value in the list.
3.2.5. Processes in the Factorization Module
This module will take numbers from the perfect square list (say $a_i$) and then calculate $b_i$
such that $a_i \equiv b_i \pmod{n}$ and $0 \le b_i < n$. Then it will try to factorize $b_i$ using
the primes in the prime list.
Each $b_i$ will be factorized on a single core, but each core will work on a different block of
$a_i$ values. The module will first distribute the perfect square list among the threads and
processes, and each thread or process will calculate $b_i$ and update the matrix if all factors of
$b_i$ are in the prime list.
3.2.6. Processes in the Matrix Processing Module
The processing of the matrix will be done using the Gaussian Elimination Method. This work
must be done in parallel. There are various methods for parallelizing this algorithm; the details
of parallelizing the Gaussian Elimination Method will be given in the Detailed Design Report of
the project.
3.3. Data Decomposition
Prime List: This list will be kept on disk after it is created, and when a module needs it, it will
be loaded into memory entirely.
The number of primes in the prime list will be around 500,000, and the RSA key n that we will
deal with will be around 512 bits. The greatest prime in the prime list will be much smaller than
the key. Even if each of the 500,000 primes were stored in 512 bits (64 bytes), the prime list
would take 500,000 × 64 bytes = 32 MB, so holding it in memory is not a problem.
Perfect Square List: This list will be kept on disk after it is created, and when a module needs
it, the appropriate part of it will be loaded into memory.
The Matrix: This will be a binary matrix with 500,001 rows and 500,000 columns. At one bit per
entry it will take about 500,001 × 500,000 / 8 ≈ 3.1 × 10^10 bytes, i.e., around 31 GB. So,
although it does not fit in the memory of one node, it can be distributed among the memories of
several nodes, or it can be loaded into the shared memory area.
4. Dependency
4.1. Intermodule Dependency
Module Name | Dependencies | Explanation
--- | --- | ---
Prime List Creator | none |
Primality Tester | Prime List Creator | Primality Tester needs the Prime List that is created by the Prime List Creator.
Perfect Square List Creator | none |
Perfect Squareness Tester | Perfect Square List Creator | Perfect Squareness Tester needs the Perfect Square List that is created by the Perfect Square List Creator.
Factorization Module | Perfect Square List Creator, Prime List Creator | Factorization Module needs the Perfect Square List that is created by the Perfect Square List Creator and the Prime List that is created by the Prime List Creator.
Matrix Processing Module | Factorization Module | Matrix Processing Module needs the matrix that is created by the Factorization Module.
4.2. Interprocess Dependency
All modules in this project are parallelized internally. What these parallelizations have in
common is that they are of the "single-program, multiple-data" (SPMD) type. So, there is no
dependency between processes; however, if one of the processes reaches a terminating condition
(for example, finding an answer), it causes all other processes to terminate.
4.3. Data Dependency
There is no dependency between the Prime List and the Perfect Square List.
The Matrix depends on both the Prime List and the Perfect Square List. The rows of the matrix
are created by factoring, over the primes of the Prime List, the residues modulo n (the
congruent values) of the numbers from the Perfect Square List.
5. Interface Description
5.1. Module Interface
Module 1 | Module 2 | Interface
--- | --- | ---
Primality Tester | Prime List Creator | Prime List File
Perfect Squareness Tester | Perfect Square List Creator | Perfect Square List File
Factorization Module | Perfect Square List Creator | Perfect Square List File
Factorization Module | Prime List Creator | Prime List File
Matrix Processing Module | Factorization Module | Matrix
5.2. Process Interface
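Since there is no dependency among the processes, no interface between them is needed, apart
from the signal that makes all processes terminate when one of them reaches a terminating
condition (see 4.2).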
6. Libraries and Protocols Will Be Used
6.1. MPI
MPI is a language-independent communications protocol used to program parallel computers.
Both point-to-point and collective communication are supported. MPI is a message-passing
application programmer interface, together with protocol and semantic specifications for how
its features must behave in any implementation. MPI's goals are high performance, scalability,
and portability. MPI remains the dominant model used in high-performance computing today.
MPI is not sanctioned by any major standards body; nevertheless, it has become a “de facto”
standard for communication among processes that model a parallel program running on a
distributed memory system. Actual distributed memory supercomputers such as computer
clusters often run these programs. The principal MPI-1 model has no shared memory concept,
and MPI-2 has only a limited distributed shared memory concept. Nonetheless, MPI programs
are regularly run on shared memory computers. Designing programs around the MPI model (as
opposed to explicit shared memory models) has advantages on NUMA architectures since MPI
encourages memory locality.
Although MPI belongs in layers 5 and higher of the OSI Reference Model, implementations
may cover most layers of the reference model, with sockets and TCP being used in the transport
layer.
Most MPI implementations consist of a specific set of routines (i.e., an API) callable from
Fortran, C, or C++ and from any language capable of interfacing with such routine libraries.
The advantages of MPI over older message passing libraries are portability (because MPI has
been implemented for almost every distributed memory architecture) and speed (because each
implementation is in principle optimized for the hardware on which it runs).
MPI has Language Independent Specifications (LIS) for the function calls and language
bindings. The first MPI standard specified ANSI C and Fortran-77 language bindings together
with the LIS.
There are two versions of the standard that are currently popular: version 1.2 (commonly called
MPI-1), which emphasizes message passing and has a static runtime environment, and MPI-2.1
(MPI-2), which includes new features such as parallel I/O, dynamic process management and
remote memory operations. MPI-2's LIS specifies over 500 functions and provides language
bindings for ANSI C, ANSI Fortran (Fortran90), and ANSI C++. Interoperability of objects
defined in MPI was also added to allow for easier mixed-language message passing
programming. A side effect of MPI-2 standardization (completed in 1996) was clarification of
the MPI-1 standard, creating the MPI-1.2 level.
It is important to note that MPI-2 is mostly a superset of MPI-1, although some functions have
been deprecated. Thus MPI-1.2 programs still work under MPI implementations compliant
with the MPI-2 standard.
MPI is often compared with PVM, which is a popular distributed environment and message
passing system developed in 1989, and which was one of the systems that motivated the need
for standard parallel message passing systems. Threaded shared memory programming models
(such as Pthreads and OpenMP) and message passing programming (MPI/PVM) can be
considered as complementary programming approaches, and can occasionally be seen used
together in applications where this suits the architecture, e.g., in servers with multiple large
shared-memory nodes.
6.2. OpenMP
OpenMP is an implementation of multi-threading, a method of parallelization whereby the
master thread (a series of instructions executed consecutively) forks a specified number of
slave threads and a task is divided among them. The threads then run concurrently, with the
runtime environment allocating threads to different processors.
The section of code that is meant to run in parallel is marked accordingly, with a preprocessor
directive that will cause the threads to form before the section is executed. Each thread has an
ID attached to it. The thread ID is an integer, and the master thread has an ID of 0. After the
execution of the parallelized code, the threads join back into the master thread, which
continues onward to the end of the program.
By default, each thread executes the parallelized section of code independently. "Work-sharing
constructs" can be used to divide a task among the threads so that each thread executes its
allocated part of the code. Both task parallelism and data parallelism can be achieved using
OpenMP in this way.
The runtime environment allocates threads to processors depending on usage, machine load
and other factors. The number of threads can be assigned by the runtime environment based on
environment variables or in code using functions. The OpenMP functions are included in a
header file named "omp.h" in C/C++.
6.3. GMP
GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational
numbers, and floating-point numbers. There is no practical limit to the precision except the one
implied by the available memory in the machine GMP runs on. GMP has a rich set of
functions, and the functions have a regular interface.
The main target applications for GMP are cryptography applications and research, Internet
security applications, algebra systems, computational algebra research, etc.
GMP is carefully designed to be as fast as possible, both for small operands and for huge
operands. The speed is achieved by using full words as the basic arithmetic type, by using fast
algorithms, with highly optimized assembly code for the most common inner loops for a lot of
CPUs, and by a general emphasis on speed.
7. Timetable
Months
To Do
- Decide on details of the design of the project.
December
- Implement simple parallel programs to get experience with MPI, OpenMP, and
parallel programming concepts.
- Prepare initial and detailed design reports.
January
- Preparation of the prototype demo.
- Semester Holiday.
February
- Start to implement the prime list creator module.
- Start to implement the perfect square list creator module.
- Finish implementation of the prime list creator module.
March
- Finish implementation of the perfect square list creator module.
- Create the prime list and the perfect square list.
April
- Start to implement the Factorization Module.
- Start to implement the Matrix Processing Module.
May
- Finish all implementations.
- Work on debugging, profiling and optimization.
- Finalize the project.
June
- Show that the project is working correctly.
- Prepare Final Presentation.