Download Initial Design Report - CENG 490 Design Project

Ceng 491 Computer Engineering Design I
Initial Design Report

Group : QuadCode
Project Name : LockSmith
Members : Alaettin Zubaroglu, Suleyman Kagan Samurkas, Aydin Goze Polat, Ubeyde Rizaoglu

Table of Contents

1. Introduction
1.1. Purpose
1.2. Scope
1.3. Definitions and Acronyms
2. References
3. Decomposition Description
3.1. Module Decomposition
3.1.1. Prime List Creator
3.1.2. Primality Tester
3.1.3. Perfect Square List Creator
3.1.4. Perfect Squareness Tester
3.1.5. Factorization Module
3.1.6. Matrix Processing Module
3.2. Concurrent Process Decomposition
3.2.1. Processes in the Prime List Creator Module
3.2.2. Processes in the Primality Tester Module
3.2.3. Processes in the Perfect Square List Creator Module
3.2.4. Processes in the Perfect Squareness Tester Module
3.2.5. Processes in the Factorization Module
3.2.6. Processes in the Matrix Processing Module
3.3. Data Decomposition
4. Dependency
4.1. Intermodule Dependency
4.2. Interprocess Dependency
4.3. Data Dependency
5. Interface Description
5.1. Module Interface
5.2. Process Interface
6. Libraries and Protocols Will Be Used
6.1. MPI
6.2. OpenMP
6.3. GMP
7. Timetable

1. Introduction

1.1. Purpose

The aim of this project is to test the security level of the keys used in the RSA encryption algorithm by trying to crack them. Parallel programming techniques will be used for this purpose. The product will run on a cluster and will be developed on 'nar', the high performance computer in our department. By identifying weak keys, this project will encourage security protocols to use stronger keys. In addition, the user will be able to use this product to test efficiently whether a big number is prime or a perfect square.

1.2. Scope

The project will consist of several modules, each serving a different aim. In addition to the main purpose of the product, the user will be able to use any single module that serves his/her needs. The modules will handle the following jobs: creating the prime list, testing the primality of a big number using the prime list, creating the perfect square list, testing the perfect squareness of a big number, factorizing numbers using the prime list and creating the matrix needed by another module, and processing this matrix to produce the final answer.

The main purpose of this project is to find two factors of a very big number n. However, the user will be able to use some of the modules individually for some special purposes.
For example, the user can use the primality testing module to test the primality of a big number, or the perfect squareness testing module to test whether a big number is a perfect square. These modules use the prime list and the perfect square list, so these lists must be created before the other modules, or the whole product, can be used. Moreover, the user will be able to access these lists (with read-only permission) and examine them by hand.

This product finds two factors of the given number if it has any. However, it is not guaranteed that these factors are primes; they may be composite numbers. Note that the keys (moduli) used in the RSA encryption algorithm are products of exactly two prime numbers. Thus, when an RSA key is given, the output of the product will be two prime numbers.

1.3. Definitions and Acronyms

Definitions :

composite number : An integer greater than 1 that is not prime.
congruent : Two integers are congruent modulo n if they leave the same remainder when divided by n.
factor : A factor of a whole number is a smaller whole number which can be multiplied with another whole number to produce the first whole number.
factorization : Resolution of an integer or polynomial into factors so that when multiplied together they give the integer or polynomial.
perfect square : An integer that is the square of an integer.
primality : The property of being a prime number.
prime number : An integer greater than 1 that can be divided without remainder only by itself and 1.
de facto : De facto is a Latin expression that means "concerning the fact" or in practice but not ordained by law. It is commonly used in contrast to "de jure" (which means "by law") when referring to matters of law, governance, or technique (such as standards) that are found in the common experience as created or developed without or contrary to a regulation.

Acronyms :

RSA : The algorithm was publicly described in 1977 by Ron Rivest, Adi Shamir, and Leonard Adleman at MIT; the letters RSA are the initials of their surnames.
MPI : Message Passing Interface.
OpenMP : Open Multi-Processing.
GMP : GNU Multiple Precision.
NUMA : Non-Uniform Memory Access or Non-Uniform Memory Architecture.
OSI : Open System Interconnection.
TCP : Transmission Control Protocol.
API : Application programming interface.
LIS : Language Independent Specifications.
PVM : Parallel Virtual Machine.
CPU : Central processing unit.

2. References

http://en.wikipedia.org/wiki/Message_Passing_Interface
http://en.wikipedia.org/wiki/OpenMP
http://en.wikipedia.org/wiki/GNU_Multi-Precision_Library
http://gmplib.org/
http://en.wikipedia.org/wiki/De_facto

3. Decomposition Description

3.1. Module Decomposition

3.1.1. Prime List Creator

The product needs a prime list to work. Prime lists can be found on the internet or in other resources, but they may not be reliable. Thus we will develop a module for this purpose. This module will create and save a prime list, and other modules will use this list when they need it. The prime list will be created only once and saved to the disk for future use.

How to create prime list :

The module will initially create a small prime list before starting to work in parallel. When the prime list includes enough primes (say 100 primes), the main program will create threads that work in parallel and update the prime list. We can try two alternative ways to distribute the numbers to the threads:

In the first way, if the greatest prime number in the list is k when the threads are created, and the number of threads is t, the ith thread will check the numbers of the form k + 2*i + 2*t*a where a = 0, 1, 2, 3, ... If the threads are numbered i = 1, ..., t, these sequences together cover every odd number greater than k exactly once. To check the primality of a number, the thread will try to divide it by all primes in the prime list up to the square root of the number. As soon as a factor is found, the thread moves on to the next number. If no factor is found, the number being checked is prime, and it will be added to the prime list.
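The first distribution scheme and the trial-division check can be sketched as follows. This is a minimal sequential sketch in Python, not the planned parallel implementation; the function names are hypothetical, threads are assumed to be numbered i = 1, ..., t, and the prime list is assumed to be sorted and to contain every prime up to the square root of the number being tested.

```python
from math import isqrt

def is_prime_by_list(n, primes):
    """Trial-divide n by the listed primes up to sqrt(n).

    Assumes `primes` is sorted and holds every prime <= isqrt(n).
    """
    limit = isqrt(n)
    for p in primes:
        if p > limit:
            break
        if n % p == 0:
            return False  # p is a factor, so n is composite
    return True

def thread_candidates(k, t, i, count):
    """First `count` numbers k + 2*i + 2*t*a (a = 0, 1, ...) for thread i."""
    return [k + 2 * i + 2 * t * a for a in range(count)]

def extend_prime_list(primes, target_size):
    """Sequentially grow a sorted prime list (seed must end in an odd prime)."""
    primes = list(primes)
    candidate = primes[-1] + 2
    while len(primes) < target_size:
        if is_prime_by_list(candidate, primes):
            primes.append(candidate)
        candidate += 2  # only odd numbers are checked
    return primes
```

For example, with k = 7 and t = 3, `thread_candidates` gives thread 1 the numbers 9, 15, 21, thread 2 the numbers 11, 17, 23, and thread 3 the numbers 13, 19, 25, so no odd number is checked twice.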
These threads will be terminated when the size of the prime list reaches a predetermined number (the optimum value for this number is around 500 000).

In the second way, each thread will be given a block of adjacent odd numbers to check. When this block is finished, the thread will be given the next unchecked block, and the prime list will be updated in this way.

3.1.2. Primality Tester

This module will test the primality of a given number using the prime list. If the user only wants to check the primality of a number, he/she will be able to use this module independently of the main product. If the given number is prime, the module will indicate that. Otherwise, the module will give a prime factor of the number. Note that, because this module uses the prime list, the numbers that can be tested by this module are limited by the square of the greatest prime in the list.

3.1.3. Perfect Square List Creator

The product can either use a perfect square list to work, or it can be implemented in a different way that does not use any perfect square list. In the method that does not use the list, the square numbers are produced by a predefined formula. However, we prefer the first method, so we need a perfect square list and we will implement this module to create it.

How to create perfect square list :

Perfect squares are : 0, 1, 4, 9, 16, 25, 36, 49, ...
The differences of consecutive perfect squares are : 1, 3, 5, 7, 9, 11, 13, ...

$(n + 1)^2 - n^2 = n^2 + 2n + 1 - n^2 = 2n + 1$

So, an efficient formula that calculates perfect squares, where $n_k$ denotes the k-th perfect square and $d_k$ the difference to the next one, is :

$n_0 = 0; \quad d_0 = 1; \quad n_{k+1} = n_k + d_k; \quad d_{k+1} = d_k + 2$

That formula yields the algorithm :

    square = 0
    difference = 1
    for 0 -> LIMIT
        add_to_list(square)
        square += difference
        difference += 2

To parallelize this module, we will run a number of processes, each of which will be given a different block of adjacent numbers.
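One way to start a block at an arbitrary number s, assuming the worker may do a single multiplication up front, is to set square = s*s and difference = 2*s + 1 and then continue with the additive loop above. The sketch below illustrates this in Python; the function name `squares_in_block` is hypothetical and not part of the design.

```python
def squares_in_block(start, count):
    """Squares of start, start+1, ..., start+count-1.

    Only the first square needs a multiplication; every later square is
    obtained by adding the running odd difference, as in the algorithm above.
    """
    square = start * start
    difference = 2 * start + 1  # (start + 1)^2 - start^2
    block = []
    for _ in range(count):
        block.append(square)
        square += difference
        difference += 2
    return block

# Two workers with adjacent blocks reproduce the sequential list.
full_list = squares_in_block(0, 4) + squares_in_block(4, 4)
```

Because each block depends only on its own starting number, the blocks can be computed independently and concatenated in block order.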
Each process will first calculate the initial square and difference values from the first number of its block. Then it will use the algorithm above to work more efficiently.

3.1.4. Perfect Squareness Tester

This module will test the squareness of a given number using the perfect square list. If the user only wants to check the squareness of a number, he/she will be able to use this module independently of the main product. If the given number is a perfect square, the module will return its square root. Otherwise the program will indicate that the given number is not a perfect square. Note that, because this module uses the perfect square list, the numbers that can be tested by this module are limited by the greatest perfect square in the list.

3.1.5. Factorization Module

This module will take numbers $a_i$ from the perfect square list and calculate $b_i$ such that $a_i \equiv b_i \pmod{n}$ and $0 \le b_i < n$. Then it will try to factorize $b_i$ using the primes in the prime list. The factorization of each $b_i$ will be done on a single core, but each core will work on a different block of $a_i$ values. Thus, this job will also be done in parallel. If all factors of $b_i$ are in our prime list, a new row representing the factors of $b_i$ will be added to the matrix. Otherwise, $b_i$ will be discarded and $b_{i+1}$ will be tried. If the perfect square list is exhausted before a sufficient number of rows is reached, the list will be expanded until enough rows are found. The number of rows must be equal to the number of primes in the prime list plus one in order to guarantee linear dependency among the rows.

3.1.6. Matrix Processing Module

By the time this module starts to run, there will be a binary matrix consisting of t columns and t+1 rows, where t is the number of primes in the prime list. Linear dependency among the rows of this matrix is guaranteed because the number of rows is greater than the number of columns.
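How one row of this binary matrix could be produced from a single $b_i$ (Section 3.1.5) is sketched below in Python. The report only says that a row "represents the factors" of $b_i$; storing the parity (exponent mod 2) of each prime's exponent is an assumption made here, and the function names `parity_row` and `collect_rows` are hypothetical.

```python
def parity_row(b, primes):
    """Exponent parities of b over `primes`, or None if b has a factor
    outside the list (in which case b is discarded)."""
    if b == 0:
        return None  # a multiple of n carries no usable information
    row = []
    for p in primes:
        exponent = 0
        while b % p == 0:
            b //= p
            exponent += 1
        row.append(exponent % 2)  # assumption: one bit per prime
    return row if b == 1 else None

def collect_rows(n, squares, primes):
    """Reduce each perfect square a modulo n and keep the rows that factor
    completely over the prime list, together with the originating a."""
    rows = []
    for a in squares:
        row = parity_row(a % n, primes)
        if row is not None:
            rows.append((a, row))
    return rows
```

For instance, `parity_row(12, [2, 3, 5])` yields `[0, 1, 0]` because 12 = 2^2 * 3, while `parity_row(14, [2, 3, 5])` yields `None` because the factor 7 is not in the list.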
In order to find that dependency, this module will use the Gaussian Elimination Method. There are various methods for parallelizing the Gaussian Elimination Method.

3.2. Concurrent Process Decomposition

3.2.1. Processes in the Prime List Creator Module

There will be a number of threads or processes in this module that work together to create the prime list. Each thread or process will work only on the set of numbers reserved for it and will update the shared prime list. When the prime list reaches the desired size, these threads and processes will terminate. This module will run only once and the prime list will be saved on the disk. When other modules use this list, they will access it from the disk with read-only permission.

3.2.2. Processes in the Primality Tester Module

To check the primality of a given number, we should test its divisibility by the prime numbers that are less than or equal to its square root. This module will first calculate the integer square root of the number and the number of primes that do not exceed it. Then it will distribute a nearly equal number of primes to each thread or process and wait for an answer from them. If any thread or process finds a factor, the main program will return this factor, terminate the other processes and exit. Otherwise, if none of the threads or processes finds a factor, the program will indicate that the given number is prime.

3.2.3. Processes in the Perfect Square List Creator Module

There will be a number of threads or processes in this module that work together to create the perfect square list. Each thread or process will work only on the set of numbers reserved for it and will update the shared perfect square list. When the perfect square list reaches the desired size, these threads and processes will terminate. This module will run only once and the perfect square list will be saved on the disk. When other modules use this list, they will access it from the disk.
However, if this list is not sufficient, the Factorization Module will call this module to expand the perfect square list.

3.2.4. Processes in the Perfect Squareness Tester Module

This module will test the squareness of a given number using the perfect square list. In order to do that, the module will search for the given number in the perfect square list. If the number is found in the list, it is the square of its position in the list (counting from 0, since the list starts with 0). If the number is not in the list, it is not a perfect square.

There will be a number of threads or processes in this module that work together. The module will first distribute the perfect square list among the threads and processes, and each thread or process will work only on the set of numbers reserved for it. If the given number is not between the boundaries of its set, the process will return immediately with an unsuccessful result. If any thread or process returns a successful result, the main program will terminate the other threads and processes and indicate that the given number is a perfect square. Otherwise, the given number is not in the list, which means it is not a perfect square. Note that, because this module uses the perfect square list, it is limited by the greatest value in the list.

3.2.5. Processes in the Factorization Module

This module will take numbers $a_i$ from the perfect square list and calculate $b_i$ such that $a_i \equiv b_i \pmod{n}$ and $0 \le b_i < n$. Then it will try to factorize $b_i$ using the primes in the prime list. The factorization of each $b_i$ will be done on a single core, but each core will work on a different block of $a_i$ values. The module will first distribute the perfect square list to the threads and processes; each thread or process will then calculate $b_i$ and update the matrix if all factors of $b_i$ are in the prime list.

3.2.6. Processes in the Matrix Processing Module

The processing of the matrix will be done using the Gaussian Elimination Method. This work must be done in a parallel manner.
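As a sequential reference point only, the sketch below shows Gaussian elimination over GF(2) used to find a set of rows whose sum is zero modulo 2. It is not the parallel scheme the project will use, which the report leaves to a later document; the function name `find_dependency` and the use of Python integers as bit vectors are assumptions made for illustration.

```python
def find_dependency(rows):
    """rows: list of equal-length 0/1 lists.

    Returns the indices of a non-empty subset of rows whose sum mod 2 is
    the zero vector, or None if the rows are linearly independent.
    """
    pivots = {}  # pivot column -> (reduced row bits, mask of combined rows)
    for index, row in enumerate(rows):
        bits = sum(bit << col for col, bit in enumerate(row))
        combo = 1 << index  # tracks which original rows were XORed together
        while bits:
            col = bits.bit_length() - 1  # highest column still set
            if col not in pivots:
                pivots[col] = (bits, combo)  # row becomes a new pivot
                break
            pivot_bits, pivot_combo = pivots[col]
            bits ^= pivot_bits    # eliminate that column
            combo ^= pivot_combo
        else:
            # The row reduced to zero: the rows in `combo` are dependent.
            return [i for i in range(len(rows)) if combo >> i & 1]
    return None
```

With t columns there can be at most t pivots, so the (t+1)-th row at the latest must reduce to zero, which is the guarantee the design relies on.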
There are various methods for parallelizing Gaussian elimination. The details of the parallelization will be given in the Detailed Design Report of the project.

3.3. Data Decomposition

Prime List : This list will be kept on the disk after it is created, and when a module needs it, it will be loaded into memory entirely. The number of primes in the prime list will be around 500 000, and the RSA key n that we will deal with will be around 512 bits. Moreover, the greatest prime in the prime list will be much smaller than the key. Even under the pessimistic assumption that each of the 500 000 primes occupies 512 bits = 64 bytes, the size of the prime list will be 500 000 * 64 bytes = 32 MB. Holding it in memory is therefore not a problem.

Perfect Square List : This list will be kept on the disk after it is created, and when a module needs it, the appropriate part of it will be loaded into memory.

The Matrix : This will be a binary matrix with 500 001 rows and 500 000 columns. At one bit per entry it holds about 2.5 * 10^11 bits, which is roughly 31 GB. So, although it does not fit in the memory of one node, it can be distributed among the memories of several nodes, or it can be loaded into the shared memory area.

4. Dependency

4.1. Intermodule Dependency

Module Name | Dependencies | Explanation
Prime List Creator | none |
Primality Tester | Prime List Creator | Primality Tester needs the Prime List that is created by Prime List Creator.
Perfect Square List Creator | none |
Perfect Squareness Tester | Perfect Square List Creator | Perfect Squareness Tester needs the Perfect Square List that is created by Perfect Square List Creator.
Factorization Module | Perfect Square List Creator, Prime List Creator | Factorization Module needs the Perfect Square List that is created by Perfect Square List Creator and the Prime List that is created by Prime List Creator.
Matrix Processing Module | Factorization Module | Matrix Processing Module needs the matrix that is created by Factorization Module.

4.2. Interprocess Dependency

All modules in this project have internal parallelization.
What these parallelizations have in common is that they are all of the "single program, multiple data" (SPMD) type. So there is no dependency between processes, except that if one of the processes reaches a terminating condition (for example, finding an answer), it causes all other processes to terminate.

4.3. Data Dependency

There is no dependency between the Prime List and the Perfect Square List. The Matrix depends on both the Prime List and the Perfect Square List. The rows of the matrix are created by taking the numbers from the Perfect Square List, reducing them modulo n, and factoring the results over the Prime List.

5. Interface Description

5.1. Module Interface

Module 1 | Module 2 | Interface
Primality Tester | Prime List Creator | Prime List File
Perfect Squareness Tester | Perfect Square List Creator | Perfect Square List File
Factorization Module | Perfect Square List Creator | Perfect Square List File
Factorization Module | Prime List Creator | Prime List File
Matrix Processing Module | Factorization Module | Matrix

5.2. Process Interface

Since there is no dependency among the processes, there is no need for an interface between them.

6. Libraries and Protocols Will Be Used

6.1. MPI

MPI is a language-independent communications protocol used to program parallel computers. Both point-to-point and collective communication are supported. MPI is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation. MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in high-performance computing today.

MPI is not sanctioned by any major standards body; nevertheless, it has become a "de facto" standard for communication among processes that model a parallel program running on a distributed memory system. Actual distributed memory supercomputers such as computer clusters often run these programs. The principal MPI-1 model has no shared memory concept, and MPI-2 has only a limited distributed shared memory concept.
Nonetheless, MPI programs are regularly run on shared memory computers. Designing programs around the MPI model (as opposed to explicit shared memory models) has advantages on NUMA architectures, since MPI encourages memory locality.

Although MPI belongs in layers 5 and higher of the OSI Reference Model, implementations may cover most layers of the reference model, with sockets and TCP being used in the transport layer.

Most MPI implementations consist of a specific set of routines (i.e., an API) callable from Fortran, C, or C++ and from any language capable of interfacing with such routine libraries. The advantages of MPI over older message passing libraries are portability (because MPI has been implemented for almost every distributed memory architecture) and speed (because each implementation is in principle optimized for the hardware on which it runs).

MPI has Language Independent Specifications (LIS) for the function calls and language bindings. The first MPI standard specified ANSI C and Fortran-77 language bindings together with the LIS. There are two versions of the standard that are currently popular: version 1.2 (called MPI-1 for short), which emphasizes message passing and has a static runtime environment, and MPI-2.1 (MPI-2), which includes new features such as parallel I/O, dynamic process management and remote memory operations. MPI-2's LIS specifies over 500 functions and provides language bindings for ANSI C, ANSI Fortran (Fortran90), and ANSI C++. Interoperability of objects defined in MPI was also added to allow for easier mixed-language message passing programming. A side effect of MPI-2 standardization (completed in 1996) was clarification of the MPI-1 standard, creating the MPI-1.2 level.

It is important to note that MPI-2 is mostly a superset of MPI-1, although some functions have been deprecated. Thus MPI-1.2 programs still work under MPI implementations compliant with the MPI-2 standard.
MPI is often compared with PVM, which is a popular distributed environment and message passing system developed in 1989, and which was one of the systems that motivated the need for standard parallel message passing systems. Threaded shared memory programming models (such as Pthreads and OpenMP) and message passing programming (MPI/PVM) can be considered complementary programming approaches, and can occasionally be seen used together in applications where this suits the architecture, e.g. in servers with multiple large shared-memory nodes.

6.2. OpenMP

OpenMP is an implementation of multi-threading, a method of parallelization whereby the master thread (a series of instructions executed consecutively) forks a specified number of worker threads and a task is divided among them. The threads then run concurrently, with the runtime environment allocating threads to different processors.

The section of code that is meant to run in parallel is marked accordingly, with a compiler directive (#pragma omp in C/C++) that will cause the threads to form before the section is executed. Each thread has an ID attached to it. The thread ID is an integer, and the master thread has an ID of 0. After the execution of the parallelized code, the threads join back into the master thread, which continues onward to the end of the program.

By default, each thread executes the parallelized section of code independently. "Work-sharing constructs" can be used to divide a task among the threads so that each thread executes its allocated part of the code. Both task parallelism and data parallelism can be achieved using OpenMP in this way.

The runtime environment allocates threads to processors depending on usage, machine load and other factors. The number of threads can be assigned by the runtime environment based on environment variables, or in code using functions. The OpenMP functions are declared in the header file "omp.h" in C/C++.

6.3. GMP

GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. There is no practical limit to the precision except the one implied by the available memory of the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.

The main target applications for GMP are cryptography applications and research, Internet security applications, algebra systems, computational algebra research, etc.

GMP is carefully designed to be as fast as possible, both for small operands and for huge operands. The speed is achieved by using full words as the basic arithmetic type, by using fast algorithms, by using highly optimized assembly code for the most common inner loops on many CPUs, and by a general emphasis on speed.

7. Timetable

Months | To Do
December | - Decide on details of the design of the project. - Implement simple parallel programs to gain experience with MPI, OpenMP and parallel programming concepts. - Prepare initial and detailed design reports.
January | - Preparation of the prototype demo. - Semester Holiday.
February | - Start to implement the prime list creator module. - Start to implement the perfect square list creator module.
March | - Finish implementation of the prime list creator module. - Finish implementation of the perfect square list creator module. - Create the prime list and the perfect square list.
April | - Start to implement the Factorization Module. - Start to implement the Matrix Processing Module.
May | - Finish all implementations. - Work on debugging, profiling and optimization.
June | - Finalize the project. - Show that the project is working correctly. - Prepare Final Presentation.
