ACCELERATING PAIRWISE SEQUENCE ALIGNMENT USING GPU: REVIEW ARTICLE

ABSTRACT
Sequence alignment algorithms are used to compare sequences and to identify their origins. Several techniques are used to align sequences. Dynamic programming gives accurate results but is slow, whereas heuristic methods are faster but less precise. Accelerating alignment algorithms is therefore vital for mining and analyzing large sequence databases with accurate results. Several hardware platforms are used for acceleration, such as FPGAs and GPUs. In this paper we concentrate on the acceleration of sequence alignment algorithms using GPUs.

KEY WORDS: Bioinformatics, Sequence alignment, GPU.

1. INTRODUCTION
Bioinformatics is a field that combines biological and information sciences and studies methods for storing, retrieving and analyzing biological data. Biological data include nucleic acid sequences (deoxyribonucleic acid, "DNA", or ribonucleic acid, "RNA") and protein sequences. Proteins, DNA and RNA are encoded as sequences of characters. The comparison of biological sequences is one of the major problems in bioinformatics. It can be achieved by aligning the sequences to find the matching characters, an operation called sequence alignment. Alignment is mainly used to measure the similarity between biological sequences. It may also be used to find the origin of a query sequence.

There are two kinds of sequence alignment: pairwise sequence alignment and multiple sequence alignment. Pairwise alignment consists of aligning two sequences, whereas multiple alignment aligns more than two sequences. Dynamic programming and heuristic methods are used to compute sequence alignments. Dynamic programming (DP) methods find optimal alignments but are slow. In contrast, heuristic methods find alignments with less precision but are faster than DP. Graphics Processing Units (GPUs) are used to accelerate sequence alignment algorithms by executing them in parallel.
In this paper we are interested in pairwise alignment, and we present pairwise alignment algorithms implemented on Graphics Processing Units (GPUs). The rest of the paper is organized as follows: In Section 2, we present the three types of pairwise alignment algorithms. In Section 3, we give an overview of GPUs. In Section 4, we show how GPUs can be used to accelerate pairwise alignment algorithms. Finally, in the last section, we present the conclusion.

2. PAIRWISE ALIGNMENT ALGORITHMS
There are three types of pairwise alignment algorithms: pairwise global alignment, pairwise local alignment and pairwise semi-global alignment. Let us begin with pairwise local alignment algorithms.

2.1 Pairwise Local Alignment Algorithms
The most widely used Dynamic Programming (DP) algorithm for pairwise local alignment is that of Smith and Waterman [1]. The main difference from the algorithm of Needleman and Wunsch [2] is that any cell of the matrix M can be considered as a starting point for the calculation of the scores, and that any score that becomes lower than zero stops the progression of the calculation. The associated cell is then reset to zero and can be considered as a new starting point. Each score is calculated during the transformation of the initial matrix using Equation 1:

\[
M[i,j] = \max
\begin{cases}
se(i,j) + M[i+1,\,j+1], \\
se(i,j) + \max\big(M[x,\,j+1] - P\big), \\
se(i,j) + \max\big(M[i+1,\,y] - P\big), \\
0
\end{cases}
\tag{1}
\]

where i + 2 ≤ x ≤ m and j + 2 ≤ y ≤ n, se(i,j) is the score between the character at position i in S1 and the one at position j in S2, P is a gap penalty, and m and n are respectively the lengths of the sequences S1 and S2 to align. The time complexity of the Smith and Waterman algorithm is O(m*n).

2.2 Pairwise Global Alignment Algorithms
A pairwise global alignment involves the alignment of two entire sequences.
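To make the contrast between a local alignment (Section 2.1) and a global alignment concrete, the following minimal Python sketch computes both scores for the same pair of sequences. It is a simplified illustration, not the exact formulation of Equation 1: it assumes a linear gap penalty of `gap` per gap position, a plain match/mismatch score in place of a PAM or BLOSUM substitution matrix, and it fills the matrix from the upper-left cell instead of the lower-right one. The function names and the default scores are hypothetical.

```python
def local_score(s1, s2, match=2, mismatch=-1, gap=2):
    """Best local alignment score: any cell may restart at zero."""
    n = len(s2)
    prev = [0] * (n + 1)          # previous row of the score matrix
    best = 0
    for i in range(1, len(s1) + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            se = match if s1[i - 1] == s2[j - 1] else mismatch
            # Scores below zero are reset to zero (new starting point).
            cur[j] = max(0, prev[j - 1] + se, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best


def global_score(s1, s2, match=2, mismatch=-1, gap=2):
    """Global alignment score: both sequences are aligned end to end."""
    n = len(s2)
    prev = [-gap * j for j in range(n + 1)]   # leading gaps are penalized
    for i in range(1, len(s1) + 1):
        cur = [-gap * i] + [0] * n
        for j in range(1, n + 1):
            se = match if s1[i - 1] == s2[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + se, prev[j] - gap, cur[j - 1] - gap)
        prev = cur
    return prev[n]


if __name__ == "__main__":
    print(local_score("TTACGTTT", "GGACGGG"))   # shared region "ACG" dominates
    print(global_score("TTACGTTT", "GGACGGG"))  # mismatched flanks lower the score
```

Both functions keep only two rows of the matrix, which is enough when only the score is needed; recovering the alignment itself would require storing the full matrix for the traceback.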
There are two main approaches to construct a pairwise global alignment:

(i) Dynamic programming approach [3,4]: The most widely used dynamic programming algorithm for pairwise global alignment is that of Needleman and Wunsch [2]. With this algorithm, the construction of a pairwise global alignment of two sequences S1 and S2, with respective lengths m and n, is performed in two steps:

(i.1) During the first step, we construct a matrix M of size m*n and we initialize it by using a substitution matrix, e.g., PAM (Percent Accepted Mutations) [5] or BLOSUM (BLOcks SUbstitution Matrix) [6]. Then, we transform matrix M by adding scores row by row, starting from the lower-right cell and ending at the upper-left one, by using the following equation:

\[
M[i,j] = se(i,j) + \max\big(M[x,y]\big)
\tag{2}
\]

where x = i + 1 and j < y ≤ n, or i < x ≤ m and y = j + 1, and se(i,j) is the score between the character at position i in S1 and the one at position j in S2.

We can also incorporate a gap penalty in the equation. A gap is a character, e.g., '-', inserted in the aligned sequences so that the aligned characters face each other. It suffices to subtract from each sum a penalty that depends on the position of the gap. Equation 2 then becomes:

\[
M[i,j] = se(i,j) + \max
\begin{cases}
M[i+1,\,j+1], \\
M[x,\,j+1] - P, \\
M[i+1,\,y] - P
\end{cases}
\tag{3}
\]

where i + 2 ≤ x ≤ m and j + 2 ≤ y ≤ n, and P is a gap penalty. The gap penalty P can take several possible forms, as shown in Table 1, where k is the number of successive gaps and a, b and c are constants.

(i.2) During the second step, we establish a path in the matrix, called the maximum scores path, which leads to an optimal pairwise global alignment.
The construction of this path starts from the cell that contains the maximum score in the transformed matrix, which is normally the upper-left cell, and allows three types of movement:
(a) Diagonal movement: the passage from a cell (i,j) to a cell (i+1,j+1).
(b) Vertical movement: the passage from a cell (i,j) to a cell (i+1,j).
(c) Horizontal movement: the passage from a cell (i,j) to a cell (i,j+1).

The time complexity of the Needleman and Wunsch algorithm is O(m*n). Other dynamic programming algorithms for pairwise global alignment exist, such as that of Huang and Chao [7] and NGILA [8].

2.3 Pairwise Semi-Global Alignment
A pairwise semi-global alignment is like a pairwise global alignment, except that it ignores start gaps, i.e., gaps that occur before the first character of a sequence, and end gaps, i.e., gaps that occur after the last character of a sequence. An overlap of two sequences is an alignment in which start and end gaps are ignored. One DP algorithm used for pairwise semi-global alignment is that of Sodiya et al. [6]. To align two sequences A and B, with |A| = m and |B| = n, this algorithm begins by initializing an (m+1)*(n+1) matrix M. Starting from cell M[0,0], it iterates through each cell, whose value is determined by choosing among three transitions to that cell:
• Diagonal step: Indicates an alignment of the (i−1)th symbol of A with the (j−1)th symbol of B. The alignment score, defined by the scoring system, is added to the value of M[i−1,j−1]; the result is denoted diagonal.
• Vertical step: Indicates the insertion of a gap into A, and the alignment of this gap with the (j−1)th symbol of B. The gap penalty is added to the value of M[i,j−1]; the result is denoted top. The gap penalty for this transition depends on the scoring system used.
• Horizontal step: Indicates the insertion of a gap into B, and the alignment of this gap with the (i−1)th symbol of A. The gap penalty is added to the value of M[i−1,j]; the result is denoted left. As with the vertical step, the gap penalty for this transition depends on the scoring system used.

The time complexity of the semi-global algorithm is O(m*n).

3. GRAPHICS PROCESSOR UNITS
A graphics processing unit (GPU) is a device that was designed to accelerate graphics operations. It is essentially an electronic device consisting of many processors with memory, originally used to accelerate the building of images for output to a display unit. A GPU has high memory bandwidth, high computational capability and a highly parallel structure, which make it useful as a general-purpose unit for applications other than imaging, such as computational biology.

A GPU consists of many multiprocessors and a large Dynamic RAM (DRAM). Each multiprocessor is coupled with a small cache memory and consists of a large number of cores, i.e., Arithmetic Logic Units (ALUs), controlled by a control unit. A comparison between a GPU and a CPU is shown in Fig 1. A CPU consists of a small number of ALUs, a large DRAM and a single large cache memory. A GPU has a large number of ALUs, which give it high computational power and let it perform data-parallel computations faster than a CPU. A GPU has a small cache memory and a control unit for each set of ALUs. A GPU also has a higher bandwidth between memory and processing elements (ALUs), which makes its operations faster.

GPUs are well adapted to problems that can be solved with data-parallel processing, i.e., where the same program is executed on many data elements in parallel. Data-parallel processing maps data elements to parallel processing threads. A thread is a sequence of instructions that may be executed in parallel with other threads. Data-parallel processing is an efficient way to accelerate many algorithms.
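The data-parallel model described above, in which the same program runs on many data elements, can be sketched in Python as follows. This is only an illustration of how data elements are mapped to threads: CPU threads from the standard library stand in for GPU threads, so no GPU-like speedup should be expected. The function names `count_matches` and `score_database`, and the simple position-by-position match count used as the per-element program, are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(pair):
    """The 'same program': count positions where two sequences agree."""
    query, subject = pair
    return sum(1 for a, b in zip(query, subject) if a == b)


def score_database(query, database, workers=4):
    """Map each database sequence (one data element) to a worker thread."""
    pairs = [(query, subject) for subject in database]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map applies count_matches to every element and keeps input order.
        return list(pool.map(count_matches, pairs))


if __name__ == "__main__":
    print(score_database("ACGT", ["ACGT", "AGGT", "TTTT"]))
```

Each element is processed independently of the others, which is the property that lets a GPU assign one thread per element.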
GPUs were originally designed to accelerate computer graphics algorithms. With GPUs, impressive speedups can be achieved using a programming model that is simpler than the one required for FPGAs. Moreover, their high computational capabilities and their highly parallel structure have opened up a wide range of other fields to them, such as scientific computing [10], computational geometry [11] and bioinformatics [12].

GPU memory consists of on-chip memory located with the processors, such as shared memory and registers, and off-chip memory that is separate from the processors, such as global, local, constant and texture memory [23]. Access times differ between these memories, as shown in Fig 4, which gives the access time from the processors to each GPU memory. Global memory has the highest access time, around 400 to 600 cycles. The access times for the constant cache, the texture cache and shared memory are approximately the same as the access time for registers, about 1 cycle.

4. ACCELERATING PAIRWISE ALIGNMENT ALGORITHMS
With the new sequencing technologies, the number of biological sequences in databases such as GenBank [13] and PubMed [14] is increasing exponentially. In addition, each of these sequences is long, with hundreds of bases. Hence, comparing a query sequence to the sequences of one of these databases with a pairwise alignment algorithm becomes expensive in computing time and memory space, and there is a need to accelerate pairwise alignment algorithms. Many hardware solutions exist to accelerate the alignment process, such as multiprocessor architectures, FPGAs and GPUs. In this paper we focus on GPUs.

GPU programming was first based on OpenGL [15] and is now based on CUDA, the parallel computing engine of NVIDIA graphics processing units [23]. So, there are two types of implementations of the Smith and Waterman (SW) algorithm on GPUs: those that use OpenGL and those that use CUDA.
4.1 Accelerating Smith and Waterman Algorithm by Using OpenGL
The first implementations of the SW algorithm on GPUs are described in [16,18]. These implementations are based on similar approaches and use OpenGL [15]. They operate as follows: First, the database and the query sequence are copied to the GPU memory as textures [9]. The score matrix is then processed anti-diagonal by anti-diagonal. For each element of the current anti-diagonal, a pixel is drawn. Drawing this pixel executes a small program, called a pixel shader, that computes the score for the cell. The results are written to a texture, which is then used as input for the next pass.

The implementation of [16] searched 99.8% of the Swiss-Prot database, now merged into the Universal Protein Resource (UniProt) database [17], and obtained a maximum speed of 650 Mega Cell Updates Per Second (MCUPS), compared to around 75 MCUPS for the compared CPU version. The number of Cell Updates Per Second (CUPS) is defined as follows:

\[
\text{CUPS} = \frac{\text{query sequence length} \times \text{database size}}{\text{run time}}
\tag{4}
\]

The implementation of [18] has two versions, one with traceback and one without. Both versions were benchmarked on a GeForce 7800 GTX GPU and executed on a database of just 983 sequences. The version without traceback obtained a maximum speed of 241 MCUPS, compared to 178 MCUPS with traceback and 120 MCUPS for the compared CPU version.

4.2 Accelerating Smith and Waterman Algorithm by Using CUDA
SW-CUDA [19] uses CUDA to implement sequence alignment on a GPU. Each GPU thread computes the whole alignment of the query sequence with one database sequence. The threads are grouped in a grid of blocks during execution. To make the most efficient use of the GPU resources, the computing times of all the threads in the same grid must be as close to one another as possible. Experimental studies have compared SW-CUDA running on two GeForce 8800 GTX GPUs with BLAST [20] and SSEARCH [21] running on a 3 GHz Intel Pentium IV processor.
The CUDA implementation was up to 30 times faster than SSEARCH and up to 2.4 times faster than BLAST. SW-CUDA was also 3 times faster than a Single Instruction Multiple Data (SIMD) implementation [22].

SW-CUDA needed improvement to use the full resources of the GPU, because it relies on the GPU's local memory, which is the slowest memory resource on the GPU. An effective improvement has been proposed that uses on-chip shared memory to reduce the amount of data transferred from global memory to the processing elements of the GPU [19]. This reduces the amount of data fetched to 1/140 of the original amount.

An implementation proposed by Striemer [25] is 23 times faster than SSEARCH. In Striemer's implementation, the SW computation matrix is computed entirely on the GPU. It works in three stages:
1) The database of biological sequences is loaded into the GPU's global memory.
2) Each thread computes the alignment score between the query sequence and one sequence of the database.
3) The alignment scores are sent back to the CPU, which determines the highest alignment score.
This implementation therefore gives only the alignment score, not the alignment itself. Unlike SW-CUDA, it does not use the CPU for partial SW computations; they are done entirely on the GPU. Another difference is that this implementation stores the query sequence and the substitution matrix in the GPU's constant memory, because the constant cache has the same access time as registers.

SW-CUDA achieves speeds of more than 3.5 GCUPS on a workstation running two GeForce 8800 GTX GPUs. SW-CUDA was also compared to a Single Instruction Multiple Data (SIMD) implementation [22]. The experimental results show that SW-CUDA performs from 2 to 30 times faster than any previous implementation.

CONCLUSION
The size of biological databases is increasing exponentially, and the sequences themselves are long. Biological computations such as sequence alignment therefore need to be faster. Many hardware solutions, such as FPGAs and GPUs, are used to accelerate these algorithms.
Sequence alignment is also used in other biological applications, such as the GeneTracer algorithm [24], which finds the ancestors of an offspring in a biological database. This algorithm will also be accelerated using a GPU.

REFERENCES
[1] T. F. Smith, M. S. Waterman, Identification of common molecular subsequences, Journal of Molecular Biology, Vol. 147: (1981), p195–197.
[2] S. B. Needleman, C. D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, Vol. 48, N°3: (1970), p443–453.
[3] R. Bellman, Dynamic Programming, Princeton University Press, Princeton, New Jersey: (1957).
[4] R. Bellman, S. Dreyfus, Applied Dynamic Programming, Princeton University Press: (1962).
[5] M. O. Dayhoff, R. M. Schwartz, B. C. Orcutt, A model of evolutionary change in proteins, in Atlas of Protein Sequence and Structure, chapter 22, National Biomedical Research Foundation, Washington, DC: (1978), p345–358.
[6] S. Henikoff, J. G. Henikoff, Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. USA, Vol. 89, N°22: (1992), p10915–10919.
[7] X. Huang, K. M. Chao, A generalized global alignment algorithm, Bioinformatics, Vol. 19, N°2: (2003), p228–233.
[8] R. A. Cartwright, NGILA: Global pairwise alignments with logarithmic and affine gap costs, Bioinformatics, Vol. 23, N°11: (2007), p1427–1428.
[9] J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, J. C. Phillips, GPU Computing, Proceedings of the IEEE, Vol. 96, N°5: (May 2008).
[10] J. Krüger, R. Westermann, Linear algebra operators for GPU implementation of numerical algorithms, ACM Transactions on Graphics (TOG), in Proc. ACM SIGGRAPH'03, Vol. 22, N°3: (July 2003), p908–916.
[11] P. Agarwal, S. Krishnan, N. Mustafa, S. Venkatasubramanian, Streaming geometric optimization using graphics hardware, in Proc. 11th Annual European Symposium on Algorithms (ESA'03), Budapest, Hungary: (September 2003).
[12] M. Charalambous, P. Trancoso, A. Stamatakis, Initial experiences porting a bioinformatics application to a graphics processor, in Proc. 10th Panhellenic Conference on Informatics (PCI'05), Volos, Greece: (November 2005).
[13] GenBank. http://www.ncbi.nlm.nih.gov/genbank/
[14] PubMed. http://www.ncbi.nlm.nih.gov/pubmed/
[15] D. Shreiner, M. Woo, J. Neider, T. Davis, OpenGL Programming Guide, 5th edition, Addison-Wesley, Reading, MA: (August 2005).
[16] W. Liu, B. Schmidt, G. Voss, A. Schroder, W. Muller-Wittig, Bio-sequence database scanning on a GPU, in Proc. 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS'06), 5th IEEE International Workshop on High Performance Computational Biology (HICOMB'06), Rhodes Island, Greece: (2006).
[17] UniProt. http://www.uniprot.org/
[18] Y. Liu, W. Huang, J. Johnson, S. Vaidya, GPU accelerated Smith-Waterman, in Proc. Computational Science (ICCS'06), Lecture Notes in Computer Science, Vol. 3994, Springer-Verlag, Berlin, Germany: (2006), p188–195.
[19] S. A. Manavski, G. Valle, CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment, BMC Bioinformatics, Vol. 9 (Suppl 2), S10: (2008).
[20] W. R. Pearson, D. J. Lipman, Improved tools for biological sequence comparison, Proc. Natl. Acad. Sci. USA, Vol. 85: (April 1988), p2444–2448.
[21] W. R. Pearson, Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms, Genomics, Vol. 11, N°3: (November 1991), p635–650.
[22] M. Farrar, Striped Smith-Waterman speeds database searches six times over other SIMD implementations, Bioinformatics, Vol. 23, N°2: (2007), p156–161.
[23] NVIDIA CUDA parallel programming book.
[24] M. Issa, A. Alzohairy, H. Abo Bakr, I. Ziedan, Tracking genes modifications in the pedigree through GeneTracer algorithm, 23rd International Workshop on Database and Expert Systems Applications (DEXA), IEEE: (2012).
[25] G. M. Striemer, A. Akoglu, Sequence alignment with GPU: Performance and design challenges, IEEE Xplore: (2009).

Table 1. Gap penalties
Linear gap penalty               P = a*k
Affine gap penalty               P = a*k + c
Logarithmic gap penalty          P = b*log k + c
Logarithmic-affine gap penalty   P = a*k + b*log k + c

Fig 1. CPU organization versus GPU organization [23].
Fig 2. Memory bandwidth for CPU and GPU [23].
Fig 3. FLOPS for CPU and GPU [23].
Fig 4. Access time between processors and the different memories of a GPU device [23,25].
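The four gap penalty forms of Table 1 can be written as small functions. The sketch below is a direct transcription of the table; it assumes that "log" denotes the natural logarithm and that k ≥ 1, neither of which is stated in the table. The function names are hypothetical.

```python
import math


def linear_gap(k, a):
    """Linear gap penalty: P = a*k."""
    return a * k


def affine_gap(k, a, c):
    """Affine gap penalty: P = a*k + c."""
    return a * k + c


def logarithmic_gap(k, b, c):
    """Logarithmic gap penalty: P = b*log k + c (natural log assumed)."""
    return b * math.log(k) + c


def log_affine_gap(k, a, b, c):
    """Logarithmic-affine gap penalty: P = a*k + b*log k + c."""
    return a * k + b * math.log(k) + c


if __name__ == "__main__":
    # Penalty for a run of k = 3 successive gaps under each form.
    print(linear_gap(3, a=2))
    print(affine_gap(3, a=2, c=5))
    print(logarithmic_gap(3, b=4, c=5))
    print(log_affine_gap(3, a=2, b=4, c=5))
```

With a logarithmic term, the penalty grows more slowly as a run of gaps gets longer than it does with a purely linear or affine form.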
