Download Lecture 18 - Brown University

Reconfigurable Computing (EN2911X, Fall07)
Lecture 18: Application-Driven Hardware Acceleration (4/4)
Prof. Sherief Reda
Division of Engineering, Brown University
http://ic.engin.brown.edu
Reconfigurable Computing S. Reda, Brown University

Status
• We have covered popular examples of application-driven hardware acceleration using reconfigurable computing:
  – FFT for signal and image processing, as an example of divide-and-conquer algorithms
  – Speech recognition applications
  – The Viterbi algorithm for digital communication, as an example of dynamic programming algorithms
• In this lecture we give an overview of some of the algorithms for bioinformatics

Quick introduction to molecular biology & bioinformatics

DNA
• Can be thought of as the “blueprint” for an organism
• Composed of small molecules called nucleotides
  – four different nucleotides, distinguished by their four bases: adenine (A), cytosine (C), guanine (G) and thymine (T)
• DNA is digital information
• A single strand of DNA can be thought of as a string composed of the four letters A, C, G, T, e.g. ACGTTCTA
• DNA molecules usually consist of two strands arranged in a double helix structure where A bonds to T and C bonds to G

Genes
• Genes are the basic units of heredity
• A gene is a sequence of bases that carries the information required for constructing a particular protein. Such a gene is said to encode a protein
• The human genome comprises ~20K-25K genes
• Those genes encode > 100,000 proteins

Proteins
[Figure labels: a folded protein structure; amino acids]
• Proteins perform most life functions and even make up the majority of cellular structures.
• Proteins are large, complex molecules made up of smaller subunits called amino acids.
• Chemical properties that distinguish the 20 different amino acids cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell.
• Proteins can be thought of as strings composed from a 20-character alphabet

Central dogma of molecular biology
• RNA is like DNA except that it is usually single-stranded and the base uracil (U) is used in place of thymine (T)
• A strand of RNA can be thought of as a string composed of the four letters: A, C, G, U

Translation
• There are 6 possible reading frames in translating DNA sequences into proteins.
• In many cases, FPGAs are used to translate a DNA sequence into the 6 frames in parallel and then concurrently apply any subsequent processing

DNA string alignment
• A sequence alignment is a way of arranging the primary sequences of DNA (or RNA or protein) to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
• If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as insertion or deletion mutations introduced in one or both lineages in the time since they diverged from one another.
• At each position, one of three cases can occur:
  – A match, when the same character is present in both strings
  – A mismatch, or substitution, when the two characters differ
  – A gap, when there is an insertion of one character in only one string or, symmetrically, a deletion in the other string
• How can we find the best alignment between two DNA strings?

Finding the best global alignment
[Figures from slides 11-14 from Bioinformatics Applications by D. Lavenier and M. Giraud]
Costs:
• +4 for a match
• -2 for a mismatch
• -3 for a gap
Needleman and Wunsch (NW) dynamic programming algorithm

Local alignment: finding the most similar subsequences
Costs:
• +4 for a match
• -2 for a mismatch
• -3 for a gap
Smith and Waterman (SW) algorithm

Dynamic programming advantage on FPGAs
• All cells on the same anti-diagonal can be computed simultaneously
• What is the runtime on a general-purpose CPU?
• What is the runtime on an FPGA?

Required number of computational cells

Examples of commercial products
• Bioceleration Ltd.
  – Each BioXL/H board contains eight FPGA modules and 128MB of global memory.
  – Each of the modules is programmed to calculate four matrix cells per clock cycle (for the Smith-Waterman algorithm). An eight-board BioXL/H executes these applications at a speed of 6 billion matrix cells per second.
  – The clock rate of the system is 25-33MHz (programmable).
• Examples of applications supported:
  – Smith-Waterman algorithm
  – Translation of nucleic acid sequences to 6 reading frames and search of each frame against an amino acid database

More examples: TimeLogic
“CodeQuest is a biocomputing workstation that processes large genomics searches and sophisticated informatics workflows. Using its FPGA-based DeCypher Engines, the quad-core CodeQuest workstation speeds Tera-BLAST, Smith-Waterman, Hidden Markov Model (HMM) and gene modeling searches at the speed of a midsized cluster.”
“It brings several fold the performance of a 64-CPU cluster, yet costs less than 10 CPUs”
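
Sketch: six-frame translation
The Translation slide mentions translating a DNA sequence into its 6 reading frames. The following is a minimal software sketch of that idea, assuming the standard genetic code and an input that contains only A, C, G, T. The names six_frames, translate_frame and reverse_complement are illustrative and do not come from the lecture. The three forward frames start at offsets 0, 1 and 2 of the strand; the other three are the same offsets on the reverse complement.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")  # A pairs with T, C pairs with G


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna, offset):
    """Translate complete codons starting at the given offset (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def six_frames(dna):
    """Three forward frames followed by three reverse-complement frames."""
    rc = reverse_complement(dna)
    return [translate_frame(s, k) for s in (dna, rc) for k in range(3)]


if __name__ == "__main__":
    for frame in six_frames("ATGGCCTAA"):
        print(frame)
```

The six frames are independent of one another, which is what makes it natural to compute them in parallel in hardware; in this sketch they are simply computed one after another.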
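
Sketch: global (NW) and local (SW) alignment scores
The two alignment slides name the Needleman-Wunsch and Smith-Waterman algorithms but show them only as figures. The following is a minimal sketch of both recurrences using the values listed on the slides (+4 match, -2 mismatch, -3 gap) with a linear gap penalty. It returns only the best score, not the alignment itself, and the function name align_score is an illustrative choice rather than anything from the lecture.

```python
MATCH, MISMATCH, GAP = 4, -2, -3  # values from the slides


def align_score(a, b, local=False):
    """Best alignment score of strings a and b.

    local=False: Needleman-Wunsch (global); the whole of a is aligned to the whole of b.
    local=True:  Smith-Waterman (local); best-scoring pair of substrings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps along the borders.
        for i in range(1, rows):
            H[i][0] = i * GAP
        for j in range(1, cols):
            H[0][j] = j * GAP
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score = max(H[i - 1][j - 1] + sub,  # match or mismatch
                        H[i - 1][j] + GAP,      # gap in b
                        H[i][j - 1] + GAP)      # gap in a
            if local:
                score = max(0, score)           # a local alignment may restart anywhere
                best = max(best, score)
            H[i][j] = score
    return best if local else H[-1][-1]


if __name__ == "__main__":
    print(align_score("ACGT", "AGT"))                       # global
    print(align_score("TTACGTTT", "GGACGTCC", local=True))  # local
```

The only differences between the two variants are the border initialisation, the clamp at 0, and where the final score is read from (bottom-right cell for NW, maximum cell for SW).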
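
Sketch: anti-diagonal (wavefront) evaluation order
The "Dynamic programming advantage on FPGAs" slide states that all cells on the same anti-diagonal can be computed simultaneously and asks about runtime. The sketch below fills the Smith-Waterman matrix one anti-diagonal at a time and counts the diagonals. It is a software model only: the inner loop runs sequentially here, whereas the point of the slide is that its iterations are independent. The function name sw_wavefront is illustrative. For sequences of lengths m and n, a sequential CPU performs m*n cell updates, while a design with one processing element per cell of the longest anti-diagonal, at most min(m, n), needs m+n-1 steps.

```python
def sw_wavefront(a, b, match=4, mismatch=-2, gap=-3):
    """Smith-Waterman score computed anti-diagonal by anti-diagonal.

    Returns (best_score, steps), where steps is the number of non-empty
    anti-diagonals, i.e. the step count of an idealised parallel design.
    """
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, steps = 0, 0
    for d in range(2, m + n + 1):            # d = i + j identifies one anti-diagonal
        rows = range(max(1, d - n), min(m, d - 1) + 1)
        if not rows:
            continue
        # Every cell below reads only diagonals d-1 and d-2, so the
        # iterations of this loop do not depend on one another.
        for i in rows:
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
        steps += 1
    return best, steps


if __name__ == "__main__":
    score, steps = sw_wavefront("TTACGTTT", "GGACGTCC")
    print(score, steps, "steps versus", 8 * 8, "sequential cell updates")
```

For two sequences of length 8 the model takes 15 wavefront steps instead of 64 sequential cell updates; the gap widens as the sequences grow, since one count grows linearly and the other quadratically.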
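
Sketch: checking the BioXL/H throughput figure
The Bioceleration slide quotes 6 billion matrix cells per second for an eight-board system. As a rough sanity check, the arithmetic below multiplies out the figures given on that slide. It assumes, as an idealisation not stated in the source, that every module updates four cells on every clock cycle, so the results are peak estimates and not measured rates.

```python
def peak_cells_per_second(boards, modules_per_board, cells_per_clock, clock_hz):
    """Idealised peak rate: every module produces its cells on every cycle."""
    return boards * modules_per_board * cells_per_clock * clock_hz


# Figures from the slide: 8 boards, 8 FPGA modules per board,
# 4 matrix cells per module per clock, 25-33 MHz programmable clock.
low = peak_cells_per_second(8, 8, 4, 25_000_000)
high = peak_cells_per_second(8, 8, 4, 33_000_000)

if __name__ == "__main__":
    print(f"{low / 1e9:.1f} to {high / 1e9:.1f} billion cells per second")
```

Under these assumptions the peak lies between 6.4 and about 8.4 billion cells per second, so the quoted 6 billion is consistent with the low end of the clock range.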
