Reconfigurable Computing
(EN2911X, Fall07)
Lecture 18: Application-Driven
Hardware Acceleration (4/4)
Prof. Sherief Reda
Division of Engineering, Brown University
http://ic.engin.brown.edu
Reconfigurable Computing
S. Reda, Brown University
Status
• We have covered popular application-driven hardware
acceleration using reconfigurable computing
– FFT for signal and image processing as an example
of divide and conquer algorithms
– Speech recognition applications
– Viterbi algorithm for digital communication as an
example of dynamic programming algorithms
• In this lecture, we give an overview of some of the algorithms used in
bioinformatics
Quick introduction to molecular biology &
bioinformatics
DNA
• Can be thought of as the “blueprint” for an
organism
• Composed of small molecules called
nucleotides
– four different nucleotides distinguished by the four
bases: adenine (A), cytosine (C), guanine (G) and
thymine (T)
• DNA is digital information
• A single strand of DNA can be thought of as a
string composed of the four letters: A, C, G, T
ACGTTCTA
• DNA molecules usually consist of two strands
arranged in a double helix structure where A
bonds to T and C bonds to G
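The pairing rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and the input is assumed to contain only the letters A, C, G, T.

```python
# Base pairing from the slide: A bonds to T, C bonds to G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA string."""
    return strand.translate(COMPLEMENT)

def reverse_complement(strand: str) -> str:
    """Return the opposite strand read in its own direction."""
    return complement(strand)[::-1]

print(complement("ACGTTCTA"))          # TGCAAGAT
print(reverse_complement("ACGTTCTA"))  # TAGAACGT
```

The reverse complement is the form usually needed in practice, because the two strands of the helix are read in opposite directions.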
Genes
• Genes are the basic units of heredity
• A gene is a sequence of bases that carries the
information required for constructing a particular protein.
Such a gene is said to encode a protein
• The human genome comprises roughly 20,000-25,000 genes
• Those genes encode more than 100,000 proteins
Proteins
[Figure: a folded protein structure and its amino acids]
• Proteins perform most life functions and even make up the majority
of cellular structures.
• Proteins are large, complex molecules made up of smaller subunits
called amino acids.
• Chemical properties that distinguish the 20 different amino acids
cause the protein chains to fold up into specific three-dimensional
structures that define their particular functions in the cell.
• A protein can be thought of as a string composed from a 20-character alphabet
Central dogma of molecular biology
• RNA is like DNA except that it is usually single-stranded and
the base uracil (U) is used in place of thymine (T)
• A strand of RNA can be thought of as a string composed of the four
letters: A, C, G, U
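At the string level, going from DNA to RNA amounts to replacing T with U. A minimal sketch, assuming the input is the coding strand of the gene; the function name is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA string for a DNA coding strand by replacing T with U."""
    return coding_strand.replace("T", "U")

print(transcribe("ACGTTCTA"))  # ACGUUCUA
```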
Translation
• There are 6 possible reading frames when translating a DNA
sequence into proteins: 3 offsets on each of the 2 strands.
• In many cases, FPGAs are used to translate a DNA sequence
into the 6 frames in parallel and then concurrently apply any
subsequent processing
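The six frames can be sketched in software as follows. This is an illustrative Python version, not the FPGA design: it assumes the standard genetic code, works directly on DNA codons, and uses hypothetical function names. On an FPGA the six translations would run as parallel pipelines rather than in a loop.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in its own direction."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def six_frames(dna: str) -> list:
    """Three offsets on the given strand, then three on the reverse complement."""
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]

print(six_frames("ATGGCCTAA"))
# ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```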
DNA string alignment
• A sequence alignment is a way of arranging the primary sequences of
DNA (or RNA or protein) to identify regions of similarity that may be a
consequence of functional, structural, or evolutionary relationships
between the sequences.
• If two sequences in an alignment share a common ancestor,
mismatches can be interpreted as point mutations and gaps as
insertion or deletion mutations introduced in one or both lineages in the
time since they diverged from one another.
• At each position, one of three cases can occur:
– A match, when the same character is present in both strings
– A mismatch, or substitution, when the two characters differ
– A gap, when a character is inserted in only one string or,
symmetrically, deleted from the other string
• How can we find the best alignment between two DNA strings?
Finding the best global alignment
[Figures from slides 11-14 from Bioinformatics
Applications by D. Lavenier and M. Giraud]
Costs:
• +4 for a match
• -2 for a mismatch
• -3 for a gap
Needleman and Wunsch (NW) dynamic programming algorithm
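Since the slide figures are not reproduced here, a minimal Python sketch of the NW recurrence with the scores above may help. It assumes a linear gap penalty (every gap position costs -3) and returns only the best score, not the alignment itself; the function name is hypothetical.

```python
def needleman_wunsch_score(a: str, b: str, match=4, mismatch=-2, gap=-3) -> int:
    """Best global alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    # Borders: aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
    return H[-1][-1]

print(needleman_wunsch_score("ACGT", "AGT"))  # 9: three matches and one gap
```

Each cell depends only on its left, upper, and upper-left neighbours, which is what makes the matrix suitable for parallel hardware.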
Local alignment: finding the most similar
subsequences
Costs:
• +4 for a match
• -2 for a mismatch
• -3 for a gap
Smith and Waterman (SW) algorithm
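A minimal Python sketch of the SW recurrence with the same scores. Compared with global alignment, every cell is clamped at 0 so an alignment can start anywhere, and the answer is the maximum over the whole matrix. A linear gap penalty is assumed and the function name is hypothetical.

```python
def smith_waterman_score(a: str, b: str, match=4, mismatch=-2, gap=-3) -> int:
    """Best local alignment score between any substrings of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # borders stay at 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                     # restart the alignment here
                          H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("CCACGTCC", "GGACGTGG"))  # 16: the shared ACGT
```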
Dynamic programming advantage on FPGAs
• All cells on the same anti-diagonal can be computed simultaneously
• What is the runtime on a general purpose CPU?
• What is the runtime on an FPGA?
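One way to approach the two questions: for strings of lengths m and n, a sequential CPU fills all m·n cells one at a time, while hardware with one computational cell per anti-diagonal entry needs only m+n-1 steps. The Python sketch below fills the NW matrix in anti-diagonal order and counts those steps. It assumes the same scores as before and ignores I/O and the case where the FPGA has fewer cells than the longest anti-diagonal; the function name is hypothetical.

```python
def wavefront_nw(a: str, b: str, match=4, mismatch=-2, gap=-3):
    """Return (global alignment score, number of anti-diagonal steps)."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        H[i][0] = i * gap
    for j in range(1, n + 1):
        H[0][j] = j * gap
    if m == 0 or n == 0:
        return H[m][n], 0
    steps = 0
    for d in range(2, m + n + 1):  # cells with i + j == d
        # Every cell in this loop depends only on diagonals d-1 and d-2,
        # so in hardware the whole inner loop is a single parallel step.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
        steps += 1
    return H[m][n], steps

score, steps = wavefront_nw("ACGT", "AGT")
print(score, steps)  # 9 6 -> 6 parallel steps instead of 12 sequential cell updates
```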
Required number of computational cells
Examples of commercial products
• Bioceleration Ltd.
• Each BioXL/H board contains eight FPGA modules and 128MB of
global memory.
• Each of the modules is programmed to calculate four matrix cells per
clock cycle (for the Smith-Waterman algorithm). An eight-board
BioXL/H executes these applications at a speed of 6 billion matrix
cells per second.
• The clock rate of the system is 25-33MHz (programmable).
• Examples of applications supported:
• Smith-Waterman algorithm
• Translation of nucleic acid sequences into 6 reading frames and
search of each frame against an amino acid database
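The quoted throughput can be checked with simple arithmetic. The sketch below assumes that every module delivers its four cells on every clock cycle, which is a peak figure and not a vendor-confirmed one; the function name is hypothetical.

```python
def peak_cells_per_second(boards, modules_per_board, cells_per_clock, clock_hz):
    """Peak matrix-cell rate if every module is busy on every clock cycle."""
    return boards * modules_per_board * cells_per_clock * clock_hz

low = peak_cells_per_second(8, 8, 4, 25_000_000)   # 25 MHz clock
high = peak_cells_per_second(8, 8, 4, 33_000_000)  # 33 MHz clock
print(low, high)  # 6400000000 8448000000
```

At 25 MHz the estimate is 6.4 billion cells per second, in line with the quoted figure of 6 billion.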
More examples: TimeLogic
“CodeQuest is a biocomputing workstation that processes large
genomics searches and sophisticated informatics workflows. Using its
FPGA-based DeCypher Engines, the quad-core CodeQuest
workstation speeds Tera-BLAST, Smith-Waterman, Hidden Markov
Model (HMM) and gene modeling searches at the speed of a midsized cluster.”
“It brings several fold the performance of a 64-CPU cluster, yet costs
less than 10 CPUs”