Infernal-GPU: CUDA-Accelerated RNA Alignment
Adam Bazinet

Infernal
• Infernal: "INFERence of RNA ALignment"
• A software package for searching DNA sequence databases for RNA structure and sequence similarities
• Written and maintained by the Sean Eddy laboratory at Janelia Farm

Ribonucleic Acid (RNA)
• Much like DNA, RNA consists of nucleotides (A, C, G, U)
• Unlike DNA, RNA is usually single-stranded
• RNA molecules play a central role in many cellular processes

mRNA
• Messenger RNA (mRNA) - carries information about a protein to the ribosome

ncRNA
• Non-coding RNA (ncRNA) - a functional RNA molecule that is not translated into a protein
• Many different sub-types; two of the most abundant are ribosomal RNA (rRNA) and transfer RNA (tRNA)
• Infernal is primarily concerned with these functional, non-coding RNAs

Secondary Structure
An example of an RNA stem-loop secondary structure
• The functional forms of single-stranded RNA molecules require a specific tertiary structure, the scaffold of which is provided by secondary structural elements

Secondary Structure
• Small subunit ribosomal RNA, 5' domain, taken from the Rfam database

RNA Multiple Alignment

RNA Folding Algorithms
• Use a dynamic programming algorithm to computationally predict secondary structure according to a thermodynamic model
• Drawbacks:
  - only call 50-70% of base pairs correctly, on average
  - can be very computationally intensive (i.e., slow)
• Excellent summary article by Sean Eddy
  - Nature Biotechnology 22, 1457-1458 (2004)

Infernal
• Takes an RNA multiple alignment as input; the secondary structure must be provided!
  - the secondary structure is often determined in the laboratory
• Builds a covariance model (CM) from the alignment
• Searches a target sequence database for possible matches to the input model

Covariance Model
• CMs are a type of stochastic context-free grammar (SCFG)
• Each residue in the query RNA is represented by a state, arranged in a tree-like structure that mirrors the secondary structure of the RNA, along with additional states to model insertions and deletions
• Dynamic programming calculates the probability that a substructure of the query rooted at state v aligns to a subsequence i..j of the target sequence

Covariance Model

Computational Complexity
• The most noteworthy limitation of SCFGs is their computational complexity
• SCFG-based RNA analysis algorithms require time and memory proportional to at least L³ (where L is the sequence length), because every possible pair of residues (L²) must be tried against up to L/2 base-pairing states in the model (and in most RNA SCFGs, the time required more typically scales as L⁴)
• The latest version of Infernal incorporates some heuristics to ameliorate the situation, but the computational cost can still be considerable

Accelerating Infernal
• Two programs would benefit the most from speedup:
  - cmcalibrate (part of model building)
  - cmsearch (database searching)
• Both use a banded version of the 'Inside' algorithm, which is nearly identical to the Cocke-Younger-Kasami (CYK) database search dynamic programming algorithm for CMs
• CYK returns the optimal derivation, whereas Inside returns the probability of the observation

Banded CYK Algorithm

Profiling Infernal
• Used a short test run of cmsearch as a test case
• ~25% of runtime was in FastIInsideScan, and ~22% of runtime was in ILogsum
• FastIInsideScan is optimized for the CPU, so there were 13 blocks of ILogsum calls - each of which is a potential target for parallelization
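Sketch: Log-Space Addition (ILogsum) in CUDA
• ILogsum combines two probabilities stored as log-space scores, i.e. it computes log(e^a + e^b); in Infernal this is done on scaled integer scores via a lookup table
• The sketch below is a simplified, float-based stand-in (the function and variable names are mine, not Infernal's) showing the kind of element-wise kernel that these profiling numbers suggested offloading to the GPU

// Illustrative only: a float log-sum-exp standing in for Infernal's ILogsum,
// which operates on scaled integer log scores via a lookup table.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// log(e^a + e^b), computed stably in log space
__device__ float log_sum(float a, float b)
{
    float hi = fmaxf(a, b);
    return hi + log1pf(expf(fminf(a, b) - hi));
}

// One thread per element: combine two arrays of log-space scores
__global__ void pairwise_log_sum(const float* x, const float* y, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = log_sum(x[i], y[i]);
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_x = new float[n], *h_y = new float[n], *h_out = new float[n];
    for (int i = 0; i < n; ++i) { h_x[i] = -1.0f; h_y[i] = -2.0f; }   // arbitrary log scores

    float *d_x, *d_y, *d_out;
    cudaMalloc(&d_x, bytes); cudaMalloc(&d_y, bytes); cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    pairwise_log_sum<<<(n + threads - 1) / threads, threads>>>(d_x, d_y, d_out, n);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    printf("log_sum(-1, -2) = %f\n", h_out[0]);   // expect roughly -0.687

    cudaFree(d_x); cudaFree(d_y); cudaFree(d_out);
    delete[] h_x; delete[] h_y; delete[] h_out;
    return 0;
}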
Parallelizing Infernal
• Each block of ILogsum calls was inside a loop
• Assigned each loop iteration to a separate GPU thread
• Ensured there were no redundant memory transfers
• However, the GPU version was ~9x slower than the optimized CPU version – why?
• Answer: with 22 billion kernel invocations, the overhead of invoking the kernel was greater than the work the kernel was actually doing!

Parallelizing Infernal
• Switched to working with RefIInsideScan, a simpler, non-optimized reference implementation
• Saw an opportunity for parallelization at the level of the 'v-loop' (loop over CM states)
• The v-loop was 229 iterations, each of which was assigned to a separate GPU thread
• The v-loop was nested inside the j-loop (loop over database sequence positions)
  - the j-loop was ~17,000 iterations, which means far fewer kernel invocations than in FastIInsideScan
  - a sketch of this kernel launch structure appears in the appendix at the end of this deck

Parallelizing Infernal
• Even after moving all memory transfers outside the j-loop, the program still ran ~7x slower than the reference CPU program
• The best current explanation is that the kernel is not optimized – there are large numbers of incoherent reads/writes
• Perhaps with additional work, a speedup can be attained
  - source code is available: http://www.cbcb.umd.edu/~pknut777/

Takeaways
• It was difficult to dive into complex scientific code and attempt to parallelize it
• Spent a LOT of time profiling the application, determining the extents of host arrays, chasing down runtime errors, etc.
• Very much enjoyed learning about this unique problem area of bioinformatics

Acknowledgments
• Eric Nawrocki, a graduate student who develops Infernal in the Eddy Lab, provided helpful information along the way
• Many thanks, Eric!

Questions?
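Appendix: v-Loop Kernel Launch Structure (sketch)
• The code below illustrates only the launch structure described in the 'Parallelizing Infernal' slides: the host walks the j-loop, each kernel launch covers the v-loop, and memory transfers stay outside the loop
• The kernel body is a toy placeholder, not the actual Inside recurrence; names like v_loop_kernel and alpha are my own, and only the loop sizes (229 states, ~17,000 positions) come from the slides

// Sketch of the launch structure only: host j-loop, one kernel launch per j,
// one thread per CM state v, with allocation/initialization hoisted outside
// the loop. The recurrence is a toy placeholder; the array name (alpha) and
// layout are assumptions, not RefIInsideScan's actual data structures.
#include <cstdio>
#include <cuda_runtime.h>

const int NUM_STATES = 229;     // size of the v-loop in the test model
const int SEQ_LEN    = 17000;   // approximate size of the j-loop in the test run

// One thread per CM state v; fills column j of the score matrix from column j-1
__global__ void v_loop_kernel(float* alpha, int j)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v < NUM_STATES) {
        // Placeholder arithmetic; the real kernel would combine transition and
        // emission scores from several previously computed cells here.
        alpha[v * (SEQ_LEN + 1) + j] = alpha[v * (SEQ_LEN + 1) + (j - 1)] + 1.0f;
    }
}

int main()
{
    size_t bytes = (size_t)NUM_STATES * (SEQ_LEN + 1) * sizeof(float);
    float* d_alpha;
    cudaMalloc(&d_alpha, bytes);
    cudaMemset(d_alpha, 0, bytes);              // one-time initialization, outside the j-loop

    int threads = 256;
    int blocks  = (NUM_STATES + threads - 1) / threads;
    for (int j = 1; j <= SEQ_LEN; ++j)          // host-side j-loop: ~17,000 kernel launches
        v_loop_kernel<<<blocks, threads>>>(d_alpha, j);
    cudaDeviceSynchronize();

    float last;                                 // copy back a single cell as a sanity check
    cudaMemcpy(&last, d_alpha + SEQ_LEN, sizeof(float), cudaMemcpyDeviceToHost);
    printf("alpha[0][%d] = %f\n", SEQ_LEN, last);

    cudaFree(d_alpha);
    return 0;
}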