September 2010 - NVidia GPU Technology Conference
Transcript
SQL vs GPU
Richard Wilton
Johns Hopkins University, Department of Physics and Astronomy

Rationale
• Terabyte-scale data is best managed in a DBMS
• Some classes of computation are best implemented using a GPU

How to do GPU computation in a terabyte-scale database
• Do the computation “outside” the database (export the data / compute / re-import the data)
• Do the computation “inside” the database
• Something in between

SQLCLR out-of-process server
The basic concept
• Implement computational functionality in a separate process from the SQL Server process
• Access that functionality using interprocess communication (IPC)
[Diagram: SQL Server (SQL code → SQLCLR procedure or function) ↔ IPC ↔ out-of-process server (special-case functionality)]
Why?
• Avoid memory, threading, and permissions restrictions on SQLCLR implementations
• Load dynamic-link libraries
• Invoke native-code methods
• Exploit lower-level APIs (e.g. SqlClient, bulk insert) to facilitate data movement between SQL Server and non-SQL computational resources

SQLCLR out-of-process server implementation
[Diagram: SQL Server (SQL code → SQLCLR procedure) ↔ IPC ↔ out-of-process server (CUDA functionality)]
SQL:
  declare @sql nvarchar(max)
  set @sql = N'exec SqA1.dbo.SWGPerf @qOffset=311, @chrNum=7, @dimTile=128'
  exec dbo.ExecSWG @sqlCmd=@sql, @targetTable='##tmpX08'

SQLCLR implementation (in C#)
• Allocates IPC buffer
• Initializes IPC buffer with the result set from a specified SQL query
• Signals out-of-process server
• Waits for response from out-of-process server
• Returns data from the IPC buffer as a SQL result set

Out-of-process server implementation (in C#)
• Loads the CUDA implementation (dynamic-link library)
• Invokes a method in the DLL with a reference to the IPC buffer
• Waits for completion
• Signals the SQLCLR implementation
• Stand-alone application or Windows service

CUDA implementation (in C++)
• Implements GPU functionality
• Uses the IPC buffer for both incoming and outgoing data
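The slides leave the DLL entry point and the IPC buffer layout unspecified. Below is a minimal CUDA C++ sketch of what such an entry point could look like, assuming a hypothetical flat buffer layout (a 32-bit record count followed by fixed-size records) and hypothetical names (ProcessIpcBuffer, Record, alignRecords). It illustrates the pattern of staging the IPC buffer to the device and writing results back into the same buffer; it is not the implementation described in the talk.

  // Hypothetical DLL entry point called by the out-of-process server.
  // Assumed IPC buffer layout: [int32 record count][fixed-size records...].
  #include <cuda_runtime.h>
  #include <cstdint>

  struct Record { uint8_t bytes[64]; };   // fixed-size, GPU-friendly layout (assumed)

  __global__ void alignRecords(Record* recs, int n)
  {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) { /* placeholder: per-record GPU work goes here */ }
  }

  extern "C" __declspec(dllexport)
  int ProcessIpcBuffer(void* ipcBuffer)
  {
      int32_t n = *static_cast<int32_t*>(ipcBuffer);                       // record count
      Record* hostRecs = reinterpret_cast<Record*>(static_cast<int32_t*>(ipcBuffer) + 1);

      Record* devRecs = nullptr;
      if (cudaMalloc((void**)&devRecs, (size_t)n * sizeof(Record)) != cudaSuccess) return -1;

      // incoming data: IPC buffer -> device
      cudaMemcpy(devRecs, hostRecs, (size_t)n * sizeof(Record), cudaMemcpyHostToDevice);

      // one thread per record (task parallelism)
      int threads = 128;
      int blocks = (n + threads - 1) / threads;
      alignRecords<<<blocks, threads>>>(devRecs, n);

      // outgoing data: device -> the same IPC buffer
      cudaError_t rc = cudaMemcpy(hostRecs, devRecs, (size_t)n * sizeof(Record), cudaMemcpyDeviceToHost);
      cudaFree(devRecs);
      return (rc == cudaSuccess) ? 0 : -1;
  }

With this shape, the SQLCLR procedure never touches CUDA directly: it fills the buffer, signals the server, and the server simply forwards a pointer to ProcessIpcBuffer and signals back when the call returns.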
A test case: genomic sequence data
• Nucleic acid sequence data
• Data is processed in a “loose” workflow by a sequence of software tools
• Example:
  – Align short sequences to a long reference sequence
  – Identify alignment regions of interest
  – Map regions of interest to annotation data (genome map, known variations, etc.)

A test case: genomic sequence data
• Typical data-management scenario:
  – All data is maintained in file-system files
  – Processing software reads and writes files
  – “Database-like” operations (e.g. joins) are performed using command-line tools (e.g. Perl scripts)
• Can we use a DBMS as a repository?
  – All data is maintained in the DBMS
  – Processing software reads and writes tables
  – “Database-like” operations (e.g. joins) are performed using SQL

A test case: genomic sequence data
• Nucleic acid sequence data
• DBMS: data repository
• GPU: implementation of sequence alignment algorithms
  – Gapped alignment: Smith-Waterman
  – Non-gapped alignment: bit-vector mapping through a hash table

A test case: genomic sequence data
• Results: so far, so good …
  – Sequence data can be managed using a DBMS
    • Use low-level (C#, C++) implementations to manipulate string and binary data
  – Sequence data from the DBMS can be processed using GPU implementations
    • GPU sequence alignment implementations can exploit task parallelism and execute 10x (or more) faster than equivalent CPU implementations (see the kernel sketch after the last slide)
• But …
  – DBMS data abstractions (tables, columns) must be mapped to and from native GPU data formats and layouts
  – Data movement out of and into the DBMS is slow

Going forward…
• Compromise: maintain GPU-formatted data as BLOBs in the file system
  – BLOBs are in database “native” format
  – BLOBs correspond to database tables or table partitions
• Much programming and testing remain to be done…
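The task-parallel pattern referenced on the results slide is sketched below in CUDA C++: each thread computes a plain Smith-Waterman local-alignment score (linear gap penalty, two rolling DP rows) for one query against the reference. The scoring constants, the fixed maximum query length, the flat array layout, and the kernel name swScore are assumptions for illustration, not details from the talk, and the real gapped-alignment kernels are certainly more elaborate.

  // Task-parallel Smith-Waterman scoring sketch: one thread scores one query.
  // Constants, MAX_QLEN, and the flat data layout are assumptions, not details
  // from the talk; a production aligner would tile work and tune memory access.
  #define MAX_QLEN   64     // assumed fixed maximum query length
  #define MATCH       2
  #define MISMATCH   -1
  #define GAP        -2

  __global__ void swScore(const char* ref, int refLen,
                          const char* queries, int qLen, int nQueries,
                          int* bestScore)
  {
      int q = blockIdx.x * blockDim.x + threadIdx.x;
      if (q >= nQueries || qLen > MAX_QLEN) return;
      const char* query = queries + (size_t)q * qLen;

      int prev[MAX_QLEN + 1], curr[MAX_QLEN + 1];
      for (int j = 0; j <= qLen; ++j) prev[j] = 0;

      int best = 0;
      for (int i = 1; i <= refLen; ++i)           // walk the reference
      {
          curr[0] = 0;
          for (int j = 1; j <= qLen; ++j)         // classic local-alignment recurrence
          {
              int v = prev[j-1] + ((ref[i-1] == query[j-1]) ? MATCH : MISMATCH);
              if (prev[j]   + GAP > v) v = prev[j]   + GAP;   // gap in query
              if (curr[j-1] + GAP > v) v = curr[j-1] + GAP;   // gap in reference
              if (v < 0) v = 0;
              curr[j] = v;
              if (v > best) best = v;
          }
          for (int j = 0; j <= qLen; ++j) prev[j] = curr[j];  // roll the DP rows
      }
      bestScore[q] = best;
  }

A launch such as swScore<<<(nQueries + 127)/128, 128>>>(devRef, refLen, devQueries, qLen, nQueries, devScores) assigns one query per thread; adding traceback, affine gaps, and tuned memory layouts changes the details but not this task-parallel shape.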