Efficient Representation of Data Structures on Associative Processors
Jalpesh K. Chitalia
(Advisor: Dr. Robert A. Walker)
Computer Science Department
Kent State University
Presentation Outline
 ASC Processor Architecture
 Associative Features
 Structure Codes
 Representing Data Structures
 Structure Code Operations
 Implementation of Structure Codes
 Summary and Future Work
The ASC Processor
 A scalable design implemented on a million-gate Altera FPGA
 SIMD-like architecture
 Currently, 36 8-bit Processing Elements (PEs) available
 8-bit Instruction Stream (IS) control unit with 8-bit instruction and data addresses, 32-bit instructions
The ASC Architecture
[Block diagram: the Control Unit broadcasts over the Instruction Bus to a PE Array of PEs with local memory and supporting circuitry; the PEs are connected by a PE Network and a Data Bus, with a Responder Resolution Unit and Common Registers.]
The ASC Architecture
 Each PE listens to the IS through the broadcast and reduction network
 PEs can communicate amongst themselves using the PE Network
 A PE may either execute or ignore the microcode instruction broadcast by the IS, under the control of its Mask Stack
The ASC Features
 Associative Search
 Each PE can search its local memory for a key under the control of the IS
 Responder Resolution
 A special circuit signals whether at least one record was found
 Masked Operation
 Local Mask Stacks can turn the execution of instructions from the IS on or off
The ASC Example
Select * from Students where Grade > 90
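A minimal Python sketch of how such a query maps onto the PE array, assuming one record per PE. The records, field names, and the list standing in for the PE array are illustrative only, not the actual ASC microcode:

    # One student record per PE; names, fields, and grades are made up.
    records = [
        {"name": "Ana", "grade": 95},
        {"name": "Bo",  "grade": 88},
        {"name": "Cai", "grade": 91},
    ]

    # Associative search: the IS broadcasts the comparison, and every PE
    # evaluates it against its own record at the same time.
    responders = [rec["grade"] > 90 for rec in records]

    # Responder resolution: one signal says whether at least one PE matched.
    any_responder = any(responders)

    # Masked operation: only responding PEs take part in the follow-up step.
    selected = [rec for rec, hit in zip(records, responders) if hit]
    print(any_responder, selected)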
The ASC Features
 Constant-Time Associative Operations
 Associative Search
 Finding the minimum or maximum in a field
 Ideal for database processing
 Data is organized in a tabular format
 Each tuple in a table can be processed by one PE
 PE Network
 Many parallel algorithms require all PEs to move contents in a regular pattern
 E.g., image convolution, matrix multiplication
Non-Tabular Data Structures
 Many applications use linked-list-based data structures
 For example, plain HTML parsing can be done using a tree structure
 Similarly, XML or object-relational databases can be represented using trees
 Game programming uses tree-based algorithms, and demands much processing power
 Compiler construction uses directed acyclic graphs and tree structures
Data Structure Codes
 A unique coding scheme
 Allows any data structure to be represented in a tabular format
 The tabular format allows division of data amongst the PEs
 Also known as a “structure code”
 Different coding schemes may be required for different data structures
 Uses the Associative Search feature of the ASC Processor
Simple List-based Structures
 The left figure shows a possible representation of 1D and 2D arrays using structure codes
 The right figure represents a stack and a queue, depending on the use of appropriate functions
Complex List-based Structures
 Each digit position indicates the level in the tree
 Each value in that position indicates the position of that child from the left (see the sketch below)
 Discussions henceforth are confined to trees
 Graphs are read in a slightly different manner
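The following small Python sketch illustrates this coding scheme on a made-up tree, with structure codes written as tuples of child positions. The tree, the tuple representation, and the table contents are assumptions for illustration only:

    # Example tree and its structure codes: digit i gives the child's
    # position from the left at level i; the root's code is (1,).
    #
    #          A                (1,)
    #        /   \
    #       B     C             (1, 1)  (1, 2)
    #      / \     \
    #     D   E     F           (1, 1, 1)  (1, 1, 2)  (1, 2, 1)

    table = [                      # tabular form: one (code, value) row per PE
        ((1,),      "A"),
        ((1, 1),    "B"),
        ((1, 2),    "C"),
        ((1, 1, 1), "D"),
        ((1, 1, 2), "E"),
        ((1, 2, 1), "F"),
    ]

    # Because the tree is now a flat table, its rows can be divided evenly
    # among the PEs and searched associatively by code or by value.
    for code, value in table:
        print(code, value)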
Structure Code Operations
 Two sets of constant-time operations: scalar and parallel
 Scalar instructions are simple arithmetic operations
 Search for the parent, child, or root
 Find the value of the structure code for the next or previous node
 Parallel instructions use complex associative operations
 Find the code for the next, previous, or both siblings
 ‘Locate’ the required node for further processing
Scalar Operations
 fstcd: leftmost child
 nxtcd: right sibling
 prvcd: left sibling
 trncd: truncate to the (parent) node
 trnacd: truncate all, to the (root) node
 Can be used to allocate a new node (see the sketch below)
 Limited use in searching records
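A minimal Python sketch of these scalar operations, assuming the tuple representation of structure codes from the earlier sketch. The operation names follow the slide, but the tuple encoding and the exact behavior shown are assumptions:

    # Structure codes are kept as tuples of child positions, root = (1,).

    def fstcd(code):    # leftmost child: append position 1
        return code + (1,)

    def nxtcd(code):    # right sibling: increment the last digit
        return code[:-1] + (code[-1] + 1,)

    def prvcd(code):    # left sibling: decrement the last digit
        return code[:-1] + (code[-1] - 1,)

    def trncd(code):    # truncate to the parent node
        return code[:-1]

    def trnacd(code):   # truncate all the way to the root
        return code[:1]

    node = (1, 1, 2)
    print(fstcd(node))   # (1, 1, 2, 1): code at which a new child could be allocated
    print(nxtcd(node))   # (1, 1, 3)
    print(prvcd(node))   # (1, 1, 1)
    print(trncd(node))   # (1, 1)
    print(trnacd(node))  # (1,)
    # Note: these only compute candidate codes; they cannot tell whether the
    # node actually exists in the table, hence their limited use in searching.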
Parallel Operations
 Index instructions:
 Flag a node of the result to ease further processing
 nxtdex (next or right), prvdex (previous or left), and sibdex (siblings, both left and right)
 Value instructions:
 Return the exact structure code of the result
 nxtval (next or right) and prvval (previous or left)
 Can be used to ‘locate’ nodes in any tree (see the sketch below)
 Use parallel and associative hardware resources
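A rough Python sketch of one value op (nxtval) and one index op (nxtdex) over the tabular tree, again assuming tuple codes. On the ASC these are constant-time associative searches across all PEs, not a sequential loop:

    table = [((1,), "A"), ((1, 1), "B"), ((1, 2), "C"),
             ((1, 1, 1), "D"), ((1, 1, 2), "E"), ((1, 2, 1), "F")]

    def nxtval(table, ref):
        """Value op: return the structure code of ref's right sibling, if present."""
        target = ref[:-1] + (ref[-1] + 1,)
        return next((code for code, _ in table if code == target), None)

    def nxtdex(table, ref):
        """Index op: return one bit flag per PE, marking ref's right sibling."""
        target = ref[:-1] + (ref[-1] + 1,)
        return [1 if code == target else 0 for code, _ in table]

    print(nxtval(table, (1, 1, 1)))   # (1, 1, 2)
    print(nxtdex(table, (1, 1, 1)))   # [0, 0, 0, 0, 1, 0]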
Applications of DSC Ops
 Scalar
 Associative searching: the parent or the root in a given data structure space
 Allocating elements in a data structure space
 Value
 Finding the value of the resultant structure code
 Index
 Most useful in associative search (e.g., tree traversal)
Implementation Requirements
 Hardware functionality
 Debugging and developing I/O functionality
 Debugging a few other instructions
 Structure Code Operations
 Scalar operations: nxtcd, prvcd
 Parallel operations: nxtdex, nxtval, prvdex, prvval, sibdex
 Parallel Operations
 The most complex set amongst the ASC operations
Implementation – Value Ops
 Processing of Parallel Value Ops
 Associative search, associative min/max
 Input Cycle
 All the elements of the data structure are distributed amongst the PEs sequentially
 The reference node is broadcast to them
 Output Cycle
 The multi-byte structure code is stored at the destination address (scalar memory)
Implementation – Index Ops
 Processing of Parallel Index Ops
 The resulting element(s) are ‘marked’ for further processing
 Note: sibdex may select two results
 Input and Output
 Input: same as in the value operations
 Output: bit flags for all the input nodes are stored sequentially at the destination address (scalar memory), as sketched below
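A rough Python sketch of the input and output cycles described above, assuming the same tuple codes. The memory layout and destination addresses are purely illustrative:

    # Input cycle: one node per PE, with the reference node broadcast to all.
    pe_memory = [((1, 1), "B"), ((1, 2), "C"),
                 ((1, 1, 1), "D"), ((1, 1, 2), "E")]
    reference = (1, 1, 1)
    target = reference[:-1] + (reference[-1] + 1,)   # e.g. nxtval / nxtdex

    # Output cycle, value op: the matching PE's multi-byte structure code is
    # written back to the destination address in scalar memory.
    value_result = next((code for code, _ in pe_memory if code == target), None)

    # Output cycle, index op: one bit flag per input node is written back
    # sequentially to the destination address instead.
    index_result = [int(code == target) for code, _ in pe_memory]

    print(value_result)   # (1, 1, 2)
    print(index_result)   # [0, 0, 0, 1]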
Summary
 SIMD-based computers are well suited for database processing
 The ASC processor, with its associative operations, makes such processing even more efficient
 Structure codes translate non-tabular data structures into a tabular format
 Trees and graphs can be represented and evenly divided like records in a table
 Object-relational databases, HTML processing
 Stacks, queues, and multi-dimensional arrays can be efficiently processed
 Structure codes are not required for an even representation
Future Work
 Efforts are underway to clean up the architecture
 To allow a multiplier/divider on each PE
 To accommodate a RISC instruction set
 Unpacked bytes are used to support variable-length structure codes
 This can be avoided with an efficient divider unit
 Input/Output
 Special structure code operations are defined but not implemented in this work
 Parallel input/output is also required
 Developing applications that use structure codes
Acknowledgements
 Professor Walker
 Professor Potter
 Committee members for their time
 Kevin Schaffer, Hong Wang, Meiduo Wu, Lei Xie, Ping Xu
Questions?
Thank You!