Efficient Representation of Data Structures on Associative Processors
Jalpesh K. Chitalia
(Advisor: Dr. Robert A. Walker)
Computer Science Department
Kent State University
Presentation Outline
 ASC Processor Architecture
 Associative Features
 Structure Codes
 Representing Data Structures
 Structure Code Operations
 Implementation of Structure Codes
 Summary and Future Work
The ASC Processor
 A scalable design implemented on a million-gate Altera FPGA
 SIMD-like architecture
 Currently, 36 8-bit Processing Elements (PEs) available
 8-bit Instruction Stream (IS) control unit with 8-bit instruction and data addresses, 32-bit instructions
The ASC Architecture
[Block diagram: the Control Unit broadcasts over the Instruction Bus to a PE Array of PEs with local memory and supporting circuitry; the PEs are connected by a PE Network and a Data Bus, with a Responder Resolution Unit and Common Registers.]
The ASC Architecture
 Each PE listens to the IS through the broadcast and reduction network
 PEs can communicate amongst themselves using the PE Network
 A PE may either execute or ignore the microcode instruction broadcast by the IS, under the control of its Mask Stack
The ASC Features
 Associative Search
 Each PE can search its local memory for a key under the control of the IS
 Responder Resolution
 A special circuit signals whether at least one record was found
 Masked Operation
 Local Mask Stacks can turn the execution of instructions from the IS on or off
The ASC Example
Select * from Students where Grade > 90
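A minimal Python sketch of how such a query maps onto the PE array, assuming one record per PE. The records, field names, and the list standing in for the PE array are illustrative only, not the actual ASC microcode:

    # One student record per PE; names, fields, and grades are made up.
    records = [
        {"name": "Ana", "grade": 95},
        {"name": "Bo",  "grade": 88},
        {"name": "Cai", "grade": 91},
    ]

    # Associative search: the IS broadcasts the comparison, and every PE
    # evaluates it against its own record at the same time.
    responders = [rec["grade"] > 90 for rec in records]

    # Responder resolution: one signal says whether at least one PE matched.
    any_responder = any(responders)

    # Masked operation: only responding PEs take part in the follow-up step.
    selected = [rec for rec, hit in zip(records, responders) if hit]
    print(any_responder, selected)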
The ASC Features
 Constant-Time Associative Operations
 Associative Search
 Finding the minimum or maximum in a field
 Ideal for database processing
 Data is organized in a tabular format
 Each tuple in a table can be processed by one PE
 PE Network
 Many parallel algorithms require all PEs to move contents in a regular pattern
 E.g., image convolution, matrix multiplication
Non-Tabular Data Structures
 Many applications use linked-list-based data structures
 For example, plain HTML parsing can be done using a tree structure
 Similarly, XML or object-relational databases can be represented using trees
 Game programming uses tree-based algorithms, and demands much processing power
 Compiler construction uses directed acyclic graphs and tree structures
Data Structure Codes
 A unique coding scheme
 Allows any data structure to be represented in a tabular format
 The tabular format allows division of data amongst the PEs
 Also known as a “structure code”
 Different coding schemes may be required for different data structures
 Uses the Associative Search feature of the ASC Processor
Simple List-based Structures
 The left figure shows a possible representation of 1D and 2D arrays using structure codes
 The right figure represents a stack and a queue, depending on the use of appropriate functions
Complex List-based Structures
 Each digit position indicates the level in the tree
 Each value in that position indicates the position of that child from the left (see the sketch below)
 Discussions henceforth are confined to trees
 Graphs are read in a slightly different manner
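The following small Python sketch illustrates this coding scheme on a made-up tree, with structure codes written as tuples of child positions. The tree, the tuple representation, and the table contents are assumptions for illustration only:

    # Example tree and its structure codes: digit i gives the child's
    # position from the left at level i; the root's code is (1,).
    #
    #          A                (1,)
    #        /   \
    #       B     C             (1, 1)  (1, 2)
    #      / \     \
    #     D   E     F           (1, 1, 1)  (1, 1, 2)  (1, 2, 1)

    table = [                      # tabular form: one (code, value) row per PE
        ((1,),      "A"),
        ((1, 1),    "B"),
        ((1, 2),    "C"),
        ((1, 1, 1), "D"),
        ((1, 1, 2), "E"),
        ((1, 2, 1), "F"),
    ]

    # Because the tree is now a flat table, its rows can be divided evenly
    # among the PEs and searched associatively by code or by value.
    for code, value in table:
        print(code, value)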
Structure Code Operations
 Two sets of constant-time operations: scalar and parallel
 Scalar instructions are simple arithmetic operations
 Search for the parent, child, or root
 Find the value of the structure code for the next or previous node
 Parallel instructions use complex associative operations
 Find the code for the next, previous, or both siblings
 ‘Locate’ the required node for further processing
Scalar Operations
 fstcd: leftmost child
 nxtcd: right sibling
 prvcd: left sibling
 trncd: truncate to the (parent) node
 trnacd: truncate all, to the (root) node
 Can be used to allocate a new node (see the sketch below)
 Limited use in searching records
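A minimal Python sketch of these scalar operations, assuming the tuple representation of structure codes from the earlier sketch. The operation names follow the slide, but the tuple encoding and the exact behavior shown are assumptions:

    # Structure codes are kept as tuples of child positions, root = (1,).

    def fstcd(code):    # leftmost child: append position 1
        return code + (1,)

    def nxtcd(code):    # right sibling: increment the last digit
        return code[:-1] + (code[-1] + 1,)

    def prvcd(code):    # left sibling: decrement the last digit
        return code[:-1] + (code[-1] - 1,)

    def trncd(code):    # truncate to the parent node
        return code[:-1]

    def trnacd(code):   # truncate all the way to the root
        return code[:1]

    node = (1, 1, 2)
    print(fstcd(node))   # (1, 1, 2, 1): code at which a new child could be allocated
    print(nxtcd(node))   # (1, 1, 3)
    print(prvcd(node))   # (1, 1, 1)
    print(trncd(node))   # (1, 1)
    print(trnacd(node))  # (1,)
    # Note: these only compute candidate codes; they cannot tell whether the
    # node actually exists in the table, hence their limited use in searching.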
Parallel Operations
 Index instructions:
 Flag a node of the result to ease further processing
 nxtdex (next or right), prvdex (previous or left), and sibdex (siblings, both left and right)
 Value instructions:
 Return the exact structure code of the result
 nxtval (next or right) and prvval (previous or left)
 Can be used to ‘locate’ nodes in any tree (see the sketch below)
 Use parallel and associative hardware resources
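A rough Python sketch of one value op (nxtval) and one index op (nxtdex) over the tabular tree, again assuming tuple codes. On the ASC these are constant-time associative searches across all PEs, not a sequential loop:

    table = [((1,), "A"), ((1, 1), "B"), ((1, 2), "C"),
             ((1, 1, 1), "D"), ((1, 1, 2), "E"), ((1, 2, 1), "F")]

    def nxtval(table, ref):
        """Value op: return the structure code of ref's right sibling, if present."""
        target = ref[:-1] + (ref[-1] + 1,)
        return next((code for code, _ in table if code == target), None)

    def nxtdex(table, ref):
        """Index op: return one bit flag per PE, marking ref's right sibling."""
        target = ref[:-1] + (ref[-1] + 1,)
        return [1 if code == target else 0 for code, _ in table]

    print(nxtval(table, (1, 1, 1)))   # (1, 1, 2)
    print(nxtdex(table, (1, 1, 1)))   # [0, 0, 0, 0, 1, 0]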
Applications of DSC Ops
 Scalar
 Associative searching: the parent or the root in a given data structure space
 Allocating elements in a data structure space
 Value
 Finding the value of the resultant structure code
 Index
 Most useful in associative search (e.g., tree traversal)
Implementation Requirements
 Hardware functionality
 Debugging and developing I/O functionality
 Debugging a few other instructions
 Structure Code Operations
 Scalar operations: nxtcd, prvcd
 Parallel operations: nxtdex, nxtval, prvdex, prvval, sibdex
 Parallel Operations
 The most complex set amongst the ASC operations
Implementation – Value Ops
 Processing of Parallel Value Ops
 Associative search, associative min/max
 Input Cycle
 All the elements of the data structure are distributed amongst the PEs sequentially
 The reference node is broadcast to them
 Output Cycle
 The multi-byte structure code is stored at the destination address (scalar memory)
Implementation – Index Ops
 Processing of Parallel Index Ops
 The resulting element(s) are ‘marked’ for further processing
 Note: sibdex may select two results
 Input and Output
 Input: same as in the value operations
 Output: bit flags for all the input nodes are stored sequentially at the destination address (scalar memory), as sketched below
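A rough Python sketch of the input and output cycles described above, assuming the same tuple codes. The memory layout and destination addresses are purely illustrative:

    # Input cycle: one node per PE, with the reference node broadcast to all.
    pe_memory = [((1, 1), "B"), ((1, 2), "C"),
                 ((1, 1, 1), "D"), ((1, 1, 2), "E")]
    reference = (1, 1, 1)
    target = reference[:-1] + (reference[-1] + 1,)   # e.g. nxtval / nxtdex

    # Output cycle, value op: the matching PE's multi-byte structure code is
    # written back to the destination address in scalar memory.
    value_result = next((code for code, _ in pe_memory if code == target), None)

    # Output cycle, index op: one bit flag per input node is written back
    # sequentially to the destination address instead.
    index_result = [int(code == target) for code, _ in pe_memory]

    print(value_result)   # (1, 1, 2)
    print(index_result)   # [0, 0, 0, 1]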
Summary
 SIMD-based computers are well suited for database processing
 The ASC processor, with its associative operations, makes such processing even more efficient
 Structure codes translate non-tabular data structures into a tabular format
 Trees and graphs can be represented and evenly divided like records in a table
 Object-relational databases, HTML processing
 Stacks, queues, and multi-dimensional arrays can be efficiently processed
 Structure codes are not required for an even representation
Future Work
 Efforts are underway to clean up the architecture
 To allow a multiplier/divider on each PE
 To accommodate a RISC instruction set
 Unpacked bytes are used to support variable-length structure codes
 This can be avoided with an efficient divider unit
 Input/Output
 Special structure code operations are defined but not implemented in this work
 Parallel input/output is also required
 Developing applications that use structure codes
Acknowledgements
 Professor Walker
 Professor Potter
 Committee members for their time
 Kevin Schaffer, Hong Wang, Meiduo Wu, Lei Xie, Ping Xu
Questions?
Thank You!