Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Implementing An Associative Processor on FPGAs 1 Cell Interconnection Network A Conceptual View of the KSU ASC Model Memory PE Memory PE Memory PE Memory PE Instruction Stream Control Unit 2 An Example for the Data Memory Organization: Auto Information Stored in the PE Cells 1 Burgundy Focus OH 170 ……. PE0 2 Blue Taurus OH 160 ……. PE1 3 Blue Focus OH 190 ……. PE2 4 Red Focus PA 180 ID Color Model State ……. PE3 Rebate 3 The Prototype of the Byte-serial ASC Processor IS Control Unit Associative Processing Array CPU (for Sequential and parallel instructions) PE Array 32-bit Instruction Memory Responder Resolution Circuitry MAX/MIN Circuitry Data Memory 16 8-bit Common Registers 4 Prototype of the 4-PE Associative Processing Array PE Cell 0 Data Memory0 At_Least_One_Responder PE Cell 1 PE0 Responder Resolution Circuitry PE Cell 2 Data Memory2 Data Memory1 PE1 MAX/MIN Circuitry PE Cell 3 PE2 Data Memory3 PE3 5 A Processing Element Overview CarryOut MUX 1bit AL U MUX Common Registers Logical Registers General-Purpose Registers Comparator 16 8-bit General Purpose Registers 16-deep 1-bit Mask Stack 1-bit Responder Register 16 1-bit Logical Registers Find/Step /ResolveFirst 6 Instruction Set and Assembling Language (1) • Data Transfer Instructions - LD address, dstreg - LDRR srcreg, dstreg - ST srcreg, address - LDI immediate, dstreg - LDRRSPD srcreg • Arithmetic and Logical Instructions (mnemonic srcreg1, srcreg2, dstreg) – – – – ADD AND SLL SLT SUB OR XOR NOT SRL SLE SGT SGE SEQ SNE 7 Instruction Set and Assembling Language (2) • Mask Stack and Responder Instructions – – – – – – – – – SETMSK TOPMSK TOPMSKRSPD POPMSK POPMSKRSPD POPTHEM POPTHEMRSPD RPCMSK RPCMSKRSPD PUSHMSK PUSHMSKRSPD PUSHTHEM PUSHMSKTHEM STKTOMEM MEMTOSTK – – – FIND STEP RESFST 8 Instruction Set and Assembling Language (3) • Maximum and Minimum Searching Instructions – – – – – SETMXMI LDMXMI STMXMI MAX MIN • Branch/Jump Instructions – BNR – BRS – J 9 Associative Operations Related PE Components: • The Responder Register: to indicate whether a PE is a responder to a particular associative search or not • The Step/Find/ResolveFirst Unit: to support processing multiple responders in various ways • The Mask Stack: to represent at most 16 levels of association. The top of the Mask Stack always represents the current status of the PE – whether it is masked (‘1’) or unmasked (‘0’) 10 Example of Associative Search: Find all Focus cars located in Ohio • Perform the comparison: model == “Focus”, and store the result either ‘1’ or ‘0’ into $LR1 • Perform the comparison: location == “Ohio”, and store the result into $LR2 • AND $LR1 with $LR2, and store the result into the Responder Register (Note: all the instructions above performed by all PEs in parallel are called unmasked instructions) 11 Unmasked and Masked Instructions Unmasked Instruction: Executed by all the PEs regardless of the state of the Mask Stack Masked Instruction: Executed only by those PEs with a ‘1’ on the top of their Mask Stack 12 Example of Associative Search Using Masked Instructions: • • • (Find all Focus cars located in Ohio ) Initialize the top of the Mask Stack to ‘1’ Perform the comparison: model == “Focus”, and store the result ‘1’ or ‘0’ into $LR1 Perform the comparison: location == “Ohio”, and store the result into $LR2 AND $LR1 with $LR2, and store the result into the Responder Register AND the Responder Register with the top of the Mask Stack, and push the ANDing result into the Mask Stack and also store it into the Responder Register Increase the rebate of all Focus cars in Ohio by 10 (masked instruction) 13 The MAX/MIN Circuitry, the Responder Resolution Circuitry, and PE3 Responder Resolution PE 0 Step /Find /RslvFst Mask Stack From PE0 R0 V0 to PE0 From PE1 R1 V1 to PE1 From PE2 R2 V2 to PE2 From PE3 R3 V3 to PE3 V4 To CU clr Responder General Purpose Registers MAX/MIN From PE0 : GPR RPD D0 R0 MM0 From PE1: GPR RPD D1 R1 MM1 From PE2: GPR RPD D2 R2 MM2 From PE3: GPR RPD D3 R3 MM3 to PE0 to PE1 to PE2 to PE3 14 Using the Falkoff Algorithm for MAX/MIN Search Maximum-Value Searching (the following steps are performed in parallel for all the data) • Search bit slices of the data from the most significant bit to the least significant bit: As each bit slice is processed, each bit is ANDed with a corresponding MM bit (a 1-bit register used to indicate whether or not a data item is the maximum after processing a bit) • Check the results of the AND to ensure that at least one new maximum value remains: 15 Using the Falkoff Algorithm for MAX/MIN Search (continued) If this condition is true, then the MM bits are updated by the results of AND; if all the results are 0, then the MM bits are not updated at this time • Continue to process the remaining bit slices as above until all bits are processed • After the least significant bit slice is processed: If only one MM bit is ‘1’, it marks the largest number; if more than one MM bit is ‘1’, those data are tied for the maximum value 16 Using the Falkoff Algorithm for MAX/MIN Search (continued) Minimum-Value Searching: • Similar to maximum value searching, but complement the bit slices each time before ANDing it with MM bits 17 Search For the Maximum Rebate in the Data Memories Bit Slices (7..0) of Rebates Values in MM bits During Processing Process bit from MSB to LSB (rebate) (170) (160) (190) (180) 76543210 10101010 10100000 10111110 10110100 After processing each bit Initialize 7 6 5 4 3 2 1 0 (MM0) (MM1) (MM2) (MM3) 1 1 1 1 11100000 11100000 1 1 1 1 1 1 1 1 (max) 11110000 18 MAX/MIN Circuit using the Falkoff Algorithm OP Data 0 8-bit shift register0 “not” to RPD0 MM0 RPD0 Data 1 8-bit shift register1 “not” to RPD1 MM1 RPD1 Data 2 8-bit shift register2 “not” MM2 RDP 2 Data 3 to RPD2 8-bit shift register3 “not” MM3 RPD 3 Mask_W to RPD3 19 The MAX/MIN Circuitry, the Responder Resolution Circuitry, and PE3 Responder Resolution PE 0 Step /Find /RslvFst Mask Stack From PE0 R0 V0 to PE0 From PE1 R1 V1 to PE1 From PE2 R2 V2 to PE2 From PE3 R3 V3 to PE3 V4 To CU clr Responder General Purpose Registers MAX/MIN From PE0 : GPR RPD D0 R0 MM0 From PE1: GPR RPD D1 R1 MM1 From PE2: GPR RPD D2 R2 MM2 From PE3: GPR RPD D3 R3 MM3 to PE0 to PE1 to PE2 to PE3 20 Functionality of Responder Resolution Circuit • Responder resolution: Send an At-Least-One-Responder signal to the IS control unit • Support responder selection: Send a corresponding Responder_Before_Me signal to each PE’s Find_ Step _ResolveFirst unit 21 The Responder Resolution Circuitry for 4 PEs V0 PE0 R0 PE1 V1 PE2 Responder Resolution Circuitry ‘0’ R1 V2 V4 R2 PE3 V3 R3 R0 to R3 : from responder registers V0 to V3 : called Responder_Before_ME V4 : called At_Least_One_Responder 22 Responder Processing • Process responders in parallel: – use masked instructions • Process responders sequentially: – Need some responder selection instructions – Need a responder selection mechanism 23 Responder Selection Instructions • Step repetitively used to pick one responding PE each time for further processing – “ for” loop e.g., to step through all the Focus cars in Ohio to list the features available on each car • Find select a responding PE, while still keeping all responders identifiable – “ while” loop e.g., retrieve the tax rate from one of the cars located in OH, then increment the tax rate by a certain amount, afterwards apply this new tax rate to all the cars located in OH 24 Responder Selection Instructions (continued) • ResolveFirst select a responder and only keep this responder identifiable e.g., resolve one PE from several PEs which have the values tied for the maximum value 25 The Responder Resolution Circuitry, MAX/MIN Circuitry, and PE3 Responder Resolution PE 3 Step /Find /RslvFst Mask Stack From PE0 R0 V0 to PE0 From PE1 R1 V1 to PE1 From PE2 R2 V2 to PE2 From PE3 R3 V3 to PE3 V4 To CU clr Responder General Purpose Registers MAX/MIN From PE0 : GPR RPD D0 R0 MM0 From PE1: GPR RPD D1 R1 MM1 From PE2: GPR RPD D2 R2 MM2 From PE3: GPR RPD D3 R3 MM3 to PE0 to PE1 to PE2 to PE3 26 Design Language: VHDL • A standard hardware description language used to model and design digital hardware - Support concurrent events - can be translated into hardware by some design tools • good for managing large design structures • Supported by many CAD tool and programmable logic vendors 27 Altera MAX+PLUS II Development System Design Verification Design Entry Graphic Editor Text Editor Waveform Editor Symbol Editor Floorplan Editor Other Design Entry Tools Simulator Waveform Editor Timing Analysis Other Verification Tools MAX+PLUS II Compiler Design Compilation Programmer Data I/O Other Programmers Device Programming 28 Altera FLEX 10K FPLD IOEs IOEs IOEs IOEs IOEs IOEs L A B E A B LA B FastTrack Interconnect IOEs • FLEX10K70 Device: - 3,744 LEs - 9 EABs - 70,000 gates totally IOEs E A B L A B L A B IOEs IOEs IOEs IOEs IOEs IOEs ( IOEs – I/O elements) Partial FLEX10K20 FPLD Architecture 29 Simulation on FLEX 10K 70 Chip • The ISCU runs at about 10MHz using 50% logical gates • One EAB is used as a local memory for one PE; 4 PEs and the support circuit runs at about 14MHz using 82% logical cells. From the simulation result, we can see that the FLEX 10K 70 chip isn’t large enough for the 4-PE processor. So our current work is targeting on Altera APEX 20K devices with 1million gates in one chip. 30 Future Work • Explore more arithmetic features and associative operations • Develop the complete ASC assembly language and the ASC back-end compiler • Implement the PE cell interconnection network • Implement the whole ASC processor on bigger and faster FPGA chips • Develop the multiple instruction stream MASC model 31