Modeling CPUs Using Different MoCs: A Case Study
Trevor C. Meyerowitz
Advisor: Alberto Sangiovanni-Vincentelli
290N Final Presentation, May 15, 2002

Outline
- Introduction
  - Motivation
  - The simple CPU to be modeled
  - The domains investigated
- Modeling a non-pipelined processor
- Modeling a pipelined processor
- Demo
- Conclusions

Motivation
- Processor designs are becoming much larger and more complicated
  - Many instructions in flight at a single time
  - Strange orderings and speculation
  - This can be very hard to verify
- We are developing a methodology to help alleviate these problems
  - Using different models of computation (MoCs) can potentially simplify the design task
  - Ptolemy II lets us compare a variety of these MoCs in a unified framework

The Simple CPU
- Processor statistics
  - Small instruction set: ADD, SUB, ADDI, SUBI, and BNE
  - Integer operations only
  - 128 registers, 128-entry instruction memory
- This is enough to be interesting
  - Data dependencies
  - Control flow

The Domains Investigated
- Process Networks (PN)
  - Untimed model with Kahn–MacQueen semantics
  - Infinite queues with blocking reads
  - Fully deterministic and schedule independent
- Synchronous Reactive (SR)
  - Untimed model
  - Instantaneous communication and computation
  - Iterates until a fixed point is found
  - Signals must be monotonic

The Non-Pipelined Processor
- Code and netlist are reused for both domains (i.e., these are domain-polymorphic actors)
- Represented in Ptolemy II as Fetch, Regfile, Execute, and a Delay
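The PN properties listed above (unbounded FIFOs, blocking reads, schedule-independent determinism) can be sketched outside Ptolemy II with plain Java threads and a blocking queue. This is a minimal illustration, not the Ptolemy II actor API; the class and channel names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Two Kahn-style processes connected by an unbounded FIFO channel:
// "Fetch" writes tokens without blocking, "Execute" blocks on reads.
public class PnSketch {
    public static List<Integer> run() {
        LinkedBlockingQueue<Integer> channel = new LinkedBlockingQueue<>();
        List<Integer> results = new ArrayList<>();

        Thread fetch = new Thread(() -> {
            for (int pc = 0; pc < 5; pc++) {
                channel.add(pc * 10);                 // writes never block (queue is unbounded)
            }
        });
        Thread exec = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    results.add(channel.take() + 1);  // take() blocks until a token arrives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        fetch.start();
        exec.start();
        try {
            fetch.join();
            exec.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return results;  // same result for every thread interleaving
    }

    public static void main(String[] args) {
        System.out.println(PnSketch.run());  // prints [1, 11, 21, 31, 41]
    }
}
```

Because the only inter-process communication is a FIFO with blocking reads, the output is identical no matter how the two threads are scheduled, which is the determinism property the processor model relies on.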
- Fetch occurs only after the previous instruction has completed

Non-Pipelined Processor Pseudocode (Fetch + Regfile)

  public class Fetch ... {
    ...
    public fire() {
      pc = input_pc.get(0);
      <rs, rt, rd, val, inst> = readIMEM(pc);
      output_inst.send(0, inst);
      output_regs.send(rs, rt);
      ...
    }
    ...
  }

  public class Reg ... {
    ...
    public fire() {
      if (read_mode) {
        inst = input_get_op_codes();
        <rs_v, rt_v> = read_regs();
        output_inst.send(0, inst);
        output_regs.send(rs_v, rt_v);
      } else {
        rd_v = input_get_write_vals();
        write_values();
      }
      read_mode = !read_mode;
    }
    ...
  }

Non-Pipelined Processor Pseudocode (Execute)

  public class Exec ... {
    ...
    public fire() {
      if (!write_mode) {
        reg_vals = input_reg_vals();
        inst_type = read_inst();
        results = exec_inst(inst_type, reg_vals);
      } else {
        write_values(rd, results);
        write_next_pc(results);
      }
      write_mode = !write_mode;
    }
    ...
  }

Non-Pipelined Processor: Differences Between the Domains
- SR required placing the register read and the register write in different iterations, and likewise splitting execution from writing its results
- Process networks cannot query port status
- SR requires the use of prefire and postfire conditions
- We shared code between the two domains; SR probably offers more flexibility

Pipelined Processor
- Only required recoding the fetch behavior
  - Fetch every "iteration"
  - Stall only after branches (no branch prediction)
- No forwarding logic is required!?
  - Two register reads cannot occur without a register write happening between them
  - In PN this follows from the determinism requirement
  - In SR it holds because of the actor states
  - SR could probably be structured to require forwarding logic (a lower level of abstraction!)

Pipelined Processor: Fetch Pseudocode

  public class Fetch ... {
    ...
    public fire() {
      if (initial_firing || prev_inst_is_branch) {
        pc = input_pc.get(0);
      }
      <rs, rt, rd, val, inst> = readIMEM(pc);
      output_inst.send(0, inst);
      output_regs.send(rs, rt);
      ...
      pc = pc + 1;
    }
    ...
  }

- This causes fetch to stall until the branch is finished
- If there is no branch, fetch fires again immediately
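The stalling fetch above can be made concrete with a small, runnable sketch. Everything here is invented for illustration and is not the model's actual code: the FetchSketch name, the toy two-opcode ISA, and the resolvedPc supplier that stands in for the blocking pc input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Fetch normally increments pc by itself; it only blocks on the pc input
// on its first firing and after a branch, i.e. it stalls until the branch
// resolves and otherwise fires again immediately.
public class FetchSketch {
    static final int ADD = 0, BNE = 1;  // toy opcodes

    /** Returns the order in which instruction addresses are fetched. */
    public static List<Integer> fetchTrace(int[] opcodes, IntSupplier resolvedPc, int steps) {
        List<Integer> trace = new ArrayList<>();
        int pc = 0;
        boolean initial = true;
        boolean prevWasBranch = false;
        for (int i = 0; i < steps; i++) {
            if (initial || prevWasBranch) {
                pc = resolvedPc.getAsInt();  // stands in for the blocking input_pc.get(0)
                initial = false;
            }
            trace.add(pc);                   // "send" the fetched instruction downstream
            prevWasBranch = (opcodes[pc] == BNE);
            pc = pc + 1;                     // fall-through address for the next firing
        }
        return trace;
    }
}
```

For the program {ADD, BNE, ADD, ADD} with the branch resolving to address 3, the fetch order is 0, 1, 3: after fetching the BNE at address 1, fetch blocks on the resolved target instead of speculating past the branch.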
Pipelining and Forwarding (t=0)

Program:
  Inst. id  Assembly code     Logical meaning
  Inst_1    ADD R1, R2, R3    R1 = R2 + R3
  Inst_2    ADD R3, R1, R1    R3 = R1 + R1

Pipeline state (Fetch -> Regfile -> Exec):
  Regfile: Inst_2: R3 = R1(?) + R1(?)
  Exec:    Inst_1: R1 = R2(4) + R3(5)

Register file state: R1 = 2, R2 = 4, R3 = 5

Pipelining and Forwarding (t=1)

(Same program as above.)

Pipeline state (Fetch -> Regfile -> Exec):
  Regfile: Inst_2: R3 = R1(2) + R1(2)
  Exec:    Inst_1: R1(9) = R2(4) + R3(5)

Register file state: R1 = 2, R2 = 4, R3 = 5

- This is an error: Inst_2 should read R1 as 9, not 2
- We can solve it by adding forwarding logic or by stalling the pipeline
- The PN and SR models do not have this problem, because they enforce the order: read Inst_1, write Inst_1, read Inst_2

Pipelined Processor with Branch Prediction
- Still in order, but branches are predicted instead of stalled on
- Requires recoding Fetch and the register file
  - Fetch: performs branch prediction and handles mispredicts
  - Register file: keeps a queue of instructions, stalls on dependencies, and writes only resolved instructions to the regfile
- This represents one refinement path, biased towards Process Networks

Program Code

  Inst  RD  RS  RT (Val)
  ADD    5   5   5
  ADD    6   5   5
  BNE    5  20  -3
  ADD    7   6   6
  ADD    8   7   7
  ADD    9   8   8
  ADD   10   9   9
  SUB   11  10  50

Outline
- Introduction
- Modeling a non-pipelined processor
- Modeling a pipelined processor
- Demo
- Conclusions
  - Other architectural features
  - Observations
  - Future work

Other Architectural Features
- Out-of-order execution: requires "breaking" the PN model
- Superscalar execution: multiple fetches at once; might be problematic to do in PN
- Memory systems: initially simple, more complicated once refinements are added

Observations
- Process Networks are relatively easy to use and quite predictable
- Process Networks are great for initial abstract models
- Synchronous Reactive is simpler to work with than DE, but more complicated to design in than PN
- PN does not deal well with ordering refinements; SR handles them better
- We envision a methodology where you start with a PN model and then move to an SR model

Future Work
- Implement other architectural features
- Examine relaxing PN's requirements
- Look at domain-specific actors
- Examine composing different MoCs
- Introduce timing
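As a final sketch, the read/write alternation that made forwarding unnecessary (the t=0/t=1 forwarding example earlier) can be replayed with a two-phase register file in the style of the Reg actor pseudocode. The class name, method names, and the poke helper are all invented for this example.

```java
// Two-phase register file: successive firings alternate between a read
// phase and a write phase, so a read can never overtake the preceding
// instruction's write -- the ordering that PN and SR enforce in the model.
public class RegFileActor {
    private final int[] regs = new int[128];  // 128 registers, as in the simple CPU
    private boolean readMode = true;
    private int pendingRd;                    // destination of the in-flight write

    /** Read phase: latch the operand values of one instruction. */
    public int[] fireRead(int rs, int rt, int rd) {
        if (!readMode) throw new IllegalStateException("not in read phase");
        pendingRd = rd;
        readMode = false;
        return new int[] { regs[rs], regs[rt] };
    }

    /** Write phase: commit the previous instruction's result. */
    public void fireWrite(int value) {
        if (readMode) throw new IllegalStateException("not in write phase");
        regs[pendingRd] = value;
        readMode = true;
    }

    /** Helper: set the initial register state. */
    public void poke(int r, int v) { regs[r] = v; }
}
```

Replaying the earlier example (R1 = 2, R2 = 4, R3 = 5): Inst_1's read returns 4 and 5, its write commits R1 = 9, and only then can Inst_2's read fire, so Inst_2 correctly sees R1 as 9 rather than the stale 2.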