Clockless Logic
Montek Singh
Thu, Mar 2, 2006

Review: Logic Gate Families
  - Static CMOS logic
  - Dynamic logic, or “domino” logic
  - Transmission gates, or “pass-transistor” logic

Static CMOS logic
  Advantages:
  - output always strongly driven
    - pull-up and pull-down networks are fully complementary; exactly one of them is always “on”
  - good immunity from noise and leakage
  - both inverting and non-inverting functions implementable
    - each gate is inverting
    - cascade two gates together to get non-inverting logic
  Disadvantages:
  - slow/big PMOS devices needed (in addition to NMOS)
    - greater chip area
    - higher power consumption
    - slower switching speed

Dynamic Logic, or “domino”
  Key idea:
  - only use NMOS’s to compute function
  - use a single PMOS to reset
  Advantages:
  - significantly fewer transistors
    - smaller chip area
    - higher speed, lower power
    - less “loading” on wires (drive fewer transistors)
  - for async: no storage elements needed
  Disadvantages:
  - need extra control input to precharge
  - logic is typically non-inverting only
  - more vulnerable to noise and leakage effects

Dynamic Logic, or “domino” (contd.)
  Gate has 2 phases:
  - precharge (= reset): output reset to ‘0’
  - evaluate: output computed
    - either stays ‘0’, or switches to ‘1’
  [Figure: dynamic gate. Labels: control input PC, data inputs, pull-up network, pull-down network, data output; annotations: controls “precharge”, controls “evaluation”]
  PC = 0 (asserted): precharge
  PC = 1 (de-asserted): evaluate
  Pull-up and pull-down must never both be simultaneously active:
  - ensure that data inputs are reset while gate is precharging
  - or, add a “footer” device

Transmission Gates
  Key idea: transistors used in a different configuration
  - when switched on: instead of connecting output to Vdd or Gnd, they connect output to the input
  Advantage: very efficient for implementing switches and multiplexers
  Disadvantage: signal degradation unless both NFET and PFET passgates are used in a complementary configuration

Outline: Several Pipeline Styles
  - Classic static logic pipeline: Sutherland
  - Recent static logic pipeline: MOUSETRAP
  - Classic dynamic logic pipeline: Williams/Horowitz’ PS0

A Classic Asynchronous Dynamic Pipeline
  Williams and Horowitz’s PS0 pipeline:
  - Structure
  - Operation
  - Performance

A Classic Approach: PS0 Pipeline
  Williams/Horowitz (Stanford U.)
  [1986-91]: successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
  [Figure: PS0 pipeline with Stage 1, Stage 2, Stage 3; each stage contains a Processing Block and a Completion Detector; labels: Data in, data, ack, Data out]
  Implemented using “dynamic logic”

PS0 Pipeline Stage
  A PS0 stage consists of dynamic gates and a completion detector:
  [Figure: PS0 stage. Labels: ack, PC, data inputs, Processing Block, pull-down network, “keeper”, Completion Detector, data outputs]

Dual-Rail Completion Detector
  - Combines dual-rail signals
  - Indicates when all bits are valid (or reset)
  - OR together 2 rails per bit
  - Merge results using “C-element”
  C-element:
  - if all inputs = 1, output 1
  - if all inputs = 0, output 0
  - else, maintain output value
  [Figure: bit0, bit1, ..., bitn each feed an OR gate; the OR outputs feed a C-element (C) that produces Done]

PS0 Protocol
  PRECHARGE N: when N+1 completes evaluation
  - delete data: after next stage has copied it
  EVALUATE N: when N+1 completes precharging
  - accept new data: after next stage is emptied
  [Figure: timing diagram for stages N, N+1, N+2 with events numbered 1 to 6; labels: evaluates, indicates “done”, precharges]
  Complete cycle: 6 events
  - Evaluate -> Precharge: 3 events
  - Precharge -> Evaluate: another 3 events

PS0 Performance
  [Figure: timing diagram with events numbered 1 to 6]
  Cycle Time = $3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
  where
    $T_{EVAL}$ = Evaluation Time
    $T_{PRECH}$ = Precharge Time
    $T_{DETECT}$ = Completion Detection Time

Summary: PS0 Pipelining
  Datapaths are latch-free: dynamic gates themselves provide implicit latches
  +: chip area savings
  +: extremely low latency
  Data items kept separate by control
  - stage deletes data: only after next stage has copied it
  - stage accepts new data: only if next stage is empty
  - distinct data items always separated by “spacers”
  Control is extremely simple:
  - each controller = single wire
  - completion detector directly controls previous stage
  +: chip area savings
  +: low control overhead

Comparison to a Clocked Pipeline
  How would you design the pipeline if you actually had a clock?
  1. Replace handshaking with “magic clocking”
     - each stage gets its own clock
     - successive clocks are slightly skewed
     - essentially, clocked simulation of asynchronous handshaking!
     - need multiple clock phases!
  2. Use a single clock, but insert latches between stages
     - latches are simple, level-sensitive
     - consecutive stages receive complementary clock signals
     [Figure: pipeline stages separated by latches; labels: Ck, Ck’, latch]

Comparison … (contd.)
  Cycle Times?

Drawbacks of PS0 Pipelining
  1. Poor throughput:
     - long cycle time: 6 events per cycle
     - data “tokens” are forced far apart in time
  2. Limited storage capacity:
     - max only 50% of stages can hold distinct tokens
     - data tokens must be separated by at least one spacer
  Our Research Goals:
  - address both issues
  - still maintain very low latency
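
The two drawbacks above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original slides. It evaluates the PS0 cycle time as three evaluations, one precharge, and two completion detections (the six events per cycle), and applies the 50% occupancy bound. The function names and the delay values, given in arbitrary time units, are assumptions chosen purely for illustration.

```python
def ps0_cycle_time(t_eval, t_prech, t_detect):
    """Cycle time of a PS0 stage: 3 evaluations + 1 precharge + 2 completion detections."""
    return 3 * t_eval + t_prech + 2 * t_detect


def ps0_max_tokens(num_stages):
    """Upper bound on distinct data tokens held in the pipeline.

    Tokens must be separated by at least one spacer, so at most 50% of the
    stages hold data; the result is rounded down to respect that bound.
    """
    return num_stages // 2


if __name__ == "__main__":
    # Hypothetical delays in arbitrary time units (illustrative, not measured).
    t_eval, t_prech, t_detect = 2, 2, 1

    cycle = ps0_cycle_time(t_eval, t_prech, t_detect)
    print("cycle time:", cycle)
    print("throughput (tokens per time unit):", 1 / cycle)
    print("max tokens in an 8-stage pipeline:", ps0_max_tokens(8))
```

With these assumed delays, evaluation accounts for most of the cycle, which is why the slides call out the six-event cycle as the main throughput limit.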