Slide 1 - Clockless Computing
Montek Singh
Thu, Sep 6, 2007
  - Review: Logic Gate Families
  - A classic asynchronous pipeline by Williams

Slide 2 - Review: Logic Gate Families
  - Static CMOS logic ("standard")
  - Transmission gates, or "pass-transistor" logic
  - Dynamic logic, or "domino" logic

Slide 3 - Static CMOS logic: Summary
Advantages:
  - output always strongly driven
      - pull-up and pull-down networks are fully complementary; always exactly one of them is "on"
  - good immunity from noise and leakage
  - both inverting and non-inverting functions implementable
      - each gate is inverting
      - cascade two gates together to get non-inverting logic
Disadvantages:
  - slow/big PMOS devices needed (in addition to NMOS)
      - greater chip area
      - higher power consumption
      - slower switching speed

Slide 4 - Complementary CMOS
OPTIONAL MATERIAL. Credit: David Harris, Harvey Mudd College (this marking and credit also appear on slides 5-9 and 11-17)
Complementary CMOS logic gates
  - nMOS pull-down network
  - pMOS pull-up network
  - a.k.a. static CMOS

                   Pull-up OFF    Pull-up ON
  Pull-down OFF    Z (float)      1
  Pull-down ON     0              X (crowbar)

[Figure: inputs feed a pMOS pull-up network and an nMOS pull-down network, which together drive the output.]

Slide 5 - Series and Parallel
  - nMOS: 1 = ON
  - pMOS: 0 = ON
  - Series: both must be ON
  - Parallel: either can be ON
[Figure: panels (a)-(d) show series and parallel connections of nMOS and pMOS transistors between terminals a and b, with gates g1 and g2, and the resulting ON/OFF state for each of the four input combinations.]

Slide 6 - CMOS Gate Design
Activity:
  - Sketch a 4-input CMOS NOR gate
[Figure: inputs A, B, C, D; output Y.]

Slide 7 - CMOS Gate Design
Activity:
  - Sketch a 4-input CMOS NAND gate

Slide 8 - Conduction Complement
  - Complementary CMOS gates always produce 0 or 1
  - Ex: NAND gate
      - Series nMOS: Y=0 when both inputs are 1
      - Thus Y=1 when either input is 0
      - Requires parallel pMOS
  - Rule of Conduction Complements
      - Pull-up network is complement of pull-down
      - Parallel -> series, series -> parallel
[Figure: NAND gate with inputs A and B and output Y.]

Slide 9 - Compound Gates
  - Compound gates can do any inverting function
  - Ex: $Y = \overline{A \cdot B + C \cdot D}$ (AND-AND-OR-INVERT, AOI22)
[Figure: panels (a)-(f) build up the AOI22 gate from series and parallel transistor networks on inputs A, B, C, D with output Y.]

Slide 10 - Transmission ("Pass") Gates
Key Idea: transistors used in a different configuration
  - when switched on: instead of connecting output to Vdd or Gnd, they connect output to the input
Advantage:
  - very efficient for implementing switches and multiplexers
Disadvantage:
  - signal degradation unless both NFET and PFET pass gates are used in a complementary configuration

Slide 11 - Pass Transistors
  - Transistors can be used as switches
[Figure: nMOS and pMOS transistor symbols with gate g, source s, and drain d.]

Slide 12 - Pass Transistors
  - Transistors can be used as switches

  Device   Switch open   Switch closed   Input 0 ->    Input 1 ->
  nMOS     g = 0         g = 1           strong 0      degraded 1
  pMOS     g = 1         g = 0           degraded 0    strong 1

Slide 13 - Transmission Gates
  - Single pass transistors produce degraded outputs
      - pMOS good only for transmitting "1"
      - nMOS good only for transmitting "0"

Slide 14 - Transmission Gates
  - Single pass transistors produce degraded outputs
  - Complementary transmission gates pass both 0 and 1 well
      - g = 0, gb = 1: a and b disconnected
      - g = 1, gb = 0: a and b connected
      - g = 1, gb = 0: input 0 -> strong 0
      - g = 1, gb = 0: input 1 -> strong 1
[Figure: transmission gate symbols with terminals a and b and controls g and gb.]

Slide 15 - Multiplexers
  - 2:1 multiplexer chooses between two inputs

  S   D1   D0   Y
  0   X    0    0
  0   X    1    1
  1   0    X    0
  1   1    X    1

[Figure: mux symbol with data inputs D0 (selected by 0) and D1 (selected by 1), select S, and output Y.]

Slide 16 - Transmission Gate Mux
  - Nonrestoring mux uses two transmission gates
      - Only 4 transistors
[Figure: two transmission gates controlled by S and its complement, passing D0 or D1 to Y.]

Slide 17 - Gate-Level Mux Design
  - $Y = S D_1 + \overline{S} D_0$ (too many transistors)
  - How many transistors are needed? 20
[Figure: gate-level mux implementations on D1, S, D0 with output Y, annotated with per-gate transistor counts (4, 2, 4, 2, 4, 2, 2).]

Slide 18 - Dynamic Logic, or "domino"
Key idea:
  - only use NMOS's to compute function
  - use a single PMOS to reset
Advantages:
  - significantly fewer transistors
      - smaller chip area
      - higher speed, lower power
      - less "loading" on wires (drive fewer transistors)
  - for async: no storage elements needed
Disadvantages:
  - need extra control input to precharge
  - logic is typically non-inverting only
  - more vulnerable to noise and leakage effects

Slide 19 - Dynamic Logic, or "domino" (contd.)
Gate has 2 phases:
  - precharge (=reset): output reset to '0'
  - evaluate: output computed
      - either stays '0', or switches to '1'
Control input PC:
  - PC = 0 (asserted): precharge
  - PC = 1 (de-asserted): evaluate
Pull-up and pull-down must never both be simultaneously active:
  - ensure that data inputs are reset while gate is precharging
  - or, add a "footer" device
[Figure: control input PC and data inputs drive a pull-up network (controls "precharge") and a pull-down network (controls "evaluation"), producing the data output.]

Slide 20 - Outline: Several Pipeline Styles
  - Classic static logic pipeline: Sutherland
  - Recent static logic pipeline: MOUSETRAP
  - Classic dynamic logic pipeline: Williams/Horowitz' PS0

Slide 21 - A Classic Asynchronous Dynamic Pipeline
Williams and Horowitz's PS0 pipeline:
  - Structure
  - Operation
  - Performance

Slide 22 - A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
  - successfully used in fabricated chips [Stanford '87] [HAL '90s]
Implemented using "dynamic logic"
[Figure: Stage 1, Stage 2, Stage 3 in sequence from data in to data out; each stage has a Processing Block and a Completion Detector, with ack signals running backward between stages.]

Slide 23 - PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector:
[Figure: Processing Block with control input PC, data inputs, pull-down network, "keeper", and data outputs; a Completion Detector on the outputs generates ack.]

Slide 24 - Dual-Rail Completion Detector
  - Combines dual-rail signals
  - Indicates when all bits are valid (or reset)
  - OR together 2 rails per bit
  - Merge results using "C-element"
C-element:
  - if all inputs=1, output 1
  - if all inputs=0, output 0
  - else, maintain output value
[Figure: bit0, bit1, ..., bitn each pass through an OR gate; the OR outputs feed a C-element whose output is Done.]

Slide 25 - PS0 Protocol
PRECHARGE N: when N+1 completes evaluation
EVALUATE N: when N+1 completes precharging
  - Precharge: delete data: after next stage has copied it
  - Evaluate: accept new data: after next stage is emptied
Complete cycle: 6 events
  - Evaluate -> Precharge: 3 events
  - Precharge -> Evaluate: another 3 events
[Figure: event sequence across stages N, N+1, and N+2, with events numbered 1-6: stages evaluate, indicate "done", and precharge.]

Slide 26 - PS0 Performance
  $\text{Cycle Time} = 3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
  - $T_{EVAL}$: Evaluation Time
  - $T_{PRECH}$: Precharge Time
  - $T_{DETECT}$: Completion Detection Time
[Figure: the six numbered events of the PS0 cycle.]

Slide 27 - Summary: PS0 Pipelining
Datapaths are latch-free: dynamic gates themselves provide implicit latches
  +: chip area savings
  +: extremely low latency
Data items kept separate by control
  - stage deletes data: only after next stage has copied it
  - stage accepts new data: only if next stage is empty
  - distinct data items always separated by "spacers"
Control is extremely simple: each controller = single wire
  - completion detector directly controls previous stage
  +: chip area savings
  +: low control overhead

Slide 28 - Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with "magic clocking"
  - each stage gets its own clock
  - successive clocks are slightly skewed
  - essentially, clocked simulation of asynchronous handshaking!
      - need multiple clock phases!
2. Use a single clock, but insert latches between stages
  - latches are simple, level-sensitive
  - consecutive stages receive complementary clock signals
[Figure: latches between stages, driven by Ck and Ck'.]

Slide 29 - Drawbacks of PS0 Pipelining
1. Poor throughput:
  - long cycle time: 6 events per cycle
  - data "tokens" are forced far apart in time
2. Limited storage capacity:
  - max only 50% of stages can hold distinct tokens
  - data tokens must be separated by at least one spacer
My Research Goals have been:
  - address both issues
  - still maintain very low latency

Slide 30 - Homework #4 (due Tue Sep 18)
1. Enumerate ALL of the timing assumptions inherent in Williams' PS0 style
  - Assume all gate and wire delays can be arbitrary
  - For which scenarios can there be a malfunction?
2. Compare the cycle times of PS0 with an ideal clocked dynamic pipeline (slide #28)
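
Supplementary sketch (not part of the original slides): the dual-rail completion detector described above can be modeled behaviorally in a few lines of Python. This is only an illustration under stated assumptions: the class names `CElement` and `CompletionDetector` are hypothetical, each bit is represented as a pair of rails where (0, 0) means reset and exactly one rail high means valid, and gate delays are ignored.

```python
class CElement:
    """Behavioral C-element: the output changes only when all inputs agree."""

    def __init__(self, initial=0):
        self.output = initial

    def update(self, inputs):
        if all(v == 1 for v in inputs):
            self.output = 1
        elif all(v == 0 for v in inputs):
            self.output = 0
        # Otherwise the inputs disagree, so the previous output is held.
        return self.output


class CompletionDetector:
    """Hypothetical model: OR the two rails of each bit, merge with a C-element."""

    def __init__(self):
        self.c_element = CElement(initial=0)

    def done(self, bits):
        """bits is a list of (rail_a, rail_b) pairs, one pair per data bit.

        Assumed encoding: (0, 0) is reset; exactly one rail high is valid.
        """
        per_bit_valid = [rail_a | rail_b for (rail_a, rail_b) in bits]
        return self.c_element.update(per_bit_valid)


# Walk a 2-bit word from reset, to partially valid, to fully valid, and back.
detector = CompletionDetector()
sequence = [
    [(0, 0), (0, 0)],  # all bits reset
    [(0, 1), (0, 0)],  # only one bit valid so far
    [(0, 1), (1, 0)],  # all bits valid
    [(0, 0), (1, 0)],  # partially reset
    [(0, 0), (0, 0)],  # all bits reset again
]
trace = [detector.done(word) for word in sequence]
print(trace)  # [0, 0, 1, 1, 0]
```

The trace shows why a C-element is used rather than a plain AND gate: Done rises only once every bit is valid, and falls only once every bit has been reset, so a partially valid or partially reset word never changes the output.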
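
Supplementary sketch (not part of the original slides): assuming the PS0 Performance slide's formula reads Cycle Time = 3 T_EVAL + T_PRECH + 2 T_DETECT, which matches the six events per cycle (three evaluations, two completion detections, one precharge), the arithmetic can be written out as below. The function name `ps0_cycle_time` and the delay values are hypothetical and chosen only to show the calculation; they are not measurements from any chip.

```python
def ps0_cycle_time(t_eval, t_prech, t_detect):
    """Cycle time of a PS0 stage under the assumed reading of the slide:
    3 evaluations + 1 precharge + 2 completion detections.
    All three arguments must use the same time unit.
    """
    return 3 * t_eval + t_prech + 2 * t_detect


# Illustrative delays in arbitrary time units (made up for this example).
t_eval, t_prech, t_detect = 2, 2, 1

cycle = ps0_cycle_time(t_eval, t_prech, t_detect)
print(cycle)      # 10
print(1 / cycle)  # throughput in data items per time unit: 0.1
```

Under this reading, evaluation delay is counted three times, so it dominates the cycle time even though each data item passes through a given stage only once.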