Physical Limits of Computing
Dr. Mike Frank
CIS 6930, Sec. #3753X
Spring 2002
Lecture #24: Adiabatic CMOS cont.
Wed., Mar. 13

Administrivia & Overview
• Don’t forget to keep up with homework!
  – We are 8 out of 14 weeks into the course.
  – You should have earned ~57 points by now.
• Course outline:
  – Part I & II, Background, Fundamental Limits - done
  – Part III, Future of Semiconductor Technology - done
  – Part IV, Potential Future Computing Technologies - done
  – Part V, Classical Reversible Computing
    • Fundamentals of Adiabatic Processes & logic - last Wed. & Fri.
    (----------------------- Spring Break ------------------------)
    • Adiabatic electronics & CMOS logic families - Mon. & TODAY
    • Limits of adiabatics: Leakage and clock/power supplies - TODAY
    • RevComp theory I: Emulating Irreversible Machines - Fri. 3/15
    • RevComp theory II: Bounds on Space-Time Overheads - Mon. 3/18
    • (plus ~7 more lectures…)
  – Part VI, Quantum Computing
  – Part VII, Cosmological Limits, Wrap-Up

Adiabatic computing in CMOS
• Monday: Adiabatic switching, split-level retractile & pipelined logic.
• Today: 2-Level Adiabatic Logic (2LAL), general adiabatic logic.

Some Timing Terminology
For sequential adiabatic circuits:
• Tick: Time for a single ramp transition
  – adiabatic speed fraction f times the RC gate delay.
• Phase: Latency for a data value to propagate forward by 1 pipeline stage.
• Cycle: Minimum period for all timing information to return to its initial state.
• Monadic: 1 level per gate.
• Diadic: Two retractile levels per gate
  – permits inverting or non-inverting logic.
• Dual rail: Two wires per logic value
  – permits universal logic with monadic gates only.

Some Figures of Demerit
• Some quantities we may wish to minimize:
  – Ticks/phase:
    • proportional to logic propagation latency
  – Ticks/cycle:
    • inversely proportional to the data throughput rate
  – Transistor-ticks/cycle:
    • inversely proportional to hardware cost-efficiency
  – Number of required clock/power input signals:
    • supplying these may be a significant component of system cost
  – Number of distinct voltage levels required:
    • may affect the reliability/power tradeoff

Some Interesting Questions
• About pipelined, sequential, fully-adiabatic CMOS logic:
  – Q: Does it require an intermediate voltage level?
    • A: No, you can get by with only 2 different levels.
  – Q: What is the minimum number of externally provided timing signals you can get away with?
    • A: 4 (12 if split levels are used).
  – Q: Can the order-N different timing signals needed for long retractile cascades be generated internally within an adiabatic circuit?
    • A: Yes, but not statically, unless order-N² hardware is used,
      where N is the number of stages per full sequential cycle.
• We now demonstrate these answers.

Some Timing Examples
See the next slide for detailed timing diagrams.
• N-level retractile cascades:
  – 2N ticks/phase × 1 phase/cycle = 2N ticks/cycle
• 3-phase fully-static diadic SCRL:
  – 8 ticks/phase × 3 phases/cycle = 24 ticks/cycle
• 2-phase fully-static monadic SCRL:
  – 5 ticks/phase × 2 phases/cycle = 10 ticks/cycle
• 2-phase fully-static diadic SCRL:
  – 6 ticks/phase × 2 phases/cycle = 12 ticks/cycle
• 6-tick/cycle dynamic SCRL (detailed previously):
  – 1 tick/phase × 6 phases/cycle = 6 ticks/cycle

Some SCRL timing diagrams
[Figure: detailed SCRL timing diagrams]

2LAL: 2-Level Adiabatic Logic
• Dual-rail T-gate symbol: [figure]
• Basic buffer element:
  – cross-coupled T-gates [figure: in → out]
• Only 4 different timing signals, 4 ticks per cycle:
  – Signal i rises during tick i, falls during tick (i+2) mod 4
• 1 tick/phase × 4 phases/cycle = 4 ticks/cycle!
  – Optimizes latency & throughput per gate.
[Figure: waveforms of the four timing signals over ticks 0, 1, 2, 3]

2LAL Cycle of Operation
[Figure: rails in0/in1 and out0/out1 over tick numbers 0–3, example with in=0, out=0]

Input-Barrier, Clocked-Bias Latching
(1) Input conditionally lowers barrier (logic w. series/parallel barriers)
(2) Clock applies bias force; conditional bit flip
(3) Input removed, raising barrier & locking in state change
(4) Clock bias can retract.
• 2LAL is an example of this.
[Figure: bit states 0 and 1 during the input pulse and after the pulse ends]

Shift Register Structure
• 1-tick delay per logic stage: [figure: buffer chain clocked by signals 1, 2, 3, 4, from in to out]
• Logic pulse timing & propagation: [figure: pulses from in advancing one stage per tick 1, 2, 3, 4, ...]

More Complex Logic Functions
• Non-inverting Boolean functions: [figure: T-gate networks on inputs A and B]
• For inverting functions, must use quad-rail logic encoding: [figure: encoding of A=0 and A=1 on rails A0, A1]
• Zero-transistor “inverters.”
  – To invert, just swap the rails!

Hardware Efficiency Issues
• Hardware efficiency: How many logic operations per unit hardware per unit time?
• Hardware spacetime complexity: How much hardware for how much time per logic op?
• We’re interested in minimizing: (# of transistors) × (# of ticks per gate cycle)
• SCRL inverter, w. return path:
  – (8 transistors) × (6 ticks) = 48 transistor-ticks
• Quad-rail 2LAL buffer stage:
  – (16 transistors) × (4 ticks) = 64 transistor-ticks

More SCRL vs. 2LAL
• SCRL reversible NAND, w. all inverters:
  – (23 transistors) × (6 ticks) = 138 T-ticks
• Quad-rail 2LAL AND:
  – (48 transistors) × (4 ticks) = 192 T-ticks
• Result of comparison: Although 2LAL minimizes the # of rails and the # of ticks/cycle, it does not minimize overall spacetime complexity.
  – Whether 6-tick SCRL really minimizes per-op spacetime complexity among pipelined fully-adiabatic CMOS logics is still an open question.
    • An opportunity for you to make a contribution!

Minimizing Power-Clock Signals
• How many external clock signals are required?
  – N-level-deep retractile cascade logic:
    • 2N waveforms × 1 phase = 2N signals
  – 6-tick/cycle, 6-phase dynamic SCRL:
    • 6 waveforms × 6 phases = 36 signals
  – 24-tick/cycle, 3-phase static SCRL:
    • 12 waveforms × 3 phases = 36 signals
  – 4-tick/cycle 2LAL:
    • 1 waveform × 4 phases = 4 signals!
• It turns out that 12 signals are sufficient to implement any combination of 2-level or 3-level logics (including retractile) on-chip!

How to Do It
• Circular 2LAL shifter; pulse-gated clocks
[Figure: circular 2LAL shifter clocked by P0–P3, with pulse-gated clock outputs; waveforms over ticks 0–3]

12-Rail System: Pros & Cons
• Pros:
  – Completely solves the adiabatic timing design problem
  – Enables mixtures of retractile, SCRL, and other logic styles on 1 chip
  – Enables simple fully-adiabatic SRAM & DRAM
• Cons:
  – Timing signals are dynamic
  – Known fully-static alternatives use order-N² gates and signals for N-tick-long cycles
  – N can be large in a chip that includes deep retractile networks
  – Energy is wasted driving the source/drain junction capacitances of all the T-gates even when the timing pulse isn’t present (SOI reduces these parasitics)

Fully-Adiabatic DRAM Cell
• 6T, 6 lines/row, 1 line/column (in/out together)
• Read cycle:
  – Initially: lines neutral, out neutral, R off
  – R for desired row turns on
  – … for desired row splits, driving out column
  – R turns off, out is read
  – … merges, out is reset
• Write cycle:
  – First, do read cycle.
  – in is set to out
  – W turns on
  – in changed to new value...

Fully-Adiabatic SRAM
• 10T, 10 lines/row, 1 line/column
• Operation similar to DRAM, except:
  – Read-out: T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off
  – Write: T2 off; N2 retracts; N1 retracts, copy of M presented on input; T1 on; in changes; T1 off; N1 asserts; N2 asserts; T2 on
[Figure: SRAM cell with transistors T1, T2, T3, storage node M, nodes N1 and N2, and in/out lines]
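As a cross-check on the arithmetic in this lecture, the sketch below recomputes ticks/cycle (ticks/phase × phases/cycle), external clock/power signals (waveforms × phases), and transistor-ticks for the logic families and gates quoted on the timing-example, hardware-efficiency, and power-clock slides. It is a minimal sketch, not a circuit model: the `Family` record, the family labels, and the helper names are hypothetical names of our own, and `two_lal_level` assumes the rule from the 2LAL slide (timing signal i rises during tick i and falls during tick (i+2) mod 4), sampled as the level at the end of each tick.

```python
"""Cross-check of the figures of demerit quoted in this lecture (illustrative sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    """Hypothetical record for one pipelined adiabatic logic family."""
    name: str
    ticks_per_phase: int
    phases_per_cycle: int
    waveforms_per_phase: int  # distinct clock/power waveforms per phase

    @property
    def ticks_per_cycle(self) -> int:
        # Cycle length = ticks/phase x phases/cycle
        return self.ticks_per_phase * self.phases_per_cycle

    @property
    def external_signals(self) -> int:
        # External clock/power signals = waveforms x phases
        return self.waveforms_per_phase * self.phases_per_cycle


def retractile(n: int) -> Family:
    """N-level retractile cascade: 2N ticks in a single phase, 2N waveforms."""
    return Family(f"{n}-level retractile cascade", 2 * n, 1, 2 * n)


# Values as quoted on the timing-example and power-clock slides.
SCRL_DYNAMIC_6 = Family("6-phase dynamic SCRL", 1, 6, 6)
SCRL_STATIC_24 = Family("3-phase static SCRL", 8, 3, 12)
TWO_LAL = Family("2LAL", 1, 4, 1)

# (transistors, ticks per gate cycle) as quoted on the hardware-efficiency slides.
GATES = {
    "SCRL inverter, w. return path": (8, 6),
    "Quad-rail 2LAL buffer stage": (16, 4),
    "SCRL reversible NAND, w. all inverters": (23, 6),
    "Quad-rail 2LAL AND": (48, 4),
}


def transistor_ticks(transistors: int, ticks: int) -> int:
    """Spacetime cost: (# of transistors) x (# of ticks per gate cycle)."""
    return transistors * ticks


def two_lal_level(i: int, tick: int) -> int:
    """Level (1 = high) of 2LAL timing signal i at the end of `tick`.

    Assumes signal i rises during tick i and falls during tick (i + 2) mod 4,
    so it is high at the end of ticks i and i + 1 (mod 4).
    """
    return 1 if (tick - i) % 4 in (0, 1) else 0


if __name__ == "__main__":
    # N = 3 is an arbitrary example depth for the retractile cascade.
    for fam in (retractile(3), SCRL_DYNAMIC_6, SCRL_STATIC_24, TWO_LAL):
        print(f"{fam.name:28} {fam.ticks_per_cycle:3} ticks/cycle"
              f" {fam.external_signals:3} signals")
    for gate, (t, k) in GATES.items():
        print(f"{gate:40} {transistor_ticks(t, k):4} T-ticks")
    for i in range(4):
        print(f"signal {i}: " + "".join(str(two_lal_level(i, t)) for t in range(4)))
```

Running the script reproduces the slide values: 36 external signals for both SCRL variants versus 4 for 2LAL, and 138 vs. 192 transistor-ticks for the SCRL NAND vs. the quad-rail 2LAL AND. The printed waveforms (1100, 0110, 0011, 1001) show that each 2LAL timing signal is the previous one delayed by one tick, which is why 1 waveform × 4 phases suffices.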