Download Principles of Computer Architecture Dr. Mike Frank

Principles of Adiabatic Processes Adiabatic Processes - overview • Adiabatic steps in the reversible Carnot cycle • Evolution of the meaning of “adiabatic” • Time-proportional reversibility (TPR) of quasiadiabatic processes • Adiabatic theorem of quantum mechanics • Adiabatic transitions of a two-state system • Logic & memory in irreversible and adiabatic processes. The Carnot Cycle • In 1822-24, Sadi Carnot analyzed the efficiency of an ideal heat engine all of whose steps were reversible, and furthermore proved that: – Any reversible engine (regardless of details) would have the same efficiency (THTL)/TH. – No engine could have greater efficiency than a reversible engine w/o producing work from nothing – Temperature itself could be defined on a thermodynamic scale based on heat recoverable by a reversible engine operating between TH and TL Steps of Carnot Cycle P • Isothermal expansion at TH • Adiabatic (without flow of heat) expansion THTL • Isothermal compression at TL • Adiabatic compression TLTH Reservoir Isothermal Adiabatic Reservoir TH TL V Isothermal Reservoir Reservoir Adiabatic Carnot Cycle Terminology • Adiabatic (Latin): literally “Without flow of heat” – I.e., no entropy enters or leaves the system • Isothermal: “At the same temperature” – Temperature of system remains constant as entropy enters or leaves. • Both kinds of steps, in the case of the Carnot cycle, are examples of isentropic processes – “at the same entropy” – I.e., no (known) information is transformed into entropy in either process • But, the usage of the word “adiabatic” in applied physics has mutated to essentially mean isentropic. Old and New “Adiabatic” • Consider a closed system where you just lose track of its detailed evolution: – It’s adiabatic (no net heat flow), – But it’s not “adiabatic” (not isentropic) • Consider a box containing some heat, flying ballistically out of the system: “The System” – It’s not adiabatic, (no heat flow) • because heat is “flowing” out of the system – But it’s “adiabatic” (no entropy is generated) Box o’ Heat Justifying the Modern Usage • In an adiabatic process following a desired trajectory through configuration space, – No heat flows in or out of the subsystem consisting of those particular degrees of freedom whose variation carries out the motion along the desired trajectory. • E.g., the computational degrees of freedom in a computational process. – No heat flow  no entropy flow • Heat is just energy whose configuration info. is entropy – No entropy flow  no sustained entropy generation • Since bounded systems have a maximum entropy Quasi-Adiabatic Processes • Complete adiabaticity means absolutely zero rate of entropy generation – Requires infinite degree of isolation of system from uncontrolled external environment! •  Impossible to completely achieve in practice. • Real processes are only adiabatic to the extent that their entropy generation approaches zero. – Term “quasi-adiabatic” emphasizes imperfection • Asymptotically adiabatic designs conceptually approach 0 in the limit of variation of specified technology design parameter(s) – E.g., low device frequency, large device size Quasi-Adiabatic Processes • No real process is completely adiabatic – Because some outside system may always have enough energy to interact with & disturb your system’s evolution - e.g., cosmic ray, asteroid •  Evolution of system state is never perfectly known – Unless you know the exact quantum state of the whole universe – Entropy of your system always increases. • Unless it is already at a maximum (at equilibrium) – Can’t really be at complete equilibrium with its surroundings » unless whole universe is at utterly stable “heat death” state. • Systems at equilibrium are sometimes called “static.” • Non-equilibrium, quasi-adiabatic processes are sometimes also called quasi-static – Changing, but near a local equilibrium otherwise Quantifying Adiabaticity • An appropriate metric for quantifying the degree of adiabaticity of any process is just to use the quality factor Q of that process. – Q isn’t just for oscillatory processes any more • Q is generally the ratio Etrans / Ediss between the: – Energy Etrans involved in carrying out a process • transitioning between states along a trajectory – Amount Ediss of energy dissipated during the process. • Normally also matches the following ratios: – Physical information content / entropy generated – Quantum computation rate / decoherence rate – Decoherence time / quantum-transition time Degree of Reversibility • The degree of reversibility (a.k.a. reversibility, a.k.a. thermodynamic efficiency) of any quasiadiabatic process is defined as the ratio of: – the total free energy at the start of the process –  by the total energy spent in the process • Or, equivalently: – the known, accessible information at the start –  by the amount that is converted to entropy • This same quantity is referred to as the (percycle) “quality factor” Q for any resonant element (e.g., LC oscillator) in EE. E free (0) Espent (t ) K0 S t The “Adiabatic Principle” • Claim: Any ideal quasi-adiabatic process performed over time t has a thermodynamic efficiency that is proportional to t, – in the limit as t0. • We call processes that realize this idealization time-proportionally reversible (TPR) processes. • Note that the total energy spent (Espent), and the total entropy generated (S), are both inversely proportional to t in any TPR process. – The slower the process, the more energy-efficient. Proving the Adiabatic Principle (See RevComp memo #M14) • Assume free energy is in generalized kinetic energy of motion Ek of system through its configuration space. Ek = ½mv2  v2 = (/t)2  t2 for m,  const. • Assume that every tf time, on average (mean free time), a constant fraction f of Ek is thermalized (turned into heat) • Whole process thermalizes energy f(t/tf)Ek  tt2 = t1. Constant in front is ½ fm2/tf : 2, where =½fm/tf is the effective viscosity. Example: Electrical Resistance • We know Pspent=I2R=(Q/t)2R, or Espent =Pt = Q2R/t.  Note scaling with 1/t – Charge transfer through a resistor obeys the adiabatic principle! • Why is this so, microscopically? – In most situations, conduction electrons have a large Fermi velocity or thermal velocity relative to drift velocity. •  Scatter off of lattice-atom cross-sections with a mean free time tf that is fairly independent of drift velocity – Each scattering event thermalizes the electron’s drift kinetic energy - a frac. f of current’s total Ek • Therefore assumptions in prev. proof apply! The Adiabatic Theorem • A result in basic quantum theory – Proved in many quantum mechanics textbooks • Paraphrased: A system initially in its ground state (or more generally, its nth energy eigenstate) will, after subjecting it to a sufficiently slow change of applied forces, remain in the corresponding state, with high probability. – Result has been recently shown to be very general. • Amount of leakage out of desired state is proportional to speed of transition, at low speeds. – Quantum systems obey the adiabatic principle! Some Loss-Inducing Interactions For ordinary voltage-coded electronics: • Interactions whose dissipation scales with speed: – Parasitic EM emission from dynamic (C,L) reactances – Scattering of ballistic electrons from lattice imperfections, causing Ohmic resistance • Interactions having different scaling laws: – Interference from outside EM sources – Thermally-activated leakage of electrons over potential energy barriers – Quantum tunneling of electrons through narrow barriers (sub-Fermi wavelength) Focus of most of the work on – Losses due to intentional treatment of known adiabatics to physical information as entropy (bit erasure) date Some Ways to Reduce Losses • EM interference / emission: Add shielding, use high-Q MEMS/NEMS oscillators • Scattering/resistance: Ballistic FETs, superconductors • Thermal leakage: avoid low VT and/or high temps • Tunneling: thick tunnel barriers, high-κ dielectrics, conductors w. low Fermi-level/high electron affinity, vacuum-gap barriers? • Intentional bit erasure: reduce voltages, use mostlyreversible adiabatic logic designs Adiabatic Circuits & Reversible Computing Myths, Controversies & Misconceptions Some Claims Against Reversible Computing Eventual Resolution of Claim John von Neumann, 1949 – Offhandedly remarks during a lecture that computing requires kT ln 2 dissipation per “elementary act of decision” (bit-operation). No proof provided. Twelve years later, Rolf Landauer of IBM tries valiantly to prove it, but succeeds only for logically irreversible operations. Rolf Landauer, 1961 – Proposes that the logically irreversible operations that can be seen to necessarily cause dissipation are irreducible. Landauer’s argument for irreducibility of logically irreversible operations was conclusively refuted by Bennett’s 1973 paper (partially presaged by Lecerf). Bennett’s 1973 construction is criticized for using too much memory. Bennett devises a more space-efficient version of the algorithm in 1989. Bennett’s models criticized by various parties for depending on random Brownian motion, and not making steady forward progress. Fredkin and Toffoli at MIT, 1980, provide ballistic “billiard ball” model of reversible computing that makes steady progress. Various parties including Zurek note that Fredkin’s original classical-mechanical billiard-ball model is chaotically unstable. Zurek, 1984, shows that quantum models can avoid the chaotic instabilities. (Though there are workable classical ways to fix the problem also.) Various parties propose that classical reversible logic principles won’t work at the nanoscale, for unspecified or vaguely-stated reasons. Drexler, 1980’s, designs various mechanical nanoscale reversible logics and carefully analyzes their energy dissipation. Carver Mead, CalTech, 1980 – Attempts to show that the kT bound is unavoidable in electronic devices, via a collection of counter-examples. No general proof provided. Later he asked Feynman about the issue; in 1985 Feynman provided a quantum-mechanical model of reversible computing. Various parties point out that Feynman’s model of reversible computing only supports serial computation. Margolus at MIT, 1990, demonstrates a parallel quantum model of reversible computing—but only with 1 dimension of parallelism. People question whether the various theoretical models can be validated with a working electronic implementation. Seitz and colleagues at CalTech, 1985, demonstrate working energy recovery circuits using adiabatic switching principles. Seitz, 1985—Has some working circuits, but unsure if arbitrary logic is possible. Koller & Athas, Hall, and Merkle (1992) separately devise general reversible combinational logics. Koller & Athas, 1992 – Conjecture reversible sequential feedback logic impossible. Younis & Knight @MIT do reversible sequential, pipelineable circuits in 1993-94. Some computer architects (including anonymous ISCA reviewers) wonder whether the constraint of reversible logic leads to unreasonable design convolutions. Vieri, Frank and coworkers at MIT, 1995-99, refute these qualms by demonstrating straightforward designs for fully-reversible and scalable gate arrays, microprocessors, and instruction sets. Some computer science theorists suggest that the algorithmic overheads of reversible computing might outweigh their practical benefits. Frank, 1997-2003, publishes a variety of rigorous theoretical analysis refuting these claims for the most general classes of applications. Various parties point out that high-quality power supplies for adiabatic circuits seem difficult to build electronically. Frank, 2000, suggests microscale/nanoscale electromechanical resonators for highquality energy recovery with desired waveform shape and frequency. Frank, 2002—Briefly wonders if synchronization of parallel reversible computation in 3 dimensions (not covered by Margolus) might not be possible. Later that year, Frank devises a simple mechanical model showing that parallel reversible systems can indeed be synchronized locally in 3 dimensions. Adiabatic Circuits and Reversible Computing Commonly Encountered Myths, Fallacies, and Pitfalls (in the Hennessy-Patterson tradition) Myths about Adiabatic Circuits & Reversible Computing • “Someone proved that computing with <<kT free-energy loss per bitoperation is impossible.” • “Physics isn’t reversible.” • “An energy-efficient adiabatic clock/power supply is impossible to build.” • “True adiabaticity doesn’t require reversible logic.” • “Sequential logic can’t be done adiabatically.” • “Adiabatic circuits require many clock/power rails and/or voltage levels.” • “Adiabatic design is necessarily difficult.” Fallacies about Adiabatic Circuits and Reversible Computing • “Since speed scales with energy dissipation in adiabatic circuits, they aren’t good for highperformance computing.” • “If I tried and failed to invent an efficient adiabatic logic, it must be impossible.” • “The algorithmic overheads of reversible computing mean it can never be cost-effective.” • “Since leakage gets worse in nanoscale devices, adiabatics is doomed.” Pitfalls in Adiabatic Circuits and Reversible Computing • Assuming that the best • Using diodes in the reversible and charge-return path. irreversible algorithms • Forgetting to obey one of are similar. the transistor rules. • Failing to optimize the • Using traditional models degree of reversibility of of computational a design. complexity. • Ignoring charge leakage • Restricting oneself to an in low-power/adiabatic design. asymptotically inefficient design style. Adiabatic/Reversible Computing Basic Models and Concepts Bistable Potential-Energy Wells • Consider any system having an adjustable, bistable potential energy surface (PES) in its configuration space. (Landauer ’61) • The two stable states form a natural bit. – One state represents 0, the other 1. • Consider now the P.E. well having two adjustable parameters: – (1) Height of the potential energy barrier relative to the well bottom – (2) Relative height of the left and right states in the well (bias) 0 1 Possible Parameter Settings • We will distinguish six qualitatively different settings of the well parameters, as follows… Barrier Height Direction of Bias Force One Mechanical Implementation State knob Rightward bias spring Barrier wedge spring Barrier up Barrier down Leftward bias Possible Adiabatic Transitions • Catalog of all the possible transitions in these wells, adiabatic & not...(Ignoring superposition states.) 1 leak 0 0 0 1 1 leak Barrier Height 0 N Direction of Bias Force 1 “1” states “0” states Ordinary Irreversible Logics • Principle of operation: Lower a barrier, or not, based on input. Series/parallel combinations of barriers do logic. Major 1 dissipation in at least one of the possible transitions. Input changes, barrier lowered 0 0 • Amplifies input signals. Example: Ordinary CMOS logics Output irreversibly changed to 0 Ordinary Irreversible Memory • Lower a barrier, dissipating stored information. Apply an input bias. Raise the barrier to latch the new information Retract 1 into place. Remove input input bias. Dissipation Retract input 0 Example: ordinary DRAM Barrier up 0 (3) Input “0” 0 (2) here can be made as low as kT ln 2 (1) N Barrier up Input “1” (2) 1 1 Input-Bias Clocked-Barrier Logic • Cycle of operation: – (1) Data input applies bias • Add forces to do logic Can amplify/restore input signal in the barrier-raising step. – (2) Clock signal raises barrier – (3) Data input bias removed (3) 1 1 (4) Can reset latch reversibly (4) given copy of contents. (3) 0 0 (2) (4) (4) (4) Examples: Adiabatic QDCA, SCRL latch, Rod logic latch, PQ logic, Buckled logic (2) (1) 0 (4) N (1) (4) 1 Input-Barrier, Clocked-Bias Retractile • Cycle of operation: – (1) Inputs raise or lower barriers • Do logic w. series/parallel barriers • Barrier signal amplified. • Must reset output prior to changing input. • Combinational logic only! – Clock applies bias force, which changes state, or not 0 0 0 (1) Input barrier height Examples: Hall’s logic, SCRL gates, Rod logic interlocks 0 N (2) Clocked force applied  1 Input-Barrier, Clocked-Bias Latching ● Cycle of operation: 1. Input conditionally lowers barrier • Do logic w. series/parallel barriers 2. Clock applies bias force; conditional bit flip 3. Input removed, raising the barrier & (4) locking in the state-change (4) 4. Clock 0 0 bias can 0 (2) (2) retract (1) Examples: Mike’s 4-cycle 2-level adiabatic CMOS logic (2LAL) (2) 0 N (2) 1 (3) 1 Full Classical-Mechanical Model Claim: The following components Sleeve are sufficient for a complete, scalable, parallel, pipelinable, linear-time, stable, classical (a) reversible computing system: (a) Ballistically rotating flywheel driving linear motion. (b) Scalable mesh to synchronize local flywheel phases in 3-D. (b) (c) Sinusoidal to flat-topped waveform shape converter. (d) Non-amplifying signal inverter (NOT gate). (d) (e) Non-amplifying OR/AND gate. (f) Signal amplifier/latch. Primary drawback: Slow propagation speed of mechanical (phonon) signals. (c) (f) (e) cf. Drexler ‘92 Adiabatic electronics & CMOS implementations Conventional Gates are Irreversible • Logic gate behavior (on receiving new input): – Many-to-one transformation of local state! – Required to dissipate bT, by Landauer principle – Incurs ½CV2 dissipation in 2 out of 4 cases. Example: Static CMOS Inverter: in out Transformation of local state: Just before transition: in out 0 0 0 1 1 0 1 1 After transition: in out 0 1 1 0 Exact formula:    Ediss  f 1  f e 1/ f  1  CV 2 for frequency reduction f : RC/t Common Mistakes to Avoid In Adiabatic Design Common Mistakes to Avoid: • Don’t use diodes in charge-return path! – The built-in voltage drop kills adiabaticity. • Don’t disobey adiabatic transistor rules by either: – Turning on transistor with voltage across it – Turning off transistor with current thru it! • This one is often neglected! • Use mostly-reversible logic! – Optimize degree of reversibility for application • Don’t over-constrain the design family! – Asymptotically efficient circuits should be possible Adiabatic Rules for Transistors • Rule 1: Never turn on a transistor if it has a nonzero voltage across it! – I.e., between its source & drain terminals. – Why: This erases info. & causes ½CV2 disspation. • Rule 2: Never apply a nonzero voltage across a transistor even during any onoff transition! – Why: When partially turned on, the transistor has relatively low R, gets high P=V2/R dissipation. – Corollary: Never turn off a transistor if it has a nonzero current going through it! • Why: As R gradually increases, the V=IR voltage drop will build, and then rule 2 will be violated. Adiabatic Rules, continued… • Transistor Rule 3: Never suddenly change the voltage applied across any on transistor. – Why: So transition will be more reversible; dissipation will approach CV2(RC/t), not ½CV2. Adiabatic rules for other components: • Diodes: Don’t use them at all! – There is always a built-in voltage drop across them! • Resistors: Avoid moderate network resistances, if poss. – e.g. stay away from range >10 k and <1 M • Capacitors: Minimize, reliability permitting. – Note: Dissipation scales with C2! Transistor Rules Summarized Legal adiabatic transitions in green. (For n- or p-FETs.) Dissipative states and transitions in red. high high off high low low on low high high off low off low off high low on high low on low on high SCRL: Split-level Charge Recovery Logic The First Pipelined Fully-Adiabatic CMOS Logic (Younis & Knight, MIT, ’94)  Transformation of local state: Just before transition: After transition: in out 0 ½ 1 ½ in out 0 1 1 0 Input-Barrier, Clocked-Bias Retractile * Must reset output prior to input. * Combinational logic only! • Cycle of operation: – Inputs raise or lower barriers • Do logic w. series/parallel barriers – Clock applies bias force which changes state, or not Examples: Hall’s logic, SCRL gates, Rod logic interlocks 0 0 0 Input barrier height 0 N Clocked force applied  1 Retractile Logic w. SCRL gates • Simple combinational logic of any depth N: – Requires N timing phases – Non-pipelined – No sequential reuse of HW (even worse) • We need sequential logic! Time  Sequential Retractile Logic • Approach #1 (Hall ‘92): – After every N stages, invoke an irreversible latch • stores the output of the last stage – Then, retract all the stages, – and begin a new cycle • Problems: – Reduces dissipation by at most a factor of N – Also reduces HW efficiency by order N! • In worst case, compared to a pipelined, sequential circuit • Approach #2 (Knight & Younis, ‘93): – The “store output” stage can also be reversible! – Gives fully-adiabatic, sequential, pipelined circuits! • N can be as small 1 or 2 & still have arbitrarily high Q Simple Reversible CMOS Latch • Uses a standard CMOS transmission gate • Sequence of operation: (1) input initially matches latch contents (output) (2) input changesoutput changes (3) latch closes (4) input removed P in out P Before input: in out a a Input arrived: in out a a b b Input removed: in out a a a b Resetting a Reversible Latch • Can reversibly unlatch data as follows: (exactly the reverse of the latching process) – (1) Data value d stored on memory node M. – (2) Present an exact copy of d on input. – (3) Open the latch (connecting input to M). • No dissipation since voltage levels match – (4) Retract the copy of d from the input. • Retracts copy stored in latch also. Input-Bias Clocked-Barrier Logic • Cycle of operation: Can amplify/restore input signal in clocking step. – Data input applies bias • Add forces to do logic – Clock signal raises barrier – Data input bias removed Can reset latch reversibly given copy of contents. Retract input 0 Examples: Adiabatic QDCA, SCRL latch, Rod logic latch, PQ logic, Buckled logic 1 Clock barrier up 0 Clock up Input “0” 0 1 Retract input N Input “1” 1 SCRL 6-tick clock cycle Initial state: All gates off, all nodes neutral. in out SCRL 6-tick clock cycle Tick #1: Input goes valid, forward T-gate opens. in out SCRL 6-tick clock cycle Tick #2: Forward gate charges, output goes valid. (Tick #1 of subsequent gate.) in out SCRL 6-tick clock cycle Tick #3: Forward T-gate closes, reverse gate charges. in out SCRL 6-tick clock cycle Tick #4: Reverse T-gate opens, forward gate discharges. in out SCRL 6-tick clock cycle Tick #5: Reverse gate discharges, input goes neutral. in out SCRL 6-tick clock cycle Tick #6: Reverse T-gate closes, output goes neutral. Ready for next input! in out 24 ticks/cycle in this versionincludes 2-level retractile stages Some Timing Terminology For sequential adiabatic circuits: • 1 Tick: Time for a single ramp transition – adiabatic speed fraction f times the RC gate delay. • 1 Phase: Latency for a data value to propagate forward by 1 pipeline stage. • 1 Cycle: Minimum period for all timing information to return back to its initial state. • Diadic: Two retractile levels per gate – permits inverting or non-inverting logic. • Dual rail: Two wires per logic value – permits universal logic with monodic gates Monadic: only 1 level Some Figures of Demerit • Some quantities we may wish to minimize: – Ticks/phase: • proportional to logic propagation latency – Ticks/cycle: • reciprocal to rate of data throughput – Transistor-ticks/cycle: • reciprocal to HW cost-efficiency – Number of required clock/power input signals: • supplying these may be a significant component of system cost – Number of distinct voltage levels required: • may affect reliability/power tradeoff Some Interesting Questions • About pipelined, sequential, fully-adiabatic CMOS logic: – Q: Does it require an intermediate voltage level? • A: No, you can get by with only 2 different levels. – Q: What is the minimum number of externally provided timing signals you can get away with? • A: 4 (12 if split levels are used) – Q: Can the order-N different timing signals needed for long retractile cascades be internally generated within an adiabatic circuit? • A: Yes, but not statically, unless N2 hardware is used – where N is the number of stages per full sequential cycle • We now demonstrate these answers. Some Timing Examples See next slide for some detailed timing diagrams. • N-level retractile cascades: – 2N ticks/phase × 1 phase/cycle = 2N ticks/cycle • 3-phase fully-static diadic SCRL – 8 ticks/phase × 3 phases/cycle = 24 ticks/cycle • 2-phase fully-static monadic SCRL – 5 ticks/phase × 2 phases/cycle = 10 ticks/cycle • 2-phase fully-static diadic SCRL – 6 ticks/phase × 2 phases/cycle = 12 ticks/cycle • 6 tick/cycle dynamic SCRL detailed previously: – 1 tick/phase × 6 phases/cycle = 6 ticks/cycle Some SCRL timing diagrams Reversible / Adiabatic Chips Designed @ MIT, 1996-1999 By the author and other then-students in the MIT Reversible Computing group, under AI/LCS lab members Tom Knight and Norm Margolus. 2LAL: 2-Level Adiabatic Logic A Novel Alternative to SCRL 2LAL: 2-level Adiabatic Logic (Implementable using ordinary CMOS transistors) P simplified T-gate symbol: • Use • Basic buffer element: – cross-coupled T-gates • Only 4 timing signals, 4 ticks per cycle: : 1 in 0 – i rises during tick i – i falls during tick (i+2) mod 4 out P 0 1 2 3 Tick # 0 1 2 3 P 2LAL Cycle of Operation Tick #0 Tick #1 in1 in Tick #2 11 in0 Tick #3 10 out1 01 in=0 01 00 11 out0 out=0 00 2LAL Shift Register Structure • 1-tick delay per logic stage: 1 2 3 0 in out 0 1 2 3 • Logic pulse timing & propagation: 0 1 2 3 ... in in 0 1 2 3 ... More complex logic functions • Non-inverting Boolean functions:   A B A A B AB AB • For inverting functions, must use quad-rail logic encoding: – To invert, just swap the rails! • Zero-transistor “inverters.” A=0 A0 A0 A1 A1 A=1 Hardware Efficiency issues • Hardware efficiency: How many logic operations per unit hardware per unit time? • Hardware spacetime complexity: How much hardware for how much time per logic op? • We’re interested in minimizing: (# of transistors) × (# of ticks) / (gate cycle) • SCRL inverter, w. return path: – (8 transistors)  (6 ticks) = 48 transistor-ticks • Quad-rail 2LAL buffer stage: – (16 transistors)  (4 ticks) = 64 transistor-ticks More SCRL vs. 2LAL • SCRL reversible NAND, w. all inverters: – (23 transistors)  (6 ticks) = 138 T-ticks • Quad-rail 2LAL AND: – (48 transistors)  (4 ticks) = 192 T-ticks • Result of comparison: Although 2LAL minimizes # of rails, and # ticks/cycle, it does not minimize overall spacetime complexity. • The question of whether 6-tick SCRL minimizes per-op spacetime complexity among pipelined adiabatic CMOS logics is still open. Minimizing Power-Clock Signals • How many external clock signals required? – N-level-deep retractile cascade logic: • 2N waveforms × 1 phase = 2N signals – 6 tick/cycle, 6-phase dynamic SCRL: • 6 waveforms × 6 phases = 36 signals – 24 tick/cycle, 3-phase static SCRL: • 12 waveforms × 3 phases = 36 signals – 4 tick/cycle, 2LAL: • 1 waveform × 4 phases = 4 signals! • It turns out that 12 signals are sufficient to implement any combination of 2-level or 3level logics (including retractile) on-chip! How to Do It • Circular 2LAL shifter; pulse-gated clocks P1 0 P2 P3 P0 in out P0 P1 2 2 P2 2 P3 P0 P1 P2 P3 0 1 2 3 Tick # 0 1 2 3 • 12-rail system: pros & cons Pros: – Completely solves adiabatic timing design problem – Enables mixtures of retractile, SCRL, and other logic styles on 1 chip – Enables simple fully-adiabatic SRAM & DRAM • Cons: – Timing signals are dynamic – Known fully-static alternatives use order N2 gates and signals for N-tick-long cycles • N can be large in a chip that includes deep retractile networks – Energy waste in driving the source/drain junction capacitances of all the T-gates even when timing pulse isn’t present • SOI reduces these parasitics GCAL: General CMOS Adiabatic Logic • A general CMOS adiabatic design methodology – Currently under development at UF • Combines best features of SCRL, 2LAL, and retractile logics: – Permits designs attaining asymptotically optimal cost-efficiency • For any combination of time, space, spacetime, energy costs – Arbitrarily high degree of reversibility – Permits using minimal 2-level and 3-level adiabatic gates – Requires only 4 externally supplied clock/power signals for 2-level logic • And only 12 total for mixed 2-level + 3-level logic – Supports mixtures of fully-pipelined and retractile logic. – Supports quiescent dynamic/static latches & RAM cells • Tools currently under development: – A new HDL specialized for describing adiabatic designs – Digital circuit simulator with adiabaticity checker – Adiabatic logic synthesis tool, with automatic legacy design converter GCAL DRAM/SRAM cells • GCAL DRAM cell – 4 transistors – 4 word lines/row – 2 bit lines/col (or 1) • GCAL SRAM cell – 8 transistors – 6 word lines/row – 2 bit lines/col (or 1) DRAM Cell Write Cycle 1. All nodes initially ½. • T-gate initially closed (off). 2. Transmission gate opens. • Internal node is connected to bit-line (at matching voltage). 3. Bit line transitions to 0 or 1. • Pulls internal node to matching level. 4. Transmission gate closes. • Internal node latched to new level. 5. Bit line transitions back to ½. • Prepares for a new cycle. Use the reverse sequence of operations to unwrite. DRAM Cell Read Cycle 1. All external nodes initially ½. • • T-gate initially off. Internal node contains data. 2. Inverter rails split. • Bit line set to (inverted) data. 3. T-gate at end of column latches bit-line data. 4. Inverter rails merge. • Bit line restored to ½ level. Can use the reverse sequence of operations to unread copy of data available at end of column. Fully-Adiabatic DRAM cell • 6T, 6 lines/row, 1 line/column (in/out together) • Read cycle: – – – – – Initially:  lines neutral, out neutral, R off R for desired row turns on  for desired row splits, driving out column R turns off, out is read  merges, out is reset • Write cycle: – – – – First, do read cycle. in is set to out W turns on in changed to new value... Fully-Adiabatic SRAM • 10-T, 10 lines/row, 1 line/column • Operation similar to DRAM, except: • Read-out: T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off • Write: T2 off; N2 retracts; N1 retracts, copy of M presented on input; T1 on; in changes; T1 off, N1 N1 N2 asserts; N2 asserts; T2 on T1 in M T2 T3 out Limits of Adiabatics Structured Systems • A structured system is defined as a system about whose state we have some knowledge. – Some of its physical information is known. –  Its entropy is not at a maximum (by defn.). –  It is not at equilibrium (by defn.). • For states with a given energy E, – we say the system’s energy is distributed among those states, in proportion to their probability. The system’s energy is “in here” States w. prob. > 0 All states of the abstract system having energy E Desired Trajectories Time • Any structured system we build to serve some purpose ConfigDesired trajectories has some uration desired trajectory, or set of trajectories, through its configuration space that we would ideally like it to follow at all times. – Think of any given state as having a specific “desirability” at any given time. Energy Losses • Energy dissipation can be viewed as a departure of part of the system’s energy away from the system’s desired trajectory. Time 6 • E.g., 1 of 10 electrons leaks out of a ConfigDRAM cell = system’s energy has uration departed from desired Energy that has trajectory (all 106 stay) departed from desired by a small amount trajectories Limits of Adiabatics I: Friction Generalized Friction • Any force leading to departure from desired trajectory that obeys the adiabatic principle: – I.e., force strength (& total energy loss) is proportional to velocity along trajectory at low velocities • Examples: – – – – – Ordinary sliding friction Fluid viscosity Electrical resistance Forces causing electromagnetic radiative losses Forces causing losses in inelastic collisions Ways to Quantify Friction Normal friction measures referring to length, mass, etc. may not apply to all processes. For a given mechanism executing a specified process (i.e., following a specified desired trajectory or -ies) over a time t: • Energy coefficient: cE = Elost·t = Elost/q – Energy dissipated from traj. per unit of “quickness” • Note quickness q = 1/t has units like Hz • Entropy coefficient: cS = Smade·t = Smade/q – New entropy generated per unit of quickness • Note that cE = cS·T at temperature T. What matters! Energy Coefficient in Electronics • For charging capacitive load C by voltage V through effective resistance R: cE = Elostt = (CV2RC/t)t = C2V2R • If the resistances are voltage-controlled switches with gain factor k controlled by the same voltage V, then effective R  1/kV cE = C2V/k • In constant-field-scaled CMOS, k  1/dox  , C  , and V  , so cE  3/ = 4; Elost = cE/t  4/ = 3 (like CV2 energy) Degree of Reversibility of CMOS • What is the Q of a min-size CMOS transistor? Q = Efree/Ediss = Efree/(cE/t) = ½CV2/(C2V2R/t) = ½(t/RC) =½s (s = slowdown factor) • Note: Using transistors wider than minimumsize (larger C, smaller R) wouldn’t change RC or Q, and would increase overall dissipation by increasing cE. Lower Bounds on Friction? • No general (technology-independent) lower bounds on friction coefficients for interesting types of processes (e.g. computation) are currently known. • Clever engineering may eventually reduce the friction in desired processes to values as small as is desired. • Some ways: – Reduce number of moving parts (or particles) – Isolate “moving parts” of system from unwanted interactions w. environment Entropy coefficients of some reversible logic gate operations From Frank, “Ultimate theoretical models of nanocomputers” (Nanotechnology, 1998): • SCRL, circa 1997: ~1 b/Hz • Optimistic reversible CMOS: ~10 b/kHz • Merkle’s “quantum FET:” ~1.2 b/GHz • Nanomechanical rod logic: ~.07 b/GHz • Superconducting PQ gate: ~25 b/THz • Helical logic: ~.01 b/THz How low can you go? We don’t really know! Is Adiabatic Limit Achievable? • Even if there is some lower bound on cS, it seems we can have Smade 0 as t  . • What factors may prevent this? – Any lower bound >0 on the number of irreversible bit-operations performed. (Each has Smade  1.) • Fortunately, the lower bound can always be made 0. – Any lower bound on the rate of energy leakage, even when system is completely stopped. – Any upper bound on the Q of the clocking & synchronization system. • The system dissipates Efree/Q on every cycle. • No technology-independent upper bounds on Q known Some Synonyms • Leakage of energy or (equivalently) probability mass out of a desired configuration or trajectory. • Occurrence of errors in the desired analog or digital state of a system. (Motion away from desired states.) • Decay of structure of a structured system. (The state departs from desired state.) Leakage = Error = Decay Perfect Mechanisms? • If a structured system is perfectly closed, – I.e. non-interacting with other systems, at all! – And if its internal interactions are perfectly known, • Then, and only then, is its (von Neumann) entropy going to be a constant. • Otherwise, its entropy will continuously increase as we lose track of its state. – In this case, no mechanism is perfect, in that some of its energy (i.e. some probability mass) is always leaking away from the desired trajector(y/ies) at some nonzero base rate, even when the rate of system’s progress along its trajectory is zero. Leakage Limits • Claim: No real, structured system can have absolutely zero rate of energy leakage out of its desired trajectories, even if not moving. • However: No general, Time technology-independent lower bound on Configleakage rates uration is known (other than zero.) Energy that has • Engineering advances might leaked from desired make leakage as small as desired. configuration Quantifying Leakage • For a given structured system: • Leakage power: Pleak = dEleak / dt • Spontaneous entropy generation rate: • Sleak = dSleak / dt • • Again, note Pleak = Sleak · T at temperature T. Ways to Decrease Leakage • Have high potential-energy barriers – slows down thermally excited leakage exponentially • Have thick potential-energy barriers – slows down quantum tunneling exponentially • Example: Older generations of CMOS! • Mechanical (clockwork) systems have high potential energy barriers, for their size: – Decay may require atoms to diffuse out of tightlybonded spots. – Mechanisms that avoid making/breaking contacts (e.g. buckled logic) avoid losses due to stiction. Limits of Adiabatics II: Leakage Minimum Losses w. Leakage topt  Pleak Sleak  cE cS Etot = Eadia + Eleak Eleak = Pleak·tr  2 Pleak cE  2T Sleak cS Eadia = cE / tr Minimum Loss Derivation d Etot d( Eadia  Eleak ) d(cE / t r ) d( Pleak  t r )    d tr d tr d tr d tr  cE / t  Pleak 2 r  0 (when Etot is minimum) cE  Pleak  t 2 r t  cE / Pleak 2 r t r  cE / Pleak Emin  cE / t r  Pleak  t r  cE Pleak / cE  Pleak  cE / Pleak  Pleak cE  Pleak cE  2 Pleak cE Leakage in CMOS • In a given technology with constant-field scaling, leakage becomes worse at smaller scales because: – Energy barriers between states are lower • Higher rates of thermally-induced leakage, at given T • Higher rates of quantum tunneling (temp.-independent) – Energy barriers between states are narrower • Higher rates of quantum tunneling – These effects get worse exponentially with 1/length (doubly-exponentially with time) • Need alt. technologies w. high energy barriers! Future Techs. w. Low Leakage? • How can we achieve low entropy coefficients in minimum-scale (atomic-scale) devices? – Need high energy barriers: • Can achieve using atomic (not just electronic) interactions: – E.g. mechanical logics (rod logic, buckling logic) – If strong bonds (e.g. C-C) are used in structure, rates of unwanted bond breakage can made be very low. – Rate for an atom passing through another one (e.g. knobs in rod logic) is extremely low due to » height of barrier: strength of Coulombic & fermionic repulsion between electrons, & » width of barrier: large number of particles involved • Other possibilities? Minimum Dissipation with Variable V Emin  2 Plk cE 2  1 2 Plk  I lkV  kV 3e V / T 2C 2V / k  4 V / T 2 C V e 2 2 V / 2T  2CV e • Notice that this function of V approaches 0 exponentially as V→∞, – This is even true if we scale CV! • Thus, there is no lower limit to the energy dissipation of adiabatic field-effect circuits! – The key is to make devices larger, not smaller! • Device sizes need grow only logarithmically. 3 V / T  kV e 1 2 I lk  I max e V / T  12 kV 2 e V / T cE  C 2V 2 R  2C 2V / k R  V / I max  2 / kV I max  12 kV 2 Maximum Q factor in terms of V • The maximum logic Q factor is the maximum ratio between the energy involved in carrying out a logical transition, and the energy dissipated by the circuit during the transition. – We just calculated the minimum energy dissipated. • Thus, Qmax = Einvolved/Ediss,min = ½CV2/(2CV2exp(−V/2φT)) = (1/4)exp(V/2φT) – Note that the maximum logic Q-factor goes up exponentially with the logic-swing voltage V. Minimum energy & Roff/Ron ratio • (A simpler version of earlier derivation.) Note that: cE = C2V2Ron and if the dominant leakage mode is source/drain, then: Pleak = V2/Roff • So putting the two together: cEPleak = C2V4(Ron/Roff) Emin = 2(cEPleak)1/2 = 2CV2(Ron/Roff)1/2 • So we can rederive the maximum logic Q as follows: Qmax = ½CV2 / (2CV2(Ron/Roff)1/2) = ¼(Roff/Ron)1/2 = ¼(Imax/Ileak)1/2 = ¼(ron/off)1/2 Limits of Adiabatics III: Clock/Power Supplies See transparencies. Timing in Adiabatic Systems When multiple adiabatic devices interact, the relative timing must be precise, in order to ensure that adiabatic rules are met. • There are two basic approaches to timing: – Global (a.k.a. clocked, a.k.a. synchronous) timing: • This is the approach in nearly all conventional irreversible CPUs. • Also is the basis for all practical adiabatic and quantum computing mechanisms that have been proposed to date. – Local (a.k.a. self-timed, a.k.a. asynchronous) timing: • Implemented in a few commercial irreversible chips. • Feynman ‘86 showed that a self-timed serial reversible computation was implementable in QM, in principle. • Margolus ‘90 extended this to a 2-D model with 1-D of parallelism. Can it still work in a full 3-D, fully-quantum-mechanical? – Indications from considering classical-mechanical 3D meshes of coupled oscillators is “yes.” Global Timing • Examples of adiabatic systems designed on the basis of global, synchronous timing: – – – – – Adiabatic CMOS with external power/clock rails Superconducting parametric quantron (Likharev) Adiabatic Quantum-Dot Cellular Automaton (Lent) Adiabatic mechanical logics (Merkle, Drexler) All proposed quantum computers • A potential problem: Synchronous timing may not scale well to large machine sizes. – Work by Janzig & others raises issues of possible limits on timing systems due to quantum uncertainty. • Issue is still unresolved. Clock/Power Supply Desiderata • Here are some requirements for a good adiabatic timing signal / power supply for driving voltage-coded logic: – Generate a trapezoidal voltage waveform with very flat high/low regions. • Needed to avoid current through transistors when turning them off • The flatness of the signal limits the maximum Q factor of the logic. • Waveform during the high↔low transitions should ideally be linear, – But this does not affect the maximum logic Q, only the energy coefficient. » So long as ramp slope scales down everywhere with transition time. – Operate resonantly with the logic circuit, with a high Q factor. • The power supply’s Q will limit the overall system Q – If possible, scale Q  t (cycle time) • Required to be considered an adiabatic mechanism. • May conflict w. inductor scaling laws! • At the least, Q should still be high at leakage-limited speed – Have a reasonable cost, compared to the logic it powers. – Be scalable to large meshes of mutually synchronized devices. Supply concepts in my research • Superpose several sinusoidal signals from phase-synchronized oscillators at harmonics of fundamental frequency – Weight these frequency components as per Fourier transform of desired waveform • Create relatively high-L integrated inductors via vertical, helical metal coils – Only thin oxide layers between turns • Use mechanically oscillating, capacitive MEMS structures in vacuo as high-Q (~10k) oscillator – Use geometry to get desired wave shape directly Early supply concepts • Inductors & switches. – See transparency. • Stepwise charging. – See transparency. Newer Supply Concepts • Transmission-line-based adiabatic resonators. – See transparency. • MEMS-based resonant power supply – See next couple of slides • Ideal adiabatic supplies - Can they exist? – Idealized mechanical model: See transparency. – But, there may be quantum limits to reusability/scalability of global timing signals. • This is a very fundamental issue! MEMS/NEMS Resonators A Novel Clock/Power Supply Technology for Adiabatic Circuits MEMS/NEMS Resonators • State of the art of technology demonstrated in lab: – Frequencies up to the 100s of MHz, even GHz – Q’s >10,000 in vacuum, several thousand even in air! • Rapidly becoming technology of choice for commercial RF filters, etc., in communications SoC (Systems-ona-Chip) e.g. for cellphones. U. Mich., poly, f=156 MHz, Q=9,400 34 µm A MEMS Supply Concept • Energy stored mechanically. • Variable coupling strength → custom wave shape. • Can reduce losses through balancing, filtering. Limits of Adiabatics: Summary Some final remarks. Summary of Limiting Factors When considering adiabaticizing a system, ask: • What fraction of system power is in logic? fL – Vs. Displays, transmitters, propulsion. • What fraction of logic is done adiabatically? fa – Can be all, but w. cost-efficiency overheads. • How large is the Ion/Ioff ratio of switches? – Affects leakage & minimum adiabatic energy. • What is the Qsup of the resonant power supply? • What is the relative cost of energy / logic? r$ – E.g. decreasing power cost by r$ by increasing HW cost by  r$ will not help. “Power premium” Minimizing cost/performance • • • • • $P = Cost of power in original system $H = Cost of logic HW in original system $P = r$$H; $H = $P/r$ For cost-efficiency inverse to energy savings: $tot,min = $Pr$−1/2 + $Hr$1/2 = 2 $Pr$−1/2 • $tot,orig = $P + $H = (1+r$)$H = ((1+r$)/r$) $P • $tot,orig/$tot,min = ½(1+r$)r$−1/2  ½r$1/2 for large r$ Summary of adiabatic limits • Cost-effective adiabatic energy savings factor: Sa = Econv / Eadia in cost-effective adiabatic system • Some rough upper bounds on Sa: Sa  ~ 1/(1fL) Sa  ~ 1/(1fa) Sa  ~ ¼(ron/off)1/2 Sa  Qsup Sa  ~ r$1/2 (worse for non-ideal apps) • But, this ignores benefits from adiabatics of denser packing & smaller communications delays in parallel algorithms. (More later.)

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Principles of Computer Architecture Dr. Mike Frank