Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Integrated Circuit Design with NEM Relays Fred Chen2, Hei Kam1, Dejan Markovic3, Tsu--Jae King Liu1, Vladimir Stojanovic2, Elad Alon1 Tsu 1Department of EECS, University of California, Berkeley, CA 2Department of EECS, Massachusetts Institute of Technology, Cambridge, MA 3EE Department, University of California, Los Angeles, CA CMOS is Scaling, Power Density is Not Power Density Prediction circa 2000 1000 10.1 100 Sun’s Surface Rocket Nozzle Nuclear Reactor Normalized Energy/op Power Density (W/cm2) 10000 100 8086 Hot Plate 10 4004 P6 80088085 386 Pentium® proc 286 486 8080 1 1970 1980 1990 2000 2010 Year S. Borkar 80 60 40 20 0 0 10 1 10 2 3 10 10 1/throughput (ps/op) 4 10 Vdd and Vt not scaling well power/area not scaling ICCAD 2008 – November 12, 2008 2 CMOS is Scaling, Power Density is Not Power Density Prediction circa 2000 1000 10.1 100 Sun’s Surface Rocket Nozzle Nuclear Reactor Normalized Energy/op Power Density (W/cm2) 10000 100 Core 2 8086 Hot Plate 10 4004 P6 80088085 386 Pentium® proc 286 486 8080 1 1970 1980 1990 2000 2010 Year S. Borkar 80 60 Operate at a lower energy point 40 20 0 0 10 Run in parallel to recoup performance 1 10 2 3 10 10 1/throughput (ps/op) 4 10 Vdd and Vt not scaling well power/area not scaling Parallelism to improve performance within power budget ICCAD 2008 – November 12, 2008 3 Where Parallelism Doesn’t Help 25 100 20 80 15 Etotal 10 Edynamic 5 Eleak 0.1 0.2 0.3 Vdd (V) 0.4 0.5 60 Scale Vdd & VT: 40 20 0 1 10 2 10 3 4 10 10 1/throughput (ps/op) 5 10 CMOS circuits have wellwell-defined minimum energy 10.1 Energy/op vs. 1/throughput Normalized Energy/op Normalized Energy/cycle Energy/op vs. Vdd Caused by leakage and finite sub-threshold swing Need to balance leakage and active energy Limits energy-efficiency, regardless how slow the circuit runs ICCAD 2008 – November 12, 2008 4 What if There Was No Leakage? Normalized Energy/cycle in CMOS Measured relay I-V 1.2 W x L = 2mm x 20mm 1.0 20 15 IDS (mA) Normalized Energy/cycle 25 Etotal 10 0.6 VDS = 1V 0.0 34 0.1 10.1 0.2 0.3 Vdd (V) 0.4 compliance limit 0.4 0.2 Edynamic 5 H = g = 0.6mm 0.8 0.5 35 36 VGB (V) 37 No longer need to balance increasing component of energy as Vdd decreases Relays offer near infinite subsub-threshold slope and no leakage current ICCAD 2008 – November 12, 2008 5 Outline If we could reliably build relays, would circuits implemented using relays offer superior energy--efficiency to CMOS? energy Compact Circuit Model for NEM Relays Circuit Design Paradigms for NEM Relays How to deal with ―slow‖ relays Compare vs. CMOS 10.1 Fast yet key functionality captured Digital Logic: 32-bit adder Mixed Signal I/O: DAC, ADC ICCAD 2008 – November 12, 2008 6 NEM Relay Structure & Operation Schematic Cross-Section Electrical Gate Metal Channel Gate Dielectric W HH Source t gap Closed Relay (“ON”): |Vgb| > Vpi (pull-in voltage) Air gapH Drain Drain tdimple Body Dielectric Bodyody Body Plan-View L Source Open Relay (“OFF”): |Vgb| < Vpo (pull-out voltage) Anchor Electrical Gate Body Electrode (under body dielectric) W Drain Channel Contact Dimples 10.1 Relatively low on-state resistance (100Ω < Ron < 1kΩ @ 90nm) Infinite off-state resistance Zero off-state leakage ICCAD 2008 – November 12, 2008 7 NEM Relay Model Anchor Mechanical Model Spring k Mass m Damper b C 0.5Rc Cs S Rsd 0.5Rc Cd Go Cgc Rcs So Cgb D Rcd Rsd Do Csd_b Csd_b B Compact Verilog Verilog--A model for circuit simulation: 10.1 Mechanical dynamics: spring (k), damper (b), mass (m) Electrical parasitics: non-linear gate-body (Cgb), gate-channel (Cgc), and source/drain-body cap (Csd_b), contact resistance (Rcs,d) Vpi and delay verified against device measurements ICCAD 2008 – November 12, 2008 8 NEM Relay Scaling Constant EE-field scaling has similar benefits as CMOS Pull-in Voltage with Beam Scaling Measured pullpull-in voltages scale linearly If we can overcome reliability issues & surface forces scale: {W,L,tgap} = {90nm,2.3um,10nm} Vpi = 200mV 10.1 Mechanical delay also scales linearly (~10ns @90nm) ICCAD 2008 – November 12, 2008 9 NEM Relay as a Logic Element 4-terminal design mimics MOSFET operation Electrostatic actuation is ambipolar Non Non--inverting logic possible: actuation independent of drain/source voltages Mechanical delay (~10ns) >> electrical t (~1ps) Can stack 200 devices (with 1kW on-resistance) before electrical delay is comparable to mechanical delay 10.1 ICCAD 2008 – November 12, 2008 10 Digital Circuit Design with Relays CMOS: delay set by electrical time constant Relays: delay dominated by mechanical movement 10.1 Quadratic delay penalty for stacking devices (pass transistor logic) Buffer & distribute logical/electrical effort over many stages Want all relays to switch simultaneously Implement logic as a single complex gate ICCAD 2008 – November 12, 2008 11 Digital Circuit Design with Relays 4 gate delays Delay Comparison vs. CMOS Single mechanical delay vs. several electrical gate delays For reasonable loads, relay delay unaffected by fan-out/fan-in Area Comparison vs. CMOS 10.1 1 mechanical delay Larger individual devices Fewer devices needed to implement the same logic function ICCAD 2008 – November 12, 2008 12 CMOS vs. Relays: Digital Logic Normalized Energy/op 100 80 60 40 20 0 1 10 2 10 3 4 10 10 1/throughput (ps/op) 5 10 Most processor components exhibit the energy vs. performance tradeoff of static CMOS Control, Datapath Datapath,, Clock 10.1 Adder energy performance tradeoff is representative ICCAD 2008 – November 12, 2008 13 Relay--based Adder Relay Full adder cell: 10.1 12 relays vs. 24 transistors XOR for free Use fully complementary signals to avoid additional mechanical delay (to invert) Relays all sized minimally ICCAD 2008 – November 12, 2008 14 N-bit RelayRelay-based Adder 10.1 Ripple carry configuration Cascade full adder cells to create larger complex gate Stack of N relays— relays— still 1 mechanical delay ICCAD 2008 – November 12, 2008 15 Snapshot: NEM Relays vs. CMOS Energy/op vs. Delay/op across Vdd 9x 10x 32-bit adder CMOS Relay Supply Voltage 0.5 V 0.32 - 0.9 V Load Cap per Output 25 fF 25 fF Total Gate Cap 4.0 pF 125 fF 600 mm2 480 mm2 Area Energy is for adder only Compare vs. Sklansky CMOS adder1 For similar area: >9x lower energy/op, >10x greater delay 1D. 10.1 Patil et. al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07). ICCAD 2008 – November 12, 2008 16 Energy--Delay Results Energy Relays always provide a more energy efficient solution > 10x better, even in the GOPS range Energy/op vs. Delay/op across Vdd & CL 30x capacitance gain 2.4x Vdd gain 10.1 ICCAD 2008 – November 12, 2008 Lower device Cg, Cd Fewer devices No leakage energy 17 Area vs. Throughput Tradeoff 30 W Relay (buffered, 25fF) A Relay /A CMOS 25 20 W Relay (buffered, 100fF) 15 Au Relay (25fF) 10 Au Relay (100fF) 5 Erelay = 20fJ/op 0.5 2.5 3 For high throughputs, relays require area overhead to achieve similar throughputs as CMOS Area overhead is bounded (max. 6x6x-25x) : 10.1 1 1.5 2 Throughput (GOPS) To maintain minimum energy/op for throughputs > 1GOPS, CMOS needs parallelism too ICCAD 2008 – November 12, 2008 18 CMOS vs. Relays: Mixed Signal I/O energy dominant if core is ~30x more efficient See if I/O circuits can benefit as well Representative examples: DAC & ADC How to process/generate analog signals? No ―linear gain‖ in relays – rely on switching 10.1 ICCAD 2008 – November 12, 2008 19 NEM Relay Based DAC DAC Architecture Voltage Output AC Impedance EBIT Need to add passive R to set impedance Don’t have to buffer up or size relays to drive output Relays’ gate voltage independent of I/O voltage Energy dominated by I/O energy: 10.1 2 VIO2 CL Same topology as CMOS except: @VIO=200mV, CL=1pF: 62.8fJ/bit vs. ~780fJ/bit for CMOS ICCAD 2008 – November 12, 2008 20 NEM Relay Based Flash ADC Topologically similar to CMOS Key differences… 10.1 Sample/Hold input Flash Converter – bank of comparators Sample & Hold: Unbalanced ―on‖ vs. ―off‖ switching times Comparators: Body terminal used as threshold, ambipolar actuation, hysteretic switching ICCAD 2008 – November 12, 2008 21 Relay Based Comparators Body terminal used to set threshold VT Hysteretic switching impact 10.1 Comparator will have different threshold if previous state varies Need to reset all comparators to the same state before each evaluation dynamic logic ICCAD 2008 – November 12, 2008 22 Relay Based Comparators 60 50 Output Code Max 40 30 VT Min 20 10 0 -0.2 0.8 1 Comparators above & below input will evaluate need a different decoder Each comparator has a ―dead―dead-zone‖ where it stays off 10.1 0.2 0.4 0.6 Input Voltage (V) Ambipolar actuation impact 0 Can use ―Max‖ or ―Min‖ of this range to determine code ICCAD 2008 – November 12, 2008 23 ADC Results Vcore = 0.3V C = 500fF R = 4kW VREF = 1V fsamp = 10 MS/s 6-bit res: 5.5fJ/conv Energy dominated by resistor string (320fJ out of 350fJ) 10.1 Largely dependent on input dynamic range (VREF), not core voltage FOM improves with more resolution as resistor string still dominates Most advantages stem from lower Ron at smaller Cg’s ICCAD 2008 – November 12, 2008 24 Conclusions NEM relays offer unique characteristics Superior Ion/Ioff ratio vs. CMOS Superior Ron*Cg vs. CMOS Switching delay largely independent of electrical t Relay circuits show order of magnitude energy efficiency improvements over CMOS 10.1 Use parallelism (and area) to achieve comparable throughputs in digital circuits Mixed signal circuits see similar benefits from low Ron at small Cgs Key challenge is scaling devices reliably ICCAD 2008 – November 12, 2008 25 Acknowledgements 10.1 Berkeley Wireless Research Center NSF DARPA FCRP (C2S2) Intel Foundation Graduate Fellowship ICCAD 2008 – November 12, 2008 26