Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Looking Beyond CMOS: Integrated Circuit Design with Nano Electro-Mechanical Switches Vladimir Stojanović (MIT) in collaboration with Tsu-Jae King Liu, Elad Alon (UC Berkeley) Dejan Marković (UCLA) MIEL 2010 Acknowledgements Circuit design Device design Fred Chen, Hossein Fariborzi Matthew Spencer, Abhinav Gupta Cheng Wang, Kevin Dwan Hei Kam, Rhesa Nathanael, Vincent Pott, Jaeseok Jeon Sponsors DARPA NEMS program FCRP (C2S2, MSD) MIT CICS Berkeley Wireless Research Center NSF 2 CMOS is Scaling, Power Density is Not Power Density Prediction circa 2000 Power Density (W/cm2) 10000 100 Rocket Nozzle Nuclear Reactor Core 2 8086 Hot Plate 10 4004 P6 80088085 386 Pentium® proc 286 486 8080 1 1970 1980 1990 2000 2010 Year S. Borkar, Intel Since ~2000 supply voltage (Vdd) stuck at ~1V 1000 Sun’s Surface Leakage stops you from lowering threshold (Vth) Vdd and Vt not scaling well power/area not scaling Performance limited by power, not number of transistors 3 Parallelism to the Rescue IBM CELL processor Lower supply, balanced Vth Parallelize to recover performance Parallelism allows slower, more efficient units Helps scale performance for fixed power Will this last forever? 4 4 Subthreshold leakage: Game Over for CMOS Leakage and sub-threshold slope define minimum energy/op for CMOS Parallelism cannot reduce power if already operating at minimum energy 5 NEM Switches to the Rescue Measured MEM Switch I-V Curve MEM Switch Energy vs. VDD Etotal=Edynamic NEM switches show zero leakage & sharp sub-threshold slope Could potentially enable reduced E/op with scaling R. Nathanael et al., “4-Terminal Relay Technology for Complementary Logic,” IEDM 2009 6 NEM Switch Structure and Operation Tungsten Channel Tungsten Body Poly-SiGe Gate Poly-SiGe Anchor Poly-SiGe Beam /Flexure Tungsten Source/Drain OFF Switch: |Vgb| < Vpo (pull-out voltage) ON Switch: |Vgb| > Vpi (pull-in voltage) 7 NEM Switch as a Logic Element Gate Source 4-terminal design mimics MOSFET operation Body Drain Electrostatic actuation is ambipolar Non-inverting logic is possible Actuation independent of source/drain voltages 8 8 NEM Switch Model Anchor Spring k Mechanical Model Damper b Mass m G Cgc S D Rs Cgb Rcs Rcd Rd Cdb Csb B Lumped Verilog-A model for circuit design and simulation: Mechanical dynamics: spring (k), damper (b), mass (m) Electrical parasitics: non-linear gate-body (Cgb), gate-channel (Cgc), and source/drain-body cap (Cs,db), contact (Rcs,d) 9 9 NEM Switch Characteristics tPI[s] Model 1.E-06 Model Experiment L=50mm 40mm 14mm 10 15 1.E-07 0 5 VDD [V] Simple electro-mechanical model matches experiment well Mechanical “pull-in” delay scales with geometry and overdrive voltage “Pull-in” voltage (threshold) scales with switch geometry Constant E-field scaling Mechanical delay and VPI scale linearly *H. Kam et al., “Design and Reliability of a Micro-Relay Technology…,” IEDM 2009 10 10 Digital Circuit Design with NEMS NEMS: 12 switches CMOS: delay set by electrical time constant Quadratic delay penalty for stacking devices Buffer & distribute logical/electrical effort over many stages NEMS: delay dominated by mechanical movement Can stack ~100-200 devices before td,elec ≈ td,mech So, want all to switch simultaneously Implement logic as a single complex gate 11 Need to Compare at Block Level NEMS: 12 switches 4 gate delays Delay Comparison vs. CMOS 1 mechanical delay Single mechanical delay vs. several electrical gate delays For reasonable load, NEMS delay unaffected by fan-out/fan-in Area Comparison vs. CMOS Larger individual devices But often need fewer devices to implement same function F. Chen et al., “Integrated Circuit Design with NEM Relays,” ICCAD 2008 12 Example: NEMS Adder Full adder cell: 12 NEMS vs. 24 transistors XOR “free” Complementary signals avoid extra mechanical delay (to invert) NEMS all sized minimally 13 N-bit NEMS Adder Ripple carry configuration Cascade full adder cells to create larger complex gate Stack N NEMS, but still single mechanical delay Performance gap to CMOS shrinks to ~10x for 32-bit add (10ns delay at 90nm) 14 Scaled NEMS vs. CMOS Adders Energy/op vs. Delay/op across Vdd 9x Compare vs. Sklansky CMOS adder* 30x less capacitance 10x 2.4x lower Vdd Lower device Cg, Cd Fewer devices No leakage energy For similar area: >9x lower E/op, >10x greater delay F. Chen et al., “Integrated Circuit Design with NEM Relays,” ICCAD 2008 *D. Patil et. al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07). 15 Parallelism helps Energy/op vs. Delay/op across Vdd & CL Can extend energy benefit up to GOP/s throughput As long as parallelism is available Area overhead bounded CMOS needs to be parallelized at some point too 16 16 Contact Resistance Energy/op vs. Delay/op across Vdd & CL Low contact R not critical Good news for reliability… 17 NEMS Circuit Demonstration Test Devices Logic Timing Elements Latches, FFs Memory Adders, Compressors SRAM, DRAM I/O ADC, DAC F. Chen et al., “Demonstration of Integrated Micro-Electro-Mechanical Switch Circuits for VLSI Applications,” ISSCC 2010. 18 Things We Learned… Layout Matters Unbalanced current flow: flexures burn up! Parasitic capacitors: can affect Vpi (DIBL-like effect) 19 Measured Inverter VTC VTC looks digital, suggests composability 20 NEMS Latch Shows Composability Designed as if we used MOSFETS, but that is not always optimal… 21 Switch-based Carry Generation Demonstrates propagate-generate-kill logic as a single complex gate 22 Measured DRAM Results Simultaneous read and write Read latency = tmech,on (decoder) + tmech,off (WLRD switch) 23 Measured DAC Results Code = “Vin” 1 1 Code = 0 0 “Vin” Resistive divider based DAC 2-bit thermometer coded output 2424 NEM switch scaling – next design 1um litho NEM switch size 120um x 150um 0.25um litho Scaled NEMS size 20um x 20um 25 Conclusions NEMS unique features enable scaling post-CMOS t Reliability improving Nearly ideal Ion/Ioff Switching delay largely independent of electrical Need to adapt circuit design style Circuit level insights critical (contact R) Demonstrated simple circuits Can start thinking about building more complex systems Potentially order of magnitude lower E/op than CMOS Next steps: scaling and improved device design, testing larger digital blocks 26