Progress Update
Energy-Performance Characterization of CMOS/MTJ Hybrid Circuits
Fengbo Ren
05/28/2010

Modern MTJ
Bias-voltage/current-controlled variable-resistance device
– Low resistance: RP (parallel state)
– High resistance: RAP (antiparallel state)
– TMR = (RAP − RP)/RP
Spin-transfer-torque (STT) switching
– Switching is controlled by the direction of the writing current.
– The writing current density has to exceed the critical (threshold) current density for the given switching direction.

Motivations for Hybrid Logic
MTJs already have significant applications in MRAM design. Why logic?
– CMOS-compatible
 ● Switching current: 200 µA – 2 mA
 ● 90 nm transistor: 1 mA/µm of gate width
– Non-volatility, high stability
 ● Introducing the MTJ's non-volatility into CMOS may suppress leakage in active mode and reduce leakage in idle mode to a minimum.
– 3D stacking
 ● Replacing CMOS with MTJs may increase density.

Questions
What architecture can best utilize the MTJ's non-volatility to improve energy efficiency?
Can an MTJ/CMOS hybrid circuit achieve a better energy-delay trade-off than a CMOS circuit?
How much leakage power can be saved by introducing MTJs into CMOS? Is there any overhead? How large is the MTJ switching power?
What will the trend of MTJ/CMOS hybrid circuits be with technology scaling?

Logic-in-Memory MTJ (LIM-MTJ) Logic Style
LIM-MTJ
– Uses differential MTJs in Dynamic Current-Mode Logic (DyCML)
 ● Outputs are evaluated from the resistance difference between the pull-down networks through cross-coupled PMOS transistors.
 ● Claimed to have lower dynamic and static power than static CMOS (SCMOS).
[Figure: Schematic of LIM-MTJ 1-bit full adder: sum and carry circuits with current comparators and MTJ memory cells; 34 CMOS transistors + 4 MTJs]

Energy-Performance Characterization vs. SCMOS & DyCML
[Figures: Schematics of SCMOS and DyCML 1-bit full adders; transistor counts shown: 32 and 28 CMOS transistors]
– LIM-MTJ has no energy-performance advantage compared with the equivalent CMOS implementations.

MTJ Switching Energy Analysis
Switching energy
$E_W = I_W^2 R\,t$
– $I_W = J_C A$
 ● JC is the critical current density.
 ● A is the junction area, $A = \pi W L = K L^2$, where L is the junction size (so $K = \pi W / L$).
– $R = \delta / A$
 ● δ is the resistance-area product, an intrinsic MTJ parameter; δ = 20 Ω·µm².
– t is the switching time (current pulse width).
Substituting $I_W$ and $R$:
$E_W = K J_C^2 L^2 \delta\,t$

MTJ Switching Energy Analysis (cont.)
JC is a function of the current pulse width.
– Equivalently, the switching time is a function of the current density.
$J_C(t) = J_{C0}\left[1 - \frac{1}{\Delta}\ln\frac{t}{t_0}\right], \quad t \ge 8\,\mathrm{ns}$
$J_C(t) = J_{C0}\left(C_1 + \frac{C_2}{t}\right), \quad t < 8\,\mathrm{ns}$
 ● Δ is the thermal stability factor (Δ ≥ 40).
 ● t0 is the intrinsic switching time, t0 = 1 ns.
 ● JC0 is the intrinsic critical current density, JC0 = JC at t = t0.
 ● C1 and C2 are constants of the short-pulse model.
– Modern MTJs have been shown to have JC0 = 2–7 MA/cm².

MTJ Switching Energy Analysis (cont.)
Switching energy
$E_W(t) = K J_C^2(t) L^2 \delta\,t$
– A function of the switching time t, given JC0, δ, L and Δ
– Reference MTJ
 ● JC0 = 5 MA/cm², δ = 20 Ω·µm², L = 135 nm (W = 65 nm)
 ● RP = 725 Ω, IC = 1.4 mA at t = 1 ns
Switching energy > 1 pJ
– CMOS/MTJ hybrid logic circuits that require frequent MTJ switching are hardly energy efficient.

MTJ Switching Energy Analysis (cont.)
Switching energy with scaling
$E_W(t) = K J_C^2(t) L^2 \delta\,t$
– Reducing δ, L and JC0 brings the switching energy down to the fJ level:
 ● δ ≤ 5 Ω·µm², JC0 ≤ 0.6 MA/cm² and L ≤ 33 nm

LUT-based Logic
Store the truth table in memory and read out the logic value selected by the inputs.
– Reconfigurable
– Can implement any logic function of its inputs, e.g. in FPGAs.
Replace the storage cells with MTJs
– No MTJ switching during logic operation; the MTJs only need to be configured once.
– Non-volatile, minimum standby power.
– Instant boot-up.
[Figure: Example of a 3-input LUT]

MTJ Reading Circuit
Conventional current-mirror sense-amplifier (SA) based reading circuit
[Figure: SA reading circuit, ∆V developed between VIP and VIN]
– Slow (2 stages)
– Power hungry (DC current)

MTJ Reading Circuit (cont.)
Cross-coupled-inverter based reading circuit (XSA)
– ∆V is developed during the evaluation phase.
– 1 MTJ and 1 Rref are accessed per read.
– ∆V is amplified by the cross-coupled inverters.
– Fast
 ● ∆V is generated and amplified at the same time.
– Power efficient
 ● No DC current; only charging and discharging of capacitance.

Energy Performance Comparison

Instantaneous Power

1-Bit Full Adder (CMOS-LUT)
Transistor count
– 16 × EDFF
– 4 × MUX4
– 2 × MUX2
– 672 transistors

1-Bit Full Adder (MTJ-LUT1)
Transistor count
– 16 × READ1XMTJ
– 4 × MUX4
– 2 × MUX2
– 2 × WRTCKT
– 448 transistors (33% reduction)
– 16 MTJs

READ1XMTJ
– 15T + 1 MTJ
– Needs a writing circuit

1-Bit Full Adder (MTJ-LUT2)
Transistor count
– 2 × READ8XMTJ
– 1 × 9-word decoder
– 2 × MUX2
– 1 × INV
– 1 × WRTCKT
– 174 transistors (74% reduction)
– 16 MTJs

READ8XMTJ
– The MTJs share one reading circuit.
– 1 MTJ + 1 Rref are accessed per read.
– 1 MTJ is accessed per write.
– 23T + 8 MTJs

Simulation Setup
Three LUT architectures are compared:
– CMOS-LUT
– MTJ-LUT1: MTJ reading circuit + MUX
– MTJ-LUT2: shared MTJ reading circuit + decoder
Configured to implement a 1-bit full adder
– 2 three-input LUTs
ASU Predictive Technology Model (PTM)
– 90 nm, 65 nm (bulk)
– 45 nm, 32 nm (SOI)
MTJ characteristics
– RP = 700 Ω, RAP = 1400 Ω, TMR = 100%, IC(AP→P) = 223 µA, IC(P→AP) = 500 µA
– Verilog-A MTJ model from Richard.
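The LUT-based full adder from the simulation setup (two 3-input LUTs holding the sum and carry truth tables) can be illustrated with a small behavioral sketch in Python. This is not a circuit model: it assumes that a stored 1 is written as RAP and a 0 as RP, and that the sense amplifier compares the selected MTJ against a hypothetical reference resistor placed midway between RP = 700 Ω and RAP = 1400 Ω. The names `MtjLut3` and `table_from` are illustrative only. The truth table is written once at configuration time; each logic evaluation only reads one MTJ.

```python
from itertools import product

# MTJ resistances from the Simulation Setup slide (ohms).
R_P = 700.0
R_AP = 1400.0
# Hypothetical reference resistor, placed midway between RP and RAP.
R_REF = (R_P + R_AP) / 2


class MtjLut3:
    """Behavioral 3-input LUT whose 8 configuration bits are held in MTJs."""

    def __init__(self, truth_table):
        if len(truth_table) != 8:
            raise ValueError("a 3-input LUT needs exactly 8 configuration bits")
        # Configuration (write) phase, done once: bit 1 -> RAP, bit 0 -> RP.
        # This bit-to-state mapping is an assumption for illustration.
        self.mtj_r = [R_AP if bit else R_P for bit in truth_table]

    def read(self, x1, x2, x3):
        """Select one MTJ with the inputs and sense it against R_REF.

        No MTJ state changes here: logic operation is read-only.
        """
        index = (x1 << 2) | (x2 << 1) | x3
        return int(self.mtj_r[index] > R_REF)


def table_from(fn):
    """Enumerate fn over inputs 000..111 in the same order read() indexes them."""
    return [fn(a, b, c) for a, b, c in product((0, 1), repeat=3)]


# A 1-bit full adder uses two 3-input LUTs: one for Sum, one for Carry-out.
sum_lut = MtjLut3(table_from(lambda a, b, c: a ^ b ^ c))
carry_lut = MtjLut3(table_from(lambda a, b, c: (a & b) | (a & c) | (b & c)))


def full_adder(a, b, ci):
    """Return (sum, carry_out) by reading both LUTs with the same inputs."""
    return sum_lut.read(a, b, ci), carry_lut.read(a, b, ci)


if __name__ == "__main__":
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        print(a, b, ci, "->", s, co)
```

Reconfiguring the LUT pair for a different 3-input function would only mean rewriting the eight MTJ states per LUT; the read path stays the same.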
Configuration Power
– CMOS-LUT: 1 GHz
– MTJ-LUT: 250 MHz
 ● 750 µA writing current
 ● About 3 ns writing time per MTJ
MTJ-based LUTs have about 10× higher configuration power
– due to the switching energy of 16 MTJs

Delay
MTJ-LUT2 has about 2.5× larger delay.

Leakage Power
– MTJ-LUT1 has slightly larger leakage power.
– MTJ-LUT2 has about 5× smaller total leakage power, with
 ● 10× smaller storage leakage (due to the MTJs)
 ● 2× smaller logic leakage (decoder instead of MUX)

Energy (Operating Frequency: 100 MHz)
LUT2
– 4× total energy saving at 32 nm
 ● Storage leakage reduced to 1/10, logic leakage halved, larger logic dynamic energy
 ● The storage dynamic-energy overhead decreases as technology scales down.

Energy (Operating Frequency: 250 MHz)
LUT2
– 3× total energy saving at 32 nm
 ● Storage leakage reduced to 1/10, logic leakage halved, logic dynamic energy halved
 ● The storage dynamic-energy overhead decreases as technology scales down.

Energy (Operating Frequency: 500 MHz)
LUT2
– 2× total energy saving at 32 nm
 ● Storage leakage reduced to 1/10, logic leakage halved, logic dynamic energy halved
 ● The storage dynamic-energy overhead decreases as technology scales down.

Standby Power
Standby power (µW) by technology node:
Structure    90 nm   65 nm   45 nm   32 nm
CMOS-LUT     6.5     12.8    3.3     29.9
MTJ-LUT1     1.66    1.79    0.469   1.04
MTJ-LUT2     0.836   0.625   0.202   0.227
Dynamic sleep transistor
– 50 mV voltage drop across the sleep transistor
– 5–20× reduction

Conclusions
What architecture can best utilize the MTJ's non-volatility to improve energy efficiency?
– LUT-based logic, which requires no MTJ switching during operation.
Can an MTJ/CMOS hybrid circuit achieve a better energy-delay trade-off than a CMOS circuit?
– Yes.
How much leakage power can be saved by introducing MTJs into CMOS?
– About 10× reduction.
Is there any overhead? How large is the MTJ switching power?
– Yes. MTJ reading energy is an overhead. The writing energy of modern MTJs is around several pJ.
What will the trend of MTJ/CMOS hybrid circuits be with technology scaling?
– They will play a significant role in suppressing leakage below 45 nm.
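The switching-energy model from the MTJ Switching Energy Analysis slides, $E_W = I_W^2 R\,t = K J_C^2 L^2 \delta\,t$, and the "several pJ" writing-energy conclusion can be checked numerically with the Python sketch below. It reproduces the reference-MTJ figures (RP ≈ 725 Ω, IC ≈ 1.4 mA, E_W > 1 pJ at t = 1 ns), evaluates the scaled case (δ = 5 Ω·µm², JC0 = 0.6 MA/cm², L = 33 nm), and estimates the per-MTJ energy of the 750 µA, ~3 ns configuration pulse. Assumptions: the scaled junction keeps the reference aspect ratio W/L = 65/135; the thermal-regime JC(t) expression is used only for t ≥ 8 ns, with Δ = 40; and the configuration-pulse estimate simply brackets R between RP = 700 Ω and RAP = 1400 Ω. All function names are hypothetical.

```python
import math

# Unit convention: lengths in um, RA product in ohm*um^2, time in s.
# 1 MA/cm^2 = 1e6 A / 1e8 um^2 = 0.01 A/um^2.
A_PER_UM2_PER_MA_CM2 = 0.01


def pulse_energy(i_a, r_ohm, t_s):
    """Joule energy of a rectangular current pulse: E = I^2 * R * t."""
    return i_a ** 2 * r_ohm * t_s


def mtj_switching(jc_ma_cm2, w_um, l_um, delta_ohm_um2, t_s):
    """Return (R, I_W, E_W) using A = pi*W*L, R = delta/A, I_W = J_C*A."""
    area_um2 = math.pi * w_um * l_um
    r_ohm = delta_ohm_um2 / area_um2
    i_a = jc_ma_cm2 * A_PER_UM2_PER_MA_CM2 * area_um2
    return r_ohm, i_a, pulse_energy(i_a, r_ohm, t_s)


def jc_thermal(t_s, jc0_ma_cm2, thermal_delta, t0_s=1e-9):
    """Thermal-regime critical current density JC0*(1 - ln(t/t0)/Delta), t >= 8 ns."""
    if t_s < 8e-9:
        raise ValueError("the thermal-regime expression is used only for t >= 8 ns")
    return jc0_ma_cm2 * (1.0 - math.log(t_s / t0_s) / thermal_delta)


# Reference MTJ at t = t0 = 1 ns, where JC = JC0 by definition.
ref_r, ref_i, ref_e = mtj_switching(5.0, 0.065, 0.135, 20.0, 1e-9)

# Same junction switched with a 10 ns pulse (Delta = 40, the slide's lower bound).
e_10ns = mtj_switching(jc_thermal(10e-9, 5.0, 40.0), 0.065, 0.135, 20.0, 10e-9)[2]

# Scaled MTJ: delta = 5, JC0 = 0.6, L = 33 nm, aspect ratio kept at 65/135 (assumption).
l_scaled = 0.033
e_scaled = mtj_switching(0.6, l_scaled * 65 / 135, l_scaled, 5.0, 1e-9)[2]

# LUT configuration pulse: 750 uA for about 3 ns, R bracketed by RP and RAP.
e_write_min = pulse_energy(750e-6, 700.0, 3e-9)
e_write_max = pulse_energy(750e-6, 1400.0, 3e-9)

if __name__ == "__main__":
    print(f"reference MTJ: R = {ref_r:.0f} ohm, I = {ref_i * 1e3:.2f} mA, E = {ref_e * 1e12:.2f} pJ")
    print(f"10 ns pulse: E = {e_10ns * 1e12:.1f} pJ")
    print(f"scaled MTJ: E = {e_scaled * 1e15:.2f} fJ")
    print(f"config pulse: {e_write_min * 1e12:.2f} to {e_write_max * 1e12:.2f} pJ per MTJ")
```

Under these assumptions the reference junction comes out at about 725 Ω, 1.38 mA and 1.4 pJ, and the scaled junction lands in the sub-fJ range. Each configuration pulse costs roughly 1.2 to 2.4 pJ, so writing all 16 MTJs of the full-adder LUT pair costs tens of pJ. A 10 ns pulse costs more (about 12 pJ) than a 1 ns pulse even though JC drops, because E_W grows linearly with t while JC falls only logarithmically.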