Progress Update
Energy-Performance Characterization of CMOS/MTJ Hybrid Circuits
Fengbo Ren
05/28/2010
Modern MTJ
 Bias voltage/current controlled variable-resistance device
– Low: RP
– High: RAP
– TMR = (RAP - RP)/ RP
 Spin-transfer-torque (STT) Switching
– Switching is controlled by the direction of writing current.
– Writing current density has to exceed thresholds
Motivations for Hybrid Logic
 Significant application in MRAM design.
 Why logic?
– CMOS-compatible
● Switching current: 200uA – 2mA
● 90nm transistor: 1mA/um gate width
– Non-volatility, high stability
● Introducing the MTJ's non-volatility into CMOS may suppress leakage in active mode and reduce leakage in idle mode to a minimum.
– 3D stacking
● Replacing CMOS with MTJs may increase density.
Questions?
 What architecture can best utilize the MTJ's non-volatility to improve energy efficiency?
 Can an MTJ/CMOS hybrid circuit achieve a better energy-delay trade-off than a CMOS circuit?
 How much leakage power can be saved by introducing MTJs into CMOS?
 Is there any overhead? How much is the switching power of an MTJ?
 What will be the trend of MTJ/CMOS hybrid circuits with technology scaling?
Logic-in-Memory MTJ (LIM-MTJ) Logic Style
 LIM-MTJ
– Uses differential MTJs in Dynamic Current-Mode Logic (DyCML)
● Outputs are evaluated based on the resistance difference between the pull-down networks through X-coupled PMOS transistors.
● Claimed to have lower dynamic and static power than SCMOS.
[Figure: Schematic of LIM-MTJ 1-bit full adder (34 CMOS transistors + 4 MTJs). Labeled blocks: current comparator, external inputs, stored inputs (MTJ memory cell), sum circuit, carry circuit.]
Energy-Performance Characterization
 vs. SCMOS and DyCML
[Figure: Schematics of the SCMOS and DyCML 1-bit full adders, each showing a sum circuit and a carry circuit. Transistor counts labeled in the figure: 32 and 28 CMOS transistors.]
– LIM-MTJ has no energy-performance advantage over the equivalent CMOS implementation.
MTJ Switching Energy Analysis
 Switching Energy
$E_W = I_W^2 \cdot R \cdot t$
– IW = JC∙A
● JC is the critical current density.
● A is the junction area: A = π∙W∙L = K∙L², where L is the junction size.
– R = δ/A
● δ is the resistance-area product, an intrinsic MTJ parameter: δ = 20 Ω∙um²
– t is time.
$E_W = K \cdot J_C^2 \cdot \delta \cdot L^2 \cdot t$
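The closed form for the switching energy follows from substituting IW = JC∙A and R = δ/A into the first expression. A short worked derivation, using only the symbols defined on this slide:

```latex
\begin{aligned}
E_W &= I_W^2 \cdot R \cdot t \\
    &= (J_C A)^2 \cdot \frac{\delta}{A} \cdot t \\
    &= J_C^2 \cdot A \cdot \delta \cdot t \\
    &= K \cdot J_C^2 \cdot \delta \cdot L^2 \cdot t, \qquad A = K L^2 .
\end{aligned}
```

The energy therefore scales quadratically with both the critical current density and the junction size, and linearly with δ and t.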
MTJ Switching Energy Analysis
 JC is a function of current pulse width.
– Switching time is a function of current density.
$J_C(t) = J_{C0}\,[1 - \ln(t/t_0)/\Delta], \quad t > 8\,\text{ns}$
$J_C(t) = J_{C0} + C/t, \quad t < 8\,\text{ns}$
● Δ is the thermal stability factor (Δ≥40)
● t0 is the intrinsic switching time. t0 = 1 ns
● JC0 is the intrinsic critical current density, JC0 = JC at t = t0.
– Modern MTJs have been shown to have JC0 = 2-7 MA/cm²
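As a rough illustration of the long-pulse expression, JC(t) = JC0[1 − ln(t/t0)/Δ], the sketch below evaluates it for a few pulse widths. The function name `jc_thermal` is hypothetical, Δ = 40 and t0 = 1 ns are the values quoted on this slide, and JC0 = 5 MA/cm² is an assumed value taken from the reference MTJ used later in the deck. The short-pulse branch is not modeled because the constant C is not given.

```python
import math


def jc_thermal(t_ns, jc0, delta=40.0, t0_ns=1.0):
    """Critical current density in the long-pulse branch.

    Implements JC(t) = JC0 * (1 - ln(t / t0) / delta).
    The result carries the same unit as jc0.
    """
    if t_ns <= 0:
        raise ValueError("pulse width must be positive")
    return jc0 * (1.0 - math.log(t_ns / t0_ns) / delta)


if __name__ == "__main__":
    JC0_MA_CM2 = 5.0  # assumed: reference MTJ value from a later slide
    for t_ns in (10.0, 100.0, 1000.0):
        print(f"t = {t_ns:6.0f} ns -> JC = {jc_thermal(t_ns, JC0_MA_CM2):.3f} MA/cm^2")
```

With Δ = 40, the critical current density falls by only a few percent per decade of pulse width, so lengthening the write pulse buys little current reduction while the energy still grows with t.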
MTJ Switching Energy Analysis
 Switching Energy
$E_W(t) = K \cdot J_C^2(t) \cdot \delta \cdot L^2 \cdot t$
– Function of switching time (t) given JC0, δ, L, Δ
– Ref. MTJ
● JC0 = 5 MA/cm², δ = 20 Ω∙um², L = 135 nm (W = 65 nm)
● RP = 725 Ω, IC = 1.4 mA @ t = 1 ns
 Switching Energy > 1 pJ
– CMOS/MTJ hybrid logic circuits that require frequent MTJ switching are hardly energy efficient.
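The reference-MTJ numbers can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming the slide's own area convention A = π∙W∙L and taking JC = JC0 at t = 1 ns; the function name `reference_mtj` is hypothetical.

```python
import math


def reference_mtj(jc0_ma_cm2, ra_ohm_um2, length_nm, width_nm, t_ns):
    """Return (resistance in ohm, write current in A, write energy in J)."""
    jc_a_um2 = jc0_ma_cm2 * 1e-2           # 1 MA/cm^2 = 0.01 A/um^2
    area_um2 = math.pi * (width_nm * 1e-3) * (length_nm * 1e-3)  # A = pi*W*L
    resistance = ra_ohm_um2 / area_um2     # R = delta / A
    current = jc_a_um2 * area_um2          # IW = JC * A
    energy = current ** 2 * resistance * (t_ns * 1e-9)  # EW = IW^2 * R * t
    return resistance, current, energy


if __name__ == "__main__":
    r, i, e = reference_mtj(5.0, 20.0, 135.0, 65.0, 1.0)
    print(f"R  = {r:.0f} ohm")
    print(f"IW = {i * 1e3:.2f} mA")
    print(f"EW = {e * 1e12:.2f} pJ")
```

Under these assumptions the resistance and current come out close to the quoted 725 Ω and 1.4 mA, and the write energy lands a little above 1 pJ.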
MTJ Switching Energy Analysis
 Switching energy with scaling: $E_W(t) = K \cdot J_C^2(t) \cdot \delta \cdot L^2 \cdot t$
– Scaling parameters: δ, L, JC0
 fJ Switching
– Requires δ ≤ 5 Ω∙um², JC0 ≤ 0.6 MA/cm², and L ≤ 33 nm
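To see how far the three scaling targets move the switching energy, the sketch below evaluates EW = K∙JC²∙δ∙L²∙t for the reference MTJ and for the scaled corner (δ = 5 Ω∙um², JC0 = 0.6 MA/cm², L = 33 nm). Two things are assumptions, not stated on the slide: the shape factor K is kept at the reference aspect ratio (K = π∙W/L with W = 65 nm, L = 135 nm), and t is fixed at 1 ns with JC = JC0. The function name is hypothetical.

```python
import math


def switching_energy(jc_ma_cm2, ra_ohm_um2, length_nm, k, t_ns):
    """EW = K * JC^2 * delta * L^2 * t, returned in joules."""
    jc_a_um2 = jc_ma_cm2 * 1e-2      # 1 MA/cm^2 = 0.01 A/um^2
    length_um = length_nm * 1e-3
    return k * jc_a_um2 ** 2 * ra_ohm_um2 * length_um ** 2 * (t_ns * 1e-9)


# Assumed: same aspect ratio as the reference MTJ (W = 65 nm, L = 135 nm).
K_REF = math.pi * 65.0 / 135.0

e_ref = switching_energy(5.0, 20.0, 135.0, K_REF, 1.0)
e_scaled = switching_energy(0.6, 5.0, 33.0, K_REF, 1.0)

if __name__ == "__main__":
    print(f"reference: {e_ref * 1e12:.2f} pJ")
    print(f"scaled:    {e_scaled * 1e15:.2f} fJ")
    print(f"ratio:     {e_ref / e_scaled:.0f}x")
```

Under these assumptions the scaled corner sits below 1 fJ, a reduction of more than three orders of magnitude from the reference device.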
LUT-based Logic
 Store the truth table in memory
 Read out the logic value based on input selection.
– Reconfigurable
– Can implement any type of logic, e.g., FPGA
 Replace storage cells with MTJs
– No MTJ switching during logic operation; the MTJs only need to be configured once.
– Non-volatile, minimum standby power.
– Instant boot-up.
[Figure: Example of a 3-input LUT]
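A software analogue may make the LUT idea concrete. The sketch below stores two 8-entry truth tables (sum and carry of a 1-bit full adder, the function the deck maps onto two 3-input LUTs) and reads them by input selection. The helper names `make_lut` and `lut_read` are hypothetical; the 16 stored bits play the role of the 16 storage cells.

```python
def make_lut(func, n_inputs=3):
    """Build the truth table of func, indexed by the inputs read as a binary number."""
    table = []
    for idx in range(2 ** n_inputs):
        bits = [(idx >> (n_inputs - 1 - b)) & 1 for b in range(n_inputs)]
        table.append(func(*bits))
    return table


def lut_read(table, *inputs):
    """Select one stored bit; nothing is written during evaluation."""
    idx = 0
    for bit in inputs:
        idx = (idx << 1) | bit
    return table[idx]


# "Configuration": written once, then only read.
SUM_LUT = make_lut(lambda a, b, ci: a ^ b ^ ci)
CARRY_LUT = make_lut(lambda a, b, ci: (a & b) | (a & ci) | (b & ci))

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for ci in (0, 1):
                s = lut_read(SUM_LUT, a, b, ci)
                co = lut_read(CARRY_LUT, a, b, ci)
                print(a, b, ci, "->", "sum", s, "carry", co)
```

Reconfiguring the logic only means rewriting the two tables; the read path is unchanged.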
MTJ Reading Circuit
 Conventional current-mirror sense-amplifier-based reading circuit (SA)
[Figure: SA schematic; labeled signals: ∆V, VIP, VIN]
– Slow (2 stages)
– Power hungry (DC current)
MTJ Reading Circuit
 X-coupled inverter based reading circuit (XSA)
[Figure: XSA schematic. Annotations: ∆V is developed during the evaluation phase and amplified by the X-coupled inverter; 1 MTJ and 1 Rref are accessed per read.]
– Fast
● ∆V is generated and amplified at the same time
– Power efficient
● No DC current; only capacitance charging and discharging
Energy Performance Comparison
Instant Power
1 Bit Full Adder (CMOS_LUT)
 Transistor Count
– 16xEDFF
– 4xMUX4
– 2xMUX2
– 672 Transistors
1 Bit Full Adder (MTJ_LUT1)
 Transistor Count
– 16xREAD1XMTJ
– 4xMUX4
– 2xMUX2
– 2xWRTCKT
– 448 Transistors
– 33% Reduction
– 16 MTJ
READ1XMTJ
 15T+1MTJ
 Need writing circuit
1 Bit Full Adder (MTJ_LUT2)
 Transistor Count
– 2x READ8XMTJ
– 1x 9-WORD DECODER
– 2x MUX2
– 1x INV
– 1x WRTCKT
– 174 Transistors
– 76% Reduction
– 16 MTJ
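The reduction figures can be recomputed from the transistor totals quoted on the three full-adder slides (672, 448, and 174), taking the CMOS_LUT count as the baseline. The helper name `reduction_pct` is hypothetical. From these totals the MTJ_LUT1 reduction comes out at about 33%, and the MTJ_LUT2 reduction at about 74%, slightly below the 76% quoted on the slide.

```python
def reduction_pct(baseline, count):
    """Percentage reduction of count relative to baseline."""
    return 100.0 * (baseline - count) / baseline


# Transistor totals as quoted on the slides.
TRANSISTORS = {"CMOS_LUT": 672, "MTJ_LUT1": 448, "MTJ_LUT2": 174}

if __name__ == "__main__":
    base = TRANSISTORS["CMOS_LUT"]
    for name in ("MTJ_LUT1", "MTJ_LUT2"):
        print(f"{name}: {reduction_pct(base, TRANSISTORS[name]):.1f}% fewer transistors")
```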
READ8XMTJ
 MTJs share the reading circuit
 1 MTJ + 1 Rref are accessed per read
 1 MTJ is accessed per write
 23T + 8 MTJ
Simulation Setup
 Three LUT architectures are compared
– CMOS-LUT
– MTJ-LUT1: MTJ reading circuit + MUX
– MTJ-LUT2: Shared MTJ reading circuit + decoder
 Configured to implement a 1-bit full adder
– Two 3-input LUTs
 ASU Predictive Technology Model (PTM)
– 90nm, 65nm (bulk)
– 45nm, 32nm (SOI)
 MTJ characteristics
– Rp = 700 Ω, Rap = 1400 Ω, TMR = 100%, Icap2p = 223uA, Icp2ap = 500uA
– Verilog-A MTJ model from Richard.
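As a quick consistency check, the quoted TMR follows from the two resistance values through the definition given earlier in the deck, TMR = (RAP − RP)/RP. The function name `tmr` is hypothetical.

```python
def tmr(r_p, r_ap):
    """Tunnel magnetoresistance ratio: (RAP - RP) / RP."""
    return (r_ap - r_p) / r_p


if __name__ == "__main__":
    # Rp and Rap as listed in the simulation setup.
    print(f"TMR = {tmr(700.0, 1400.0) * 100:.0f}%")
```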
Configuration Power
 CMOS-LUT
– 1GHz
 MTJ-LUT
– 250MHz
– 750uA writing current
– About 3 ns writing time per MTJ
 MTJ-based LUTs have 10x higher configuration power
– Due to the switching energy of the 16 MTJs
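A back-of-the-envelope estimate of the MTJ write energy behind the configuration power can be made with E = I²∙R∙t, using the 750 uA writing current and 3 ns writing time from this slide. The sketch makes two simplifying assumptions: the junction resistance is held constant during the write, bracketed by the Rp = 700 Ω and Rap = 1400 Ω values from the simulation setup, and losses in the write driver are ignored. The function name is hypothetical.

```python
def write_energy_j(current_a, resistance_ohm, time_s):
    """Energy dissipated in the junction during one write: I^2 * R * t."""
    return current_a ** 2 * resistance_ohm * time_s


I_WRITE_A = 750e-6   # writing current from the slide
T_WRITE_S = 3e-9     # approximate writing time per MTJ
N_MTJ = 16           # MTJs configured for the 1-bit full adder

if __name__ == "__main__":
    for label, r in (("Rp", 700.0), ("Rap", 1400.0)):
        e = write_energy_j(I_WRITE_A, r, T_WRITE_S)
        print(f"{label}: {e * 1e12:.2f} pJ per MTJ, {N_MTJ * e * 1e12:.1f} pJ for {N_MTJ} MTJs")
```

Under these assumptions each write costs on the order of 1 to 2.4 pJ, so configuring all 16 MTJs costs tens of pJ.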
Delay
 MTJ-LUT2 has a 2.5x longer delay
Leakage Power
 MTJ-LUT1 has slightly higher leakage power
 MTJ-LUT2 has about 5x lower total leakage power:
– 10x lower storage leakage (due to MTJ)
– 2x lower logic leakage (from MUX to decoder)
Energy (Operation Frequency:100MHz)
 LUT2
– 4x total energy saving @ 32nm
● 1/10 leakage_storage, ½ leakage_logic, bigger dynamic_logic
● Dynamic_storage overhead decreases as technology scales down.
Energy (Operation Frequency:250MHz)
 LUT2
– 3x total energy saving @ 32nm
● 1/10 leakage_storage, ½ leakage_logic, ½ dynamic_logic
● Dynamic_storage overhead decreases as technology scales down.
Energy (Operation Frequency:500MHz)
 LUT2
– 2x total energy saving @ 32nm
● 1/10 leakage_storage, ½ leakage_logic, ½ dynamic_logic
● Dynamic_storage overhead decreases as technology scales down.
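One way to read the trend across the three frequency slides (the total energy saving shrinking from 4x to 2x as the operating frequency rises) is that leakage energy per operation scales as P_leak/f, so the leakage advantage matters less at higher frequency. The sketch below is a toy model only: the dynamic energy and CMOS leakage power are hypothetical placeholder values, dynamic energy is assumed equal for both designs, and only the 5x leakage ratio is taken from the Leakage Power slide. It is not expected to reproduce the simulated numbers.

```python
def energy_per_op(e_dyn_j, p_leak_w, f_hz):
    """Energy per operation: dynamic energy plus leakage power over one cycle."""
    return e_dyn_j + p_leak_w / f_hz


# Hypothetical placeholder values, chosen only to illustrate the trend.
E_DYN_J = 10e-15                 # assumed equal for CMOS-LUT and MTJ-LUT2
P_LEAK_CMOS_W = 10e-6            # assumed CMOS-LUT leakage power
P_LEAK_MTJ_W = P_LEAK_CMOS_W / 5  # "about 5x" lower total leakage (from the deck)


def saving(f_hz):
    """Ratio of CMOS-LUT energy to MTJ-LUT2 energy per operation."""
    cmos = energy_per_op(E_DYN_J, P_LEAK_CMOS_W, f_hz)
    mtj = energy_per_op(E_DYN_J, P_LEAK_MTJ_W, f_hz)
    return cmos / mtj


if __name__ == "__main__":
    for f_hz in (100e6, 250e6, 500e6):
        print(f"{f_hz / 1e6:.0f} MHz: {saving(f_hz):.2f}x")
```

In this model the saving approaches the leakage ratio at low frequency and the dynamic-energy ratio at high frequency, which matches the direction of the reported trend.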
Standby Power
Standby Power (uW) by technology node
Structure   90nm    65nm    45nm    32nm
CMOS-LUT    6.5     12.8    3.3     29.9
MTJ-LUT1    1.66    1.79    0.469   1.04
MTJ-LUT2    0.836   0.625   0.202   0.227
 Dynamic sleep transistor
– 50mV voltage drop across sleep transistor
 5-20X reduction
Conclusions
 What architecture can best utilize the MTJ's non-volatility to improve energy efficiency?
– LUT-based logic, which requires no MTJ switching during operation.
 Can an MTJ/CMOS hybrid circuit achieve a better energy-delay trade-off than a CMOS circuit?
– Yes.
 How much leakage power can be saved by introducing MTJs into CMOS?
– About 10x reduction.
 Is there any overhead? How much is the switching power of an MTJ?
– Yes. MTJ reading energy is an overhead. The writing energy of a modern MTJ is several pJ.
 What will be the trend of MTJ/CMOS hybrid circuits with technology scaling?
– They will play a significant role in suppressing leakage below 45 nm.