Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Integrated Circuit Design with
NEM Relays
Fred Chen2, Hei Kam1, Dejan Markovic3,
Tsu--Jae King Liu1, Vladimir Stojanovic2, Elad Alon1
Tsu
1Department
of EECS, University of California, Berkeley, CA
2Department of EECS, Massachusetts Institute of Technology, Cambridge, MA
3EE Department, University of California, Los Angeles, CA
CMOS is Scaling, Power Density is Not
Power Density Prediction circa 2000
1000
10.1
100
Sun’s Surface
Rocket Nozzle
Nuclear Reactor
Normalized Energy/op
Power Density (W/cm2)
10000
100
8086 Hot Plate
10 4004
P6
80088085 386
Pentium® proc
286
486
8080
1
1970
1980
1990
2000
2010
Year
S. Borkar
80
60
40
20
0 0
10
1
10
2
3
10
10
1/throughput (ps/op)
4
10
Vdd and Vt not scaling well power/area not scaling
ICCAD 2008 – November 12, 2008
2
CMOS is Scaling, Power Density is Not
Power Density Prediction circa 2000
1000
10.1
100
Sun’s Surface
Rocket Nozzle
Nuclear Reactor
Normalized Energy/op
Power Density (W/cm2)
10000
100
Core 2
8086 Hot Plate
10 4004
P6
80088085 386
Pentium® proc
286
486
8080
1
1970
1980
1990
2000
2010
Year
S. Borkar
80
60
Operate at a
lower energy
point
40
20
0 0
10
Run in parallel to
recoup performance
1
10
2
3
10
10
1/throughput (ps/op)
4
10
Vdd and Vt not scaling well power/area not scaling
Parallelism to improve performance within power budget
ICCAD 2008 – November 12, 2008
3
Where Parallelism Doesn’t Help
25
100
20
80
15
Etotal
10
Edynamic
5
Eleak
0.1
0.2
0.3
Vdd (V)
0.4
0.5
60
Scale Vdd & VT:
40
20
0 1
10
2
10
3
4
10
10
1/throughput (ps/op)
5
10
CMOS circuits have wellwell-defined minimum energy
10.1
Energy/op vs. 1/throughput
Normalized Energy/op
Normalized Energy/cycle
Energy/op vs. Vdd
Caused by leakage and finite sub-threshold swing
Need to balance leakage and active energy
Limits energy-efficiency, regardless how slow the circuit runs
ICCAD 2008 – November 12, 2008
4
What if There Was No Leakage?
Normalized Energy/cycle in CMOS
Measured relay I-V
1.2
W x L = 2mm x 20mm
1.0
20
15
IDS (mA)
Normalized Energy/cycle
25
Etotal
10
0.6
VDS = 1V
0.0
34
0.1
10.1
0.2
0.3
Vdd (V)
0.4
compliance
limit
0.4
0.2
Edynamic
5
H = g = 0.6mm
0.8
0.5
35
36
VGB (V)
37
No longer need to balance increasing component of
energy as Vdd decreases
Relays offer near infinite subsub-threshold slope and no
leakage current
ICCAD 2008 – November 12, 2008
5
Outline
If we could reliably build relays, would circuits
implemented using relays offer superior
energy--efficiency to CMOS?
energy
Compact Circuit Model for NEM Relays
Circuit Design Paradigms for NEM Relays
How to deal with ―slow‖ relays
Compare vs. CMOS
10.1
Fast yet key functionality captured
Digital Logic: 32-bit adder
Mixed Signal I/O: DAC, ADC
ICCAD 2008 – November 12, 2008
6
NEM Relay Structure & Operation
Schematic Cross-Section
Electrical Gate
Metal Channel
Gate Dielectric
W HH
Source
t gap
Closed Relay (“ON”):
|Vgb| > Vpi (pull-in voltage)
Air gapH
Drain
Drain
tdimple
Body Dielectric
Bodyody
Body
Plan-View
L
Source
Open Relay (“OFF”):
|Vgb| < Vpo (pull-out voltage)
Anchor
Electrical
Gate
Body Electrode
(under body
dielectric)
W
Drain
Channel
Contact Dimples
10.1
Relatively low on-state resistance
(100Ω < Ron < 1kΩ @ 90nm)
Infinite off-state resistance Zero
off-state leakage
ICCAD 2008 – November 12, 2008
7
NEM Relay Model
Anchor
Mechanical
Model
Spring k
Mass m
Damper b
C
0.5Rc
Cs
S
Rsd
0.5Rc
Cd
Go
Cgc
Rcs
So
Cgb
D
Rcd
Rsd
Do
Csd_b
Csd_b
B
Compact Verilog
Verilog--A model for circuit simulation:
10.1
Mechanical dynamics: spring (k), damper (b), mass (m)
Electrical parasitics: non-linear gate-body (Cgb), gate-channel
(Cgc), and source/drain-body cap (Csd_b), contact resistance (Rcs,d)
Vpi and delay verified against device measurements
ICCAD 2008 – November 12, 2008
8
NEM Relay Scaling
Constant EE-field scaling has similar benefits as CMOS
Pull-in Voltage with Beam Scaling
Measured pullpull-in voltages scale linearly
If we can overcome reliability issues & surface forces scale:
{W,L,tgap} = {90nm,2.3um,10nm} Vpi = 200mV
10.1
Mechanical delay also scales linearly (~10ns @90nm)
ICCAD 2008 – November 12, 2008
9
NEM Relay as a Logic Element
4-terminal design mimics MOSFET operation
Electrostatic actuation is ambipolar
Non
Non--inverting logic possible: actuation independent of
drain/source voltages
Mechanical delay (~10ns) >> electrical t (~1ps)
Can stack 200 devices (with 1kW on-resistance) before
electrical delay is comparable to mechanical delay
10.1
ICCAD 2008 – November 12, 2008
10
Digital Circuit Design with Relays
CMOS: delay set by electrical time constant
Relays: delay dominated by mechanical movement
10.1
Quadratic delay penalty for stacking devices (pass transistor logic)
Buffer & distribute logical/electrical effort over many stages
Want all relays to switch simultaneously
Implement logic as a single complex gate
ICCAD 2008 – November 12, 2008
11
Digital Circuit Design with Relays
4 gate delays
Delay Comparison vs. CMOS
Single mechanical delay vs. several electrical gate delays
For reasonable loads, relay delay unaffected by fan-out/fan-in
Area Comparison vs. CMOS
10.1
1 mechanical delay
Larger individual devices
Fewer devices needed to implement the same logic function
ICCAD 2008 – November 12, 2008
12
CMOS vs. Relays: Digital Logic
Normalized Energy/op
100
80
60
40
20
0 1
10
2
10
3
4
10
10
1/throughput (ps/op)
5
10
Most processor components exhibit the energy vs.
performance tradeoff of static CMOS
Control, Datapath
Datapath,, Clock
10.1
Adder energy performance tradeoff is representative
ICCAD 2008 – November 12, 2008
13
Relay--based Adder
Relay
Full adder cell:
10.1
12 relays vs. 24
transistors
XOR for free
Use fully complementary
signals to avoid
additional mechanical
delay (to invert)
Relays all sized
minimally
ICCAD 2008 – November 12, 2008
14
N-bit RelayRelay-based Adder
10.1
Ripple carry
configuration
Cascade full adder
cells to create larger
complex gate
Stack of N relays—
relays—
still 1 mechanical
delay
ICCAD 2008 – November 12, 2008
15
Snapshot: NEM Relays vs. CMOS
Energy/op vs. Delay/op across Vdd
9x
10x
32-bit adder
CMOS
Relay
Supply
Voltage
0.5 V
0.32 - 0.9 V
Load Cap
per Output
25 fF
25 fF
Total Gate
Cap
4.0 pF
125 fF
600 mm2
480 mm2
Area
Energy is for adder only
Compare vs. Sklansky CMOS adder1
For similar area:
>9x lower energy/op, >10x greater delay
1D.
10.1
Patil et. al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07).
ICCAD 2008 – November 12, 2008
16
Energy--Delay Results
Energy
Relays always provide a more energy efficient solution
> 10x better, even in the GOPS range
Energy/op vs. Delay/op across Vdd & CL
30x capacitance gain
2.4x Vdd gain
10.1
ICCAD 2008 – November 12, 2008
Lower device Cg, Cd
Fewer devices
No leakage energy
17
Area vs. Throughput Tradeoff
30
W Relay (buffered, 25fF)
A
Relay
/A
CMOS
25
20
W Relay (buffered, 100fF)
15
Au Relay (25fF)
10
Au Relay (100fF)
5
Erelay = 20fJ/op
0.5
2.5
3
For high throughputs, relays require area overhead to
achieve similar throughputs as CMOS
Area overhead is bounded (max. 6x6x-25x) :
10.1
1
1.5
2
Throughput (GOPS)
To maintain minimum energy/op for throughputs > 1GOPS,
CMOS needs parallelism too
ICCAD 2008 – November 12, 2008
18
CMOS vs. Relays: Mixed Signal
I/O energy dominant if core is ~30x more efficient
See if I/O circuits can benefit as well
Representative examples: DAC & ADC
How to process/generate analog signals?
No ―linear gain‖ in relays – rely on switching
10.1
ICCAD 2008 – November 12, 2008
19
NEM Relay Based DAC
DAC Architecture
Voltage Output
AC Impedance
EBIT
Need to add passive R to set impedance
Don’t have to buffer up or size relays to drive output
Relays’ gate voltage independent of I/O voltage
Energy dominated by I/O energy:
10.1
2
VIO2 CL
Same topology as CMOS except:
@VIO=200mV, CL=1pF: 62.8fJ/bit vs. ~780fJ/bit for CMOS
ICCAD 2008 – November 12, 2008
20
NEM Relay Based Flash ADC
Topologically similar to CMOS
Key differences…
10.1
Sample/Hold input
Flash Converter – bank of
comparators
Sample & Hold: Unbalanced ―on‖ vs.
―off‖ switching times
Comparators: Body terminal used as
threshold, ambipolar actuation,
hysteretic switching
ICCAD 2008 – November 12, 2008
21
Relay Based Comparators
Body terminal used to set
threshold
VT
Hysteretic switching impact
10.1
Comparator will have different
threshold if previous state varies
Need to reset all comparators to
the same state before each
evaluation dynamic logic
ICCAD 2008 – November 12, 2008
22
Relay Based Comparators
60
50
Output Code
Max
40
30
VT
Min
20
10
0
-0.2
0.8
1
Comparators above & below input will evaluate need a different
decoder
Each comparator has a ―dead―dead-zone‖ where it stays off
10.1
0.2
0.4
0.6
Input Voltage (V)
Ambipolar actuation impact
0
Can use ―Max‖ or ―Min‖ of this range to determine code
ICCAD 2008 – November 12, 2008
23
ADC Results
Vcore = 0.3V
C = 500fF
R = 4kW
VREF = 1V
fsamp = 10 MS/s
6-bit res: 5.5fJ/conv
Energy dominated by resistor string (320fJ out of 350fJ)
10.1
Largely dependent on input dynamic range (VREF), not core voltage
FOM improves with more resolution as resistor string still
dominates
Most advantages stem from lower Ron at smaller Cg’s
ICCAD 2008 – November 12, 2008
24
Conclusions
NEM relays offer unique characteristics
Superior Ion/Ioff ratio vs. CMOS
Superior Ron*Cg vs. CMOS
Switching delay largely independent of electrical t
Relay circuits show order of magnitude energy
efficiency improvements over CMOS
10.1
Use parallelism (and area) to achieve comparable
throughputs in digital circuits
Mixed signal circuits see similar benefits from low Ron at
small Cgs
Key challenge is scaling devices reliably
ICCAD 2008 – November 12, 2008
25
Acknowledgements
10.1
Berkeley Wireless Research Center
NSF
DARPA
FCRP (C2S2)
Intel Foundation Graduate Fellowship
ICCAD 2008 – November 12, 2008
26