Download System Level Advantages of Steep Slope Devices

Document related concepts
no text concepts found
Transcript
System Level Advantages of Steep
Slope Devices
David J. Frank
8/31/2012
SiNANO Summer School
Bertinoro, Italy
1
Overview
• Steep slope devices are being pursued to
reduce the energy of computation.
– Lower voltage operation
– BUT, possibly low drive current
• To understand impact, we need to
consider what the system processor looks
like from a device perspective.
• We need to optimize the devices and
circuits for both efficiency and
performance.
2
Outline
1. Why TFETs? Review conventional
limitations.
2. Characterizing a processor with simple
models
3. Optimizing devices for processor-level
performance
a) General trends for MOSFETs
b) Projections for TFETs
4. Conclusion
3
1. Limitations to Scaling
1. Electrostatic constraints
2. Quantum mechanical leakage currents
3. Discreteness of matter and energy
4. Thermodynamic limitations
5. Practical and environmental constraints on power
Basic idea of Scaling:
Adjust dimensions,
voltages, & doping to
achieve smaller FET
with same electrostatic
behavior.
4
Electrostatic constraints on FET design
1. A good design should have
A.
B.
C.
D.
E.
F.
High output resistance
High gain
Low sensitivity to variations
High transconductance
High drain current
High speed
2. Must choose a compromise:
Short, but not so short that 2D
effects kill A, B, C.
3. STEEP FETs may have
different stuctures, but
will still have aspect ratio
requirements.
}
}
Long channel
behavior
Short channel
behavior
Gate
S
D
5
Quantum Mechanical Tunneling Leakage Currents
Leakage currents increase exponentially as the barriers become thinner.
TFETs use
the tunneling
to achieve
steep slope.
FET 'ON'
FET 'OFF'
Vdd
Gnd
Gnd
Gnd
Gnd
Vdd
Gate insulator tunneling
Subthreshold leakage
Direct source-to-drain tunneling
Drain-to-body tunneling
ee-
Source
e-
Channel
Drain
6
Atomistic effects
The number of dopant atoms
in the depletion layer of a
MOSFET has been scaling
roughly as Leff1.5.
Statistical variation in the
number of dopants, N, varies
as N1/2, causing increasing VT
uncertainty for small N.
249,403,263 Si atoms
68,743 donors
13,042 acceptors
[D. J. Frank, et al.,
1999 Symp. VLSI Tech.]
STEEP FETs may also be sensitive to atomistic effects
7
Thermodynamic limitations
ƒFor conventional FETs, the Boltzmann distribution determines
the subthreshold slope and leakage current, VT, and diode
leakage currents, too.
e(VG -VT )/ηkT
IsubVT = I0 e
ƒVT can only be scaled by reducing the temperature, which is
not acceptable for many applications.
ƒSTEEP slope FETs seek ways
around this limitation:
•Band-to-band tunneling (TFETs)
•NEMS electrostatic switches
•Ferroelectric FETs
e-
Source
Channel
Drain
ƒIrreversible computation => All
switching energy is converted to heat.
8
Practical and Environmental Issues
• Power consumption and heat removal are limited
by practical considerations.
• Low power applications must be battery powered
– Many must be lightweight => power < ~few watts.
– Disposable batteries can cost >> $500/watt over life of
device.
– Rechargeables can cost > $50/watt over life of device.
• Home electronics is limited to <~1000W by
heating of the room and cost of electricity.
• High performance is limited by difficulty of heat
removal from chip (~100 W/chip). (Cost of
electricity is ~$5/watt over life.)
9
2. Modeling a processor
Fixed
parameters
Requires a large set
of simplified models.
Variables:
initial guess
new
values:
improved
guess
Area Model
Thermal
Model
Wiring
statistics
Wire Capacitance
Delay
tolerance adjustments
Adjust for Latency
of Long Paths
Embed in
optimization loop to
find best design.
Device Structure
IV Model
Leakage Model
Constrained
optimizer
Leakage Power
tolerance adjustments
Total Power
10
Models and Approximations
Simplifying Assumptions
zProcessor chip is assumed to have a fixed number of cores, each with
a specified number of logic gates.
zOnly the logic within the cores is considered in the model.
zThe clock and memory aspects of the chip are assumed to scale in
the same way as the logic (delay, power, and area).
Clock
zCore-to-core and core-to-memory communication are not dealt with.
Fudge
Repeaters
Logic
Treat in
detail
Memory
Fudge
Treat these by simple scaling from the logic part.
11
How much area do the processor cores take?
100% to 25%, generally decreasing with generation:
70%
Prescott. 125M FETs
100%
Alpha 21264 ('96)
15M FETs, L1 cache only
40%
Dothan, 140M FETs
40%
Power4, 174M
FETs
~25% 2 cores, 1.72B FETs
12
Area usage within a processor core
Approximate area fractions for a high-performance
microprocessor core in leading-edge technology
9.3%
23.3%
7.0%
data from:
M. Scheuermann
and M. Wisniewski
9.3%
23.3%
7.0%
9.3%
60%
60.5%
23.3%
1/3
7.0%
9.3%
20.2%
23.3%
2/3
13.3%
40.3%
Buffers & extra latches
7.0%
20.2%
33%
31% 36%
20.2%
13.3%
Caches (L1)
Buffers & extra latches
Macros
Caches
Caps, Clock dist., Unused
Register files
Custom & RLMs
Caps, Clock dist., Unused
12.5%
14.5%
1.2%
Buffers & extra
latches
Caches
Register files
Processors built with nanotechnology are likely to
have similar area usage statistics.
Nanotechnology may require additional area
allocations for defective circuitry.
Estimates of power and computational densities
should take into account realistic area efficiencies.
Latches and
LCBs
Logic
Unused/caps
Caps, Clock
dist., Unused
90%
11.2%
14.5%
Buffers & extra latches
Caches
Register files
Latches and LCBs
Logic in use
Unused Logic
Unused/caps
Caps, Clock dist., Unused
13
Chip Power and Area
cores
core logic
Power:
core “other”
caches, controllers, I/O
100%
Area:
logic
0%
Clk,
reg,
L1
Caps/
unused
caches (SRAM/DRAM) Controllers, I/O, etc.
cores
Simple model scales the entire
chip from this core logic.
Can improve model by creating a
separate model for the caches.
Note that a processor chip has
much more memory than logic:
►Steep slope devices must also
provide for memory functions.
14
Circuit Delay Estimation
Approximate all of the logic gates as NAND2 gates with average width,
average FO=1.65, and average length wire capacitance.
The clock period is given by n × the average gate delay, where n
depends on the architecture.
Delay estimation:
τ1 =
VDD (C parasitic + Cwire + C gateload )
*
2 I eff
Current is adjusted to account
for noise and variations.
τ 2 = Rwire (C wire + C gateload )
τ3 = Lwire (c / 2)
+ (τ
)
4 / 3 3/ 4
τ3
Propagation
delay
τ1
+
τ=
0.5 + (1 − VT / VDD ) (1 + α)
4/3
2
Final delay corrections
based on Eble's thesis.
Correction for
VT/VDD.
15
Power Calculation
PTOT = PDYN + PsubVT + POX + PB 2 B
PDYN =
α
lD
N CKT
1
2
C (VH − VL )VDD / τ
PsubVT = N CKT VDD w J off (VL − VT , VDD , tox , LG ,η )
Pox = 1.2 N CKT VDD w LG J ox (VT , VDD , tox ,η )
PB 2 B = 0.75 N CKT VDD w hJ J B 2 B (FMax , VDD )
LTR = lαD NCKT
1
τ
(Logic Transition Rate)
τ = mean delay for a single loaded logic gate
α
lD
is activity factor divided by logic depth. Usually ~0.0052 in
recent optimizations.
16
Communication and Wiring Models
(
4FO ⎛ 2r −3
− 2 NCKT
⎜l
3 + FO ⎝
lR
LnoRptr
∫
=
∫
1
lR
1
N Rptr =
∫
l Max
lR
linet (l)dl
inet (l)dl
)
2 r −3
⎞⎟
⎠
linet (l)dl l R
Units are gate pitches.
r = Rent exponent, 0.6, here.
# Wiring Levels
Required
inet (l) =
log(number of wires)
Assume wire lengths distributed according to Rent's rule.
lR
2 NCKT log(length)
From optimizations:
12
10
8
6
4
1E+5 1E+6 1E+7 1E+8
Number of Logic Gate
17
Repeater Model
100
0.9
0.8
0.7
80
0.6
70
60
0.5
0.4
Pecon=10 W/cm2
0.3
50
0.01
0.2
0.1
Latency Penalty Factor
1
Repeater Spacing (cm)
90
1
1
Repeater spacing
Repeater width
Repeater Width (um)
Repeater Spacing (um)
Long wires should receive repeaters with a spacing that
is optimized.
Long wire delay can be absorbed into pipeline depth, but
the latency causes inefficiency, so one can use a latency
penalty factor: γ.
0.1
0.01
9S
10S
0.001
0.01
11S
0.1
12S
1
13S
10
100
CPU Core Power (W)
18
Wire Model
Wires are usually taller than they are wide, to reduce resistance.
For 2:1 height to width, wiring with equal lines and spaces has Cwire =
0.062 kBEOL fF/um.
Wire resistance increases rapidly at small dimensions due to surface scattering.
This fit is based on data from T.S. Kuan (MRS Symp Proc, Vol. 612, 2000):
(
⎛ 70nm ln 1 + e (T − 40) / 10
ρ (W , T ) = ρ300 K ⋅ ⎜⎜
+
26
⎝ W
)⎞⎟
⎟
⎠
This model:
4.0
Ta/Plated-Cu/Ta Annealed at 400°C
68 nm
115 nm
193 nm
503 nm
1.11 μm
50 nm
100 nm
200 nm
500 nm
1000 nm
Resistivity ρ (μΩ-cm)
3.0
2.0
1.0
0.0
0
50
100
150
200
250
300
T (K)
T.S. Kuan (SRC Report)
measurements of thin copper films.
19
Variability effects
• Most-probable worst-case-vector approach can
handle both local and global variability.
• Variation sources:
– Signal Coupling noise
– Supply noise
– Processing variations
• Consequences that need to be modeled:
–
–
–
–
Increased static power
Critical path delay distribution
Single stage functionality
Incomplete switching to supply rails.
20
Estimating worst-case conditions
Example targeting 50% parametric yield.
Chip Power (W)
Calculated
worst case.
Optimize THIS.
230 W
Nominal design point
Clock Frequency (GHz)
21
Estimating worst-case conditions
Example targeting 50% parametric yield.
Chip Power (W)
Shift to VDD,LOW (negative guardband)
230 W
Nominal design point
VDD-10%
Clock Frequency (GHz)
22
Estimating worst-case conditions
Example targeting 50% parametric yield.
Chip Power (W)
Add local and global variability
230 W
Nominal design point
VDD-10%
Clock Frequency (GHz)
23
Estimating worst-case conditions
Example targeting 50% parametric yield.
25% of yield is lost for delay.
Chip Power (W)
Shift to 25 percentile
frequency bound
230 W
Nominal design point
VDD-10%
Clock Frequency (GHz)
24
Estimating worst-case conditions
Chip Power (W)
Example targeting 50% parametric yield.
25% of yield is lost for delay.
At fmin, shift to nominal power at
VDDhigh (positive guardband)
230 W
VDD+5%
Nominal design point
VDD-10%
Clock Frequency (GHz)
25
Estimating worst-case conditions
Chip Power (W)
Example targeting 50% parametric yield.
25% of yield is lost for delay.
Add local and global variability
at this fixed frequency
230 W
VDD+5%
Nominal design point
VDD-10%
Clock Frequency (GHz)
26
Estimating worst-case conditions
Example targeting 50% parametric yield.
25% of yield is lost for delay.
25% of yield is lost for power.
Chip Power (W)
Shift to 75 percentile power
230 W
VDD-10%
Nominal design point
Clock Frequency (GHz)
27
Impact of variability
Energy/transition (fJ)
10
1
full tolerances
noise, no var
no noise, no var
var, no noise
Variability and
noise tolerances
increase energy
× delay product
by ~3X.
0.1
1.E+08
1.E+09
1.E+10
Frequency
Optimized energy-frequency curves for 22nm PDSOI
28
Summary: Models, Assumptions and
Approximations (I)
ƒ
ƒ
ƒ
Power modeling
ƒ Dynamic switching energy plus static power mechanisms
including sub-threshold current, gate oxide tunneling, and
body-to-drain band-to-band tunneling.
Device modeling
ƒ Flexible physics-based compact model that takes 2D
effects into account and allows LG and VT to be optimized.
ƒ Gate length should be fully optimized, not set by the
technology node.
Circuit modeling
ƒ Delay is for average loaded NAND2 gates, based on
model from J.C. Eble's thesis [Ga.Tech. '98].
ƒ Capacitance includes gate, parasitic, and wire parts
(Rent's rule).
ƒ Wire resistance includes temperature dependence and
surface scattering in small wires.
29
Summary: Models, Assumptions and
Approximations (II)
ƒ Chip-level modeling
ƒ Allocate fixed fraction of chip power and area to logic, and
assume fixed number of logic gates. Optimize the logic
part and assume the rest scales similarly.
ƒ Assume multiple processor cores are interconnected in a
way that does not greatly add to the wiring burden.
ƒ Long wires are fatter, and receive repeaters with a spacing
that is optimized.
ƒ Long wire delay is accounted for using a latency penalty
factor.
ƒ On-chip tolerance/variability and noise is accounted for.
30
3. Optimization Results and Projections
• General results for MOSFETs
• Projections for steep slope devices
– Other novel devices, too
31
What defines a technology
generation?
•
Process improvements
– Lithographic tolerance improvements enable smaller dimensions
• Specify dimensional tolerances for each generation, instead of the dimensions
themselves.
• Except that it may be necessary to specify the wire pitch
– Better doping and annealing techniques enable more abrupt doping profiles
• Specify the characteristic length scales of the doping
•
New materials
– Gate dielectric materials and properties must be specified for each generation
• Allow optimization to choose thickness
– Wiring dielectric constant must be specified for each generation
•
New structures
– Models for different FET structures may be necessary
– New parasitic resistance and capacitance models may be necessary
– Different heat sink concepts may arise for different generations
• Usually just assume a fixed junction temperature
32
Optimizing Technology within Constraints
¾Practicality imposes power
constraints.
¾Electrostatics imposes
geometric constraints
¾Thermodynamics imposes
voltage constraints.
¾Quantum mechanics imposes
miniaturization constraints due to
tunneling.
Fixed architectural complexity
+ Fixed power constraints
+ Device physics
= Existence of an optimal technology with maximal performance.
leakage increases due
to tunneling effects
dynamic power
Large
Miniaturization
Small
log(Performance)
Power
leakage power
Declining available
dynamic power
overwhelms speed
improvements of
scaling
Large
Miniaturization
Small
33
General optimization results –
performance vs power
1.E+10
1.E+09
Frequency (Hz)
As always, the
easiest way to
increase
performance is
to increase the
power.
22nm
16nm
11nm
1.E+08
1.E+07
0.01
0.1
1
10
100
1000
Total Chip Power(W)
Conditions: PDSOI, 4 core processor chip, constraining total chip power
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths,
repeater size and spacing.
34
Gate Length and Chip Area vs Power
50
Lower power requires longer gate
lengths, to reduce variability.
0.8
45
0.7
40
Chip Area (cm2)
Gate Length (nm)
0.6
22nm
35
30
25
11nm
20
22nm
0.5
22nm
22nm
0.4
16nm
16nm
11nm
11nm
0.3
11nm
15
0.2
10
5
0
0.01
0.1
Toxeqv: 0.6 – 0.82 nm
0.1
1
10
Total Chip Power(W)
100
1000
0
0.01
Device density increases for
each generation. Shrinking area
increases power density, too.
0.1
1
10
100
1000
Total Chip Power(W)
Conditions: PDSOI, 4 core processor chip, constraining total chip power
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths,
repeater size and spacing.
35
Voltage and Energy vs Power
Optimal supply voltage can become quite low for low power constraints, leading to very
low energy use per logic transition.
16
1.4
14
Energy per logic transition (fJ)
1.2
Supply Voltage (V)
1
0.8
11nm
22nm
0.6
0.4
0.2
0
0.01
12
10
22nm
8 16nm
22nm
11nm
11nm
16nm
6
22nm
4
11nm
2
0.1
1
10
Total Chip Power(W)
100
1000
0
0.01
0.1
1
10
100
1000
Total Chip Power(W)
Conditions: PDSOI, 4 core processor chip, constraining total chip power
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths,
repeater size and spacing.
36
Energy vs performance trade-off
Energy per logic transition (fJ)
10
10x
22nm
16nm
1
11nm
22nm
4x
11nm
0.1
1.E+07
1.E+08
1.E+09
1.E+10
Clock Frequency (Hz)
Conditions: PDSOI, 4 core processor chip, constraining total chip power
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths,
repeater size and spacing.
37
Optimal On/Off Ratio
PDSOI nFETs, currents measured at nominal process and bias conditions
bi
as
co
nd
itio
ns
0.1
Hi
gh
1000:1
22 nm
16 nm
0.001
11 nm
co
nd
itio
ns
Off-current (A/cm)
0.01
Lo
w
bi
as
0.0001
10000:1
0.00001
0.1
1
10
100
On-current (A/cm)
Conditions: PDSOI, 4 core processor chip, constraining total chip power
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths,
repeater size and spacing.
38
Power allocation – Logic versus communication
ƒ Each point is normalized by its own total power: dynamic+static,
logic+repeater
ƒ When the repeater static power is included with the power for switching
the wires (logic & repeaters), the fraction is roughly constant.
39
Communication vs Logic – alternate devices
Broken gap
CNT
Staggered
gap
TFET
ƒ Majority of power is attributed to communication even for
exotic devices
ƒ Opportunity for power reduction – low voltage signal transfer!
40
Novel devices for 11nm and beyond
•
III-V FinFETs
– Higher mobility improves drive current
•
Tunnel FETs
– Improved subthreshold slope enables low VDD and low energy
operation.
– To properly model this device, have to be able to calculate the tunneling
barrier shapes and band-edge alignments.
– A compact model for TFETs has been developed [P Solomon, 2011
DRC], but for more general insight we can alter the Boltzmann constant
in the conventional FET model, to see the impact of steeper
subthreshold slope.
•
Carbon Nanotube Transistors
– Ballistic current flow in the channel enables very high switching speeds
for these devices.
– A compact model for CNTs was presented at IEDM 2009 [L. Wei].
41
FET Model
General purpose empirical temperature-dependent FET model
using a Fermi-Dirac integral to provide an alpha power model
above VT and exponential dependence below VT
Can vary k and μ to roughly evaluate alternative devices.
Wε ηkT ⎛ ηkT / e
⎜
I Dsat (VGS ) = eff I
tox e ⎜⎝ FI EC LCH
γ
s
⎞ ⎛ μ ( E⊥ ) ⎞
⎛ V − VT
⎟⎟ μ0 ⎜⎜
⎟⎟ EC Fα ⎜⎜ GS
⎝ ηkT / e
⎠ ⎝ μ0 ⎠
⎞
⎟⎟
⎠
10W FET
Lg=28nm
1mW FET
Lg=45nm
42
III-V FinFETs
III-V FinFETs are modeled by increasing the mobility in the conventional model.
Increased mobility enables an improved energy/performance tradeoff by
reducing the voltage needed for high performance designs.
FinFET
III-V 2x
III-V 4x
10
Fin
2x
4x
Drive current multiplier:
1.2
1x
(=Si)
1
2x
(~GaAs)
4x
(~InGaAs)
0.1
1.E+13
1.E+14
1.E+15
Supply Voltage (V)
Energy/transition (fJ)
1.4
1
0.8
0.6
0.4
0.2
0
0
2
4
6
8
10
12
14
Frequency (GHz)
1.E+16
Performance (transitions/sec)
43
16
The Band-to-Band Tunnel-FET
Operating principle of a tunnel field-effect transistor:
ƒ Bands are crossed in the ‘on’ state and uncrossed by
the gate voltage in the ‘off’ state.
ƒ In ‘off’ state energy tails are cut-off by valence band in
Source and conduction band in Drain.
Comment: in non-heterojunction case, it is difficult to achieve high
enough fields for strong tunneling, so the on-current tends to be reduced.
44
TFET Heterostuctures
Planar SiGe H-TFET
gate
III-V H-TFET
ON-state
poly
source
n+ Si
p-Si
Off-state
Buried Oxide
ƒOptimum electrostatics with gate
all around
ƒNew material combinations
Si
ƒIntegration onto silicon possible
Vg > 0
SiGe
High on
current
III-V materials offer lower effective
masses and more heterojunctions,
to further improve tunneling.
Nanowire Geometry Offers:
drain
p++ SiGe
SiGe Heterostuctures offer low
effective bandgap, which improves
tunneling.
Ambipolar
device
supressed
ƒScaling to quantum capacitance
limit
ƒImproved on-current
Si
Vg = 0
Ambipolar
device
supressed
SiGe
Si
x
[Koester, Riel, Koswatta]
45
Optimizing steep slope FETS
Performance vs slope and drive current, at constant power density
(high performance regime)
Mobility multiplier:
Performance (arb units)
3000
1x
0.6x
0.4x
0.25x
0.15x
0.04x
0.01x
0.003x
0.001x
0.0003x
2500
Highest drive current
2000
1500
1000
500
Lowest drive current
0
0
20
40
60
80
100
Sub-VT swing (mV/dec)
Using the generic FET model.
Node: 11nm,
Power constraint: 25 W/cm2
46
Optimizing steep slope FETs
Energy vs performance, Low power regime
Energy / transition (fJ)
10
Current
multiplier:
1
0.03x
60
mV/dec
1.0x
0.5x
0.25x
0.12x
0.06x
0.03x
FinFET
10
mV/dec
60
mV/dec
0.1
1.E+08
1.0x
1.E+09
TFET
10
mV/dec
1.E+10
Frequency (Hz)
Sub-VT slope is swept in each TFET curve. Generic FET model, 11nm node.
Current drive must be high to maintain advantage over conventional devices.
14nm FinFET curve is shown for comparison (power is swept).
47
Impact of reduced current drive in TFETs
100
Energy per logic transition (fJ)
Slope = 31 mV/decade
10
I=1/10 I0
I=1/3 I0
I=I0
1x
1
1/3x
1/10x
0.1
0.01
1.E+07
1.E+08
1.E+09
1.E+10
1.E+11
Frequency (Hz)
48
Optimized voltage for steep slope devices
•
•
Adjusting only the sub-VT slope. Current drive remains at 1x.
Variability and tolerances limit voltage scaling. Even extrapolated to
zero slope, the supply voltage may remain fairly high.
7GHz
2.5GHz
7GHz +10mV
7GHz
2.5Ghz +10mV
7GHz +10mV
2.5Ghz +10mV
1.8
0.7
1.6
Energy / transition (fJ)
0.6
Supply Voltage (V)
2.5GHz
0.5
0.4
0.3
0.2
0.1
1.4
1.2
1
0.8
0.6
0.4
0.2
0
0
0
20
40
60
80
100
0
SubVT Swing (mV/dec)
20
40
60
80
100
SubVT Swing (mV/dec)
These optimizations are for fixed performance, minimizing power, 11nm node. Optimized widths, VT’s, VDD, repeaters.
The +10mV cases have an extra 10mV (one sigma) local variability in the VT.
49
Comparing devices – Energy vs Performance
Energy per logic transition (fJ)
10
PDSOI 22nm
1
PDSOI 16nm
PDSOI
PDSOI 11nm
FinFET Si
FinFET GaAs
FinFET
FinFET InGaAs
TET 1x
TFET 1/3x
0.1
TFET 1/10x
TFET
0.01
1.E+07
1.E+08
1.E+09
1.E+10
1.E+11
Frequency (Hz)
50
TFET compact model
[ P. Solomon, et al., 2011 DRC ]
Assumes
a cylindrical nanowire heterojunction TFET with wrap-around gate.
Positive source Fermi level
Drain characteristics, with degenerate source
-EFS
VDS < 0
VDS > 0
51
Screening Length Model:
e − x /λ s
Yu et al. IEEE TED 2008
Model assumes εi / εs =1.
For εi / ε s = 1:
λ=
π ( R + ti )
J 0 (0) −1
and
λ s =λ / π
=
( R + ti )
2.405
52
WKB Integral Function
WKB Integral:
⎡ 2 xc
⎤
T ( E ) = exp ⎢ − ∫ dx 2me (Vc1 − E ) ⎥
⎣ h0
⎦
E
x
ln T ( E ) = −
2
λ
h s
Convert integral over
distance to integral over
energy.
Vc ( x ) = E
2 meV c1
1− E / Vc1 (dx / λ s )
∫
0
= − h2 λ s 2 meV c1 G( u , Lr )
where
u=(V −Vc1 +ϕ+Ve ) /Vc1, Lr =L/λs , and
scaled channel length Lr as a parameter, where
G(u , L) = ∫
1+u
− ln[e− Lr + (1− e− Lr ) w2 − u ]dw
=∫
1+u
− ln[e− Lr + (1− e− Lr ) w2 − u ]dw
0
u
Scaled WKB Integral function G(u,Lr), with
(
)
(u < 0)
(u ≤
> 0)
through-barrier tunneling occurs for negative,
and under barrier tunneling for positive u.
53
Material parameters for InGaAs / GaSb
GaSb
InAs
1.6
0.22
CB Edge Shift (eV)
1.2
EC
0.18
m/m0
0.16
0.14
1.0
0.12
0.8
0.10
0.6
0.08
0.06
0.4
Effective mass (m0)
0.20
1.4
radius
0.04
0.2
0.02
0.0
0.00
1
10
Radius (nm)
Steven Laux program
54
Optimizing InAs/GaSb TFETs
TFET
FinFET
Energy per transition (fJ)
10
•16 cores of 1.5x106 circuits each
are assumed.
•Total chip power is constrained.
•Vdd, L, Vt , ve, widths, and φ were
varied to maximize the clock
frequency.
•Nanowire diameter determines
the InAs/GaSb bandgaps and band
alignments.
1
0.1
0.01
1.E+08
1.E+09
1.E+10
Frequency (Hz)
Optimized design shows 2-3x lower energy for TFETs.
55
Optimized InAs/GaSb Device Parameters
TFET
FinFET
•Optimized TFET slope and supply are lower
than FinFET, but degrade for higher
performance.
•Fin pitch can be tighter than TFET nanowire
pitch, indicating lower integration density for
TFETs.
Supply Voltage (V)
1
FinFET
TFET
TFET diam
0.1
1.E+08
1.E+09
Dimension (nm)
Subthreshold Slope (mV/dec)
Frequency
(Hz)
TFET
FinFET
FinFET
Fin height
TFET diameter
10
Fin width
TFET
1
1.E+08
10
1.E+08
FinFET h
100
1.E+10
100
FinFET tsi
1.E+09
1.E+10
Frequency (Hz)
1.E+09
Frequency (Hz)
1.E+10
56
Carbon Nanotube FET (CNFET)
Geometry assumption:
GAA (gate all around) in dense planar array of carbon nanotubes.
Dielectric
around CNT
Gate
Source
LSD
Doped CNT
Drain
S
Lgate
Substrate
57
Ballistic current flow model
[ L. Wei, et al., IEDM 2009 ]
Source transmitted
carriers
μS
EC
μD
ϕS
Source
EV
Reflected carriers
ϕch
Drain transmitted
carriers
ϕD
EC
Drain
EV
58
Final comparison
All devices hit the ‘power wall’. But, TFETs can go to lower energy.
Energy per logic transition (fJ)
10
PDSOI
1
CNFET
FinFET
0.1
III-V TFET
ideal TFET
(m*=.02, d=4nm)
0.01
1.E+06
1.E+07
1.E+08
1.E+09
1.E+10
1.E+11
Frequency (Hz)
[General trends, optimistic assumptions.]
59
4. Summary
•
•
CMOS scaling is limited by electrostatic, quantum mechanical,
discreteness, thermodynamic and practical effects.
Optimization can and should be used to find the best design points
in the midst of these various constraints.
– Processor level optimization can be very different from individual device
performance optimization.
– Example: low power needs somewhat less scaled devices.
•
Technology performance projections based on optimization have
been presented for PDSOI and steep slope devices.
– All devices have frequency versus energy trade-offs.
– Steep slope devices need high drive current to be competitive at
high performance, but may offer lower energy solutions at lower
speeds.
60
The End of Scaling is Optimization
log(System Performance)
Stop when you get to the top.
Miniaturization
61
Acknowledgements
•
•
•
•
•
•
•
Wilfried Haensch
Leland Chang
Paul Solomon
Steve Koester
Lan Wei
Philip Wong
Ghavam Shahidi
•
•
•
•
•
•
•
Mike Scheuermann
Phillip Restle
Omer Dokumaci
Mary Wisniewski
Steve Kosonocky
Yuan Taur
Bob Dennard
62