Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
System Level Advantages of Steep Slope Devices David J. Frank 8/31/2012 SiNANO Summer School Bertinoro, Italy 1 Overview • Steep slope devices are being pursued to reduce the energy of computation. – Lower voltage operation – BUT, possibly low drive current • To understand impact, we need to consider what the system processor looks like from a device perspective. • We need to optimize the devices and circuits for both efficiency and performance. 2 Outline 1. Why TFETs? Review conventional limitations. 2. Characterizing a processor with simple models 3. Optimizing devices for processor-level performance a) General trends for MOSFETs b) Projections for TFETs 4. Conclusion 3 1. Limitations to Scaling 1. Electrostatic constraints 2. Quantum mechanical leakage currents 3. Discreteness of matter and energy 4. Thermodynamic limitations 5. Practical and environmental constraints on power Basic idea of Scaling: Adjust dimensions, voltages, & doping to achieve smaller FET with same electrostatic behavior. 4 Electrostatic constraints on FET design 1. A good design should have A. B. C. D. E. F. High output resistance High gain Low sensitivity to variations High transconductance High drain current High speed 2. Must choose a compromise: Short, but not so short that 2D effects kill A, B, C. 3. STEEP FETs may have different stuctures, but will still have aspect ratio requirements. } } Long channel behavior Short channel behavior Gate S D 5 Quantum Mechanical Tunneling Leakage Currents Leakage currents increase exponentially as the barriers become thinner. TFETs use the tunneling to achieve steep slope. FET 'ON' FET 'OFF' Vdd Gnd Gnd Gnd Gnd Vdd Gate insulator tunneling Subthreshold leakage Direct source-to-drain tunneling Drain-to-body tunneling ee- Source e- Channel Drain 6 Atomistic effects The number of dopant atoms in the depletion layer of a MOSFET has been scaling roughly as Leff1.5. Statistical variation in the number of dopants, N, varies as N1/2, causing increasing VT uncertainty for small N. 249,403,263 Si atoms 68,743 donors 13,042 acceptors [D. J. Frank, et al., 1999 Symp. VLSI Tech.] STEEP FETs may also be sensitive to atomistic effects 7 Thermodynamic limitations For conventional FETs, the Boltzmann distribution determines the subthreshold slope and leakage current, VT, and diode leakage currents, too. e(VG -VT )/ηkT IsubVT = I0 e VT can only be scaled by reducing the temperature, which is not acceptable for many applications. STEEP slope FETs seek ways around this limitation: •Band-to-band tunneling (TFETs) •NEMS electrostatic switches •Ferroelectric FETs e- Source Channel Drain Irreversible computation => All switching energy is converted to heat. 8 Practical and Environmental Issues • Power consumption and heat removal are limited by practical considerations. • Low power applications must be battery powered – Many must be lightweight => power < ~few watts. – Disposable batteries can cost >> $500/watt over life of device. – Rechargeables can cost > $50/watt over life of device. • Home electronics is limited to <~1000W by heating of the room and cost of electricity. • High performance is limited by difficulty of heat removal from chip (~100 W/chip). (Cost of electricity is ~$5/watt over life.) 9 2. Modeling a processor Fixed parameters Requires a large set of simplified models. Variables: initial guess new values: improved guess Area Model Thermal Model Wiring statistics Wire Capacitance Delay tolerance adjustments Adjust for Latency of Long Paths Embed in optimization loop to find best design. Device Structure IV Model Leakage Model Constrained optimizer Leakage Power tolerance adjustments Total Power 10 Models and Approximations Simplifying Assumptions zProcessor chip is assumed to have a fixed number of cores, each with a specified number of logic gates. zOnly the logic within the cores is considered in the model. zThe clock and memory aspects of the chip are assumed to scale in the same way as the logic (delay, power, and area). Clock zCore-to-core and core-to-memory communication are not dealt with. Fudge Repeaters Logic Treat in detail Memory Fudge Treat these by simple scaling from the logic part. 11 How much area do the processor cores take? 100% to 25%, generally decreasing with generation: 70% Prescott. 125M FETs 100% Alpha 21264 ('96) 15M FETs, L1 cache only 40% Dothan, 140M FETs 40% Power4, 174M FETs ~25% 2 cores, 1.72B FETs 12 Area usage within a processor core Approximate area fractions for a high-performance microprocessor core in leading-edge technology 9.3% 23.3% 7.0% data from: M. Scheuermann and M. Wisniewski 9.3% 23.3% 7.0% 9.3% 60% 60.5% 23.3% 1/3 7.0% 9.3% 20.2% 23.3% 2/3 13.3% 40.3% Buffers & extra latches 7.0% 20.2% 33% 31% 36% 20.2% 13.3% Caches (L1) Buffers & extra latches Macros Caches Caps, Clock dist., Unused Register files Custom & RLMs Caps, Clock dist., Unused 12.5% 14.5% 1.2% Buffers & extra latches Caches Register files Processors built with nanotechnology are likely to have similar area usage statistics. Nanotechnology may require additional area allocations for defective circuitry. Estimates of power and computational densities should take into account realistic area efficiencies. Latches and LCBs Logic Unused/caps Caps, Clock dist., Unused 90% 11.2% 14.5% Buffers & extra latches Caches Register files Latches and LCBs Logic in use Unused Logic Unused/caps Caps, Clock dist., Unused 13 Chip Power and Area cores core logic Power: core “other” caches, controllers, I/O 100% Area: logic 0% Clk, reg, L1 Caps/ unused caches (SRAM/DRAM) Controllers, I/O, etc. cores Simple model scales the entire chip from this core logic. Can improve model by creating a separate model for the caches. Note that a processor chip has much more memory than logic: ►Steep slope devices must also provide for memory functions. 14 Circuit Delay Estimation Approximate all of the logic gates as NAND2 gates with average width, average FO=1.65, and average length wire capacitance. The clock period is given by n × the average gate delay, where n depends on the architecture. Delay estimation: τ1 = VDD (C parasitic + Cwire + C gateload ) * 2 I eff Current is adjusted to account for noise and variations. τ 2 = Rwire (C wire + C gateload ) τ3 = Lwire (c / 2) + (τ ) 4 / 3 3/ 4 τ3 Propagation delay τ1 + τ= 0.5 + (1 − VT / VDD ) (1 + α) 4/3 2 Final delay corrections based on Eble's thesis. Correction for VT/VDD. 15 Power Calculation PTOT = PDYN + PsubVT + POX + PB 2 B PDYN = α lD N CKT 1 2 C (VH − VL )VDD / τ PsubVT = N CKT VDD w J off (VL − VT , VDD , tox , LG ,η ) Pox = 1.2 N CKT VDD w LG J ox (VT , VDD , tox ,η ) PB 2 B = 0.75 N CKT VDD w hJ J B 2 B (FMax , VDD ) LTR = lαD NCKT 1 τ (Logic Transition Rate) τ = mean delay for a single loaded logic gate α lD is activity factor divided by logic depth. Usually ~0.0052 in recent optimizations. 16 Communication and Wiring Models ( 4FO ⎛ 2r −3 − 2 NCKT ⎜l 3 + FO ⎝ lR LnoRptr ∫ = ∫ 1 lR 1 N Rptr = ∫ l Max lR linet (l)dl inet (l)dl ) 2 r −3 ⎞⎟ ⎠ linet (l)dl l R Units are gate pitches. r = Rent exponent, 0.6, here. # Wiring Levels Required inet (l) = log(number of wires) Assume wire lengths distributed according to Rent's rule. lR 2 NCKT log(length) From optimizations: 12 10 8 6 4 1E+5 1E+6 1E+7 1E+8 Number of Logic Gate 17 Repeater Model 100 0.9 0.8 0.7 80 0.6 70 60 0.5 0.4 Pecon=10 W/cm2 0.3 50 0.01 0.2 0.1 Latency Penalty Factor 1 Repeater Spacing (cm) 90 1 1 Repeater spacing Repeater width Repeater Width (um) Repeater Spacing (um) Long wires should receive repeaters with a spacing that is optimized. Long wire delay can be absorbed into pipeline depth, but the latency causes inefficiency, so one can use a latency penalty factor: γ. 0.1 0.01 9S 10S 0.001 0.01 11S 0.1 12S 1 13S 10 100 CPU Core Power (W) 18 Wire Model Wires are usually taller than they are wide, to reduce resistance. For 2:1 height to width, wiring with equal lines and spaces has Cwire = 0.062 kBEOL fF/um. Wire resistance increases rapidly at small dimensions due to surface scattering. This fit is based on data from T.S. Kuan (MRS Symp Proc, Vol. 612, 2000): ( ⎛ 70nm ln 1 + e (T − 40) / 10 ρ (W , T ) = ρ300 K ⋅ ⎜⎜ + 26 ⎝ W )⎞⎟ ⎟ ⎠ This model: 4.0 Ta/Plated-Cu/Ta Annealed at 400°C 68 nm 115 nm 193 nm 503 nm 1.11 μm 50 nm 100 nm 200 nm 500 nm 1000 nm Resistivity ρ (μΩ-cm) 3.0 2.0 1.0 0.0 0 50 100 150 200 250 300 T (K) T.S. Kuan (SRC Report) measurements of thin copper films. 19 Variability effects • Most-probable worst-case-vector approach can handle both local and global variability. • Variation sources: – Signal Coupling noise – Supply noise – Processing variations • Consequences that need to be modeled: – – – – Increased static power Critical path delay distribution Single stage functionality Incomplete switching to supply rails. 20 Estimating worst-case conditions Example targeting 50% parametric yield. Chip Power (W) Calculated worst case. Optimize THIS. 230 W Nominal design point Clock Frequency (GHz) 21 Estimating worst-case conditions Example targeting 50% parametric yield. Chip Power (W) Shift to VDD,LOW (negative guardband) 230 W Nominal design point VDD-10% Clock Frequency (GHz) 22 Estimating worst-case conditions Example targeting 50% parametric yield. Chip Power (W) Add local and global variability 230 W Nominal design point VDD-10% Clock Frequency (GHz) 23 Estimating worst-case conditions Example targeting 50% parametric yield. 25% of yield is lost for delay. Chip Power (W) Shift to 25 percentile frequency bound 230 W Nominal design point VDD-10% Clock Frequency (GHz) 24 Estimating worst-case conditions Chip Power (W) Example targeting 50% parametric yield. 25% of yield is lost for delay. At fmin, shift to nominal power at VDDhigh (positive guardband) 230 W VDD+5% Nominal design point VDD-10% Clock Frequency (GHz) 25 Estimating worst-case conditions Chip Power (W) Example targeting 50% parametric yield. 25% of yield is lost for delay. Add local and global variability at this fixed frequency 230 W VDD+5% Nominal design point VDD-10% Clock Frequency (GHz) 26 Estimating worst-case conditions Example targeting 50% parametric yield. 25% of yield is lost for delay. 25% of yield is lost for power. Chip Power (W) Shift to 75 percentile power 230 W VDD-10% Nominal design point Clock Frequency (GHz) 27 Impact of variability Energy/transition (fJ) 10 1 full tolerances noise, no var no noise, no var var, no noise Variability and noise tolerances increase energy × delay product by ~3X. 0.1 1.E+08 1.E+09 1.E+10 Frequency Optimized energy-frequency curves for 22nm PDSOI 28 Summary: Models, Assumptions and Approximations (I) Power modeling Dynamic switching energy plus static power mechanisms including sub-threshold current, gate oxide tunneling, and body-to-drain band-to-band tunneling. Device modeling Flexible physics-based compact model that takes 2D effects into account and allows LG and VT to be optimized. Gate length should be fully optimized, not set by the technology node. Circuit modeling Delay is for average loaded NAND2 gates, based on model from J.C. Eble's thesis [Ga.Tech. '98]. Capacitance includes gate, parasitic, and wire parts (Rent's rule). Wire resistance includes temperature dependence and surface scattering in small wires. 29 Summary: Models, Assumptions and Approximations (II) Chip-level modeling Allocate fixed fraction of chip power and area to logic, and assume fixed number of logic gates. Optimize the logic part and assume the rest scales similarly. Assume multiple processor cores are interconnected in a way that does not greatly add to the wiring burden. Long wires are fatter, and receive repeaters with a spacing that is optimized. Long wire delay is accounted for using a latency penalty factor. On-chip tolerance/variability and noise is accounted for. 30 3. Optimization Results and Projections • General results for MOSFETs • Projections for steep slope devices – Other novel devices, too 31 What defines a technology generation? • Process improvements – Lithographic tolerance improvements enable smaller dimensions • Specify dimensional tolerances for each generation, instead of the dimensions themselves. • Except that it may be necessary to specify the wire pitch – Better doping and annealing techniques enable more abrupt doping profiles • Specify the characteristic length scales of the doping • New materials – Gate dielectric materials and properties must be specified for each generation • Allow optimization to choose thickness – Wiring dielectric constant must be specified for each generation • New structures – Models for different FET structures may be necessary – New parasitic resistance and capacitance models may be necessary – Different heat sink concepts may arise for different generations • Usually just assume a fixed junction temperature 32 Optimizing Technology within Constraints ¾Practicality imposes power constraints. ¾Electrostatics imposes geometric constraints ¾Thermodynamics imposes voltage constraints. ¾Quantum mechanics imposes miniaturization constraints due to tunneling. Fixed architectural complexity + Fixed power constraints + Device physics = Existence of an optimal technology with maximal performance. leakage increases due to tunneling effects dynamic power Large Miniaturization Small log(Performance) Power leakage power Declining available dynamic power overwhelms speed improvements of scaling Large Miniaturization Small 33 General optimization results – performance vs power 1.E+10 1.E+09 Frequency (Hz) As always, the easiest way to increase performance is to increase the power. 22nm 16nm 11nm 1.E+08 1.E+07 0.01 0.1 1 10 100 1000 Total Chip Power(W) Conditions: PDSOI, 4 core processor chip, constraining total chip power Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing. 34 Gate Length and Chip Area vs Power 50 Lower power requires longer gate lengths, to reduce variability. 0.8 45 0.7 40 Chip Area (cm2) Gate Length (nm) 0.6 22nm 35 30 25 11nm 20 22nm 0.5 22nm 22nm 0.4 16nm 16nm 11nm 11nm 0.3 11nm 15 0.2 10 5 0 0.01 0.1 Toxeqv: 0.6 – 0.82 nm 0.1 1 10 Total Chip Power(W) 100 1000 0 0.01 Device density increases for each generation. Shrinking area increases power density, too. 0.1 1 10 100 1000 Total Chip Power(W) Conditions: PDSOI, 4 core processor chip, constraining total chip power Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing. 35 Voltage and Energy vs Power Optimal supply voltage can become quite low for low power constraints, leading to very low energy use per logic transition. 16 1.4 14 Energy per logic transition (fJ) 1.2 Supply Voltage (V) 1 0.8 11nm 22nm 0.6 0.4 0.2 0 0.01 12 10 22nm 8 16nm 22nm 11nm 11nm 16nm 6 22nm 4 11nm 2 0.1 1 10 Total Chip Power(W) 100 1000 0 0.01 0.1 1 10 100 1000 Total Chip Power(W) Conditions: PDSOI, 4 core processor chip, constraining total chip power Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing. 36 Energy vs performance trade-off Energy per logic transition (fJ) 10 10x 22nm 16nm 1 11nm 22nm 4x 11nm 0.1 1.E+07 1.E+08 1.E+09 1.E+10 Clock Frequency (Hz) Conditions: PDSOI, 4 core processor chip, constraining total chip power Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing. 37 Optimal On/Off Ratio PDSOI nFETs, currents measured at nominal process and bias conditions bi as co nd itio ns 0.1 Hi gh 1000:1 22 nm 16 nm 0.001 11 nm co nd itio ns Off-current (A/cm) 0.01 Lo w bi as 0.0001 10000:1 0.00001 0.1 1 10 100 On-current (A/cm) Conditions: PDSOI, 4 core processor chip, constraining total chip power Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing. 38 Power allocation – Logic versus communication Each point is normalized by its own total power: dynamic+static, logic+repeater When the repeater static power is included with the power for switching the wires (logic & repeaters), the fraction is roughly constant. 39 Communication vs Logic – alternate devices Broken gap CNT Staggered gap TFET Majority of power is attributed to communication even for exotic devices Opportunity for power reduction – low voltage signal transfer! 40 Novel devices for 11nm and beyond • III-V FinFETs – Higher mobility improves drive current • Tunnel FETs – Improved subthreshold slope enables low VDD and low energy operation. – To properly model this device, have to be able to calculate the tunneling barrier shapes and band-edge alignments. – A compact model for TFETs has been developed [P Solomon, 2011 DRC], but for more general insight we can alter the Boltzmann constant in the conventional FET model, to see the impact of steeper subthreshold slope. • Carbon Nanotube Transistors – Ballistic current flow in the channel enables very high switching speeds for these devices. – A compact model for CNTs was presented at IEDM 2009 [L. Wei]. 41 FET Model General purpose empirical temperature-dependent FET model using a Fermi-Dirac integral to provide an alpha power model above VT and exponential dependence below VT Can vary k and μ to roughly evaluate alternative devices. Wε ηkT ⎛ ηkT / e ⎜ I Dsat (VGS ) = eff I tox e ⎜⎝ FI EC LCH γ s ⎞ ⎛ μ ( E⊥ ) ⎞ ⎛ V − VT ⎟⎟ μ0 ⎜⎜ ⎟⎟ EC Fα ⎜⎜ GS ⎝ ηkT / e ⎠ ⎝ μ0 ⎠ ⎞ ⎟⎟ ⎠ 10W FET Lg=28nm 1mW FET Lg=45nm 42 III-V FinFETs III-V FinFETs are modeled by increasing the mobility in the conventional model. Increased mobility enables an improved energy/performance tradeoff by reducing the voltage needed for high performance designs. FinFET III-V 2x III-V 4x 10 Fin 2x 4x Drive current multiplier: 1.2 1x (=Si) 1 2x (~GaAs) 4x (~InGaAs) 0.1 1.E+13 1.E+14 1.E+15 Supply Voltage (V) Energy/transition (fJ) 1.4 1 0.8 0.6 0.4 0.2 0 0 2 4 6 8 10 12 14 Frequency (GHz) 1.E+16 Performance (transitions/sec) 43 16 The Band-to-Band Tunnel-FET Operating principle of a tunnel field-effect transistor: Bands are crossed in the ‘on’ state and uncrossed by the gate voltage in the ‘off’ state. In ‘off’ state energy tails are cut-off by valence band in Source and conduction band in Drain. Comment: in non-heterojunction case, it is difficult to achieve high enough fields for strong tunneling, so the on-current tends to be reduced. 44 TFET Heterostuctures Planar SiGe H-TFET gate III-V H-TFET ON-state poly source n+ Si p-Si Off-state Buried Oxide Optimum electrostatics with gate all around New material combinations Si Integration onto silicon possible Vg > 0 SiGe High on current III-V materials offer lower effective masses and more heterojunctions, to further improve tunneling. Nanowire Geometry Offers: drain p++ SiGe SiGe Heterostuctures offer low effective bandgap, which improves tunneling. Ambipolar device supressed Scaling to quantum capacitance limit Improved on-current Si Vg = 0 Ambipolar device supressed SiGe Si x [Koester, Riel, Koswatta] 45 Optimizing steep slope FETS Performance vs slope and drive current, at constant power density (high performance regime) Mobility multiplier: Performance (arb units) 3000 1x 0.6x 0.4x 0.25x 0.15x 0.04x 0.01x 0.003x 0.001x 0.0003x 2500 Highest drive current 2000 1500 1000 500 Lowest drive current 0 0 20 40 60 80 100 Sub-VT swing (mV/dec) Using the generic FET model. Node: 11nm, Power constraint: 25 W/cm2 46 Optimizing steep slope FETs Energy vs performance, Low power regime Energy / transition (fJ) 10 Current multiplier: 1 0.03x 60 mV/dec 1.0x 0.5x 0.25x 0.12x 0.06x 0.03x FinFET 10 mV/dec 60 mV/dec 0.1 1.E+08 1.0x 1.E+09 TFET 10 mV/dec 1.E+10 Frequency (Hz) Sub-VT slope is swept in each TFET curve. Generic FET model, 11nm node. Current drive must be high to maintain advantage over conventional devices. 14nm FinFET curve is shown for comparison (power is swept). 47 Impact of reduced current drive in TFETs 100 Energy per logic transition (fJ) Slope = 31 mV/decade 10 I=1/10 I0 I=1/3 I0 I=I0 1x 1 1/3x 1/10x 0.1 0.01 1.E+07 1.E+08 1.E+09 1.E+10 1.E+11 Frequency (Hz) 48 Optimized voltage for steep slope devices • • Adjusting only the sub-VT slope. Current drive remains at 1x. Variability and tolerances limit voltage scaling. Even extrapolated to zero slope, the supply voltage may remain fairly high. 7GHz 2.5GHz 7GHz +10mV 7GHz 2.5Ghz +10mV 7GHz +10mV 2.5Ghz +10mV 1.8 0.7 1.6 Energy / transition (fJ) 0.6 Supply Voltage (V) 2.5GHz 0.5 0.4 0.3 0.2 0.1 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 0 20 40 60 80 100 0 SubVT Swing (mV/dec) 20 40 60 80 100 SubVT Swing (mV/dec) These optimizations are for fixed performance, minimizing power, 11nm node. Optimized widths, VT’s, VDD, repeaters. The +10mV cases have an extra 10mV (one sigma) local variability in the VT. 49 Comparing devices – Energy vs Performance Energy per logic transition (fJ) 10 PDSOI 22nm 1 PDSOI 16nm PDSOI PDSOI 11nm FinFET Si FinFET GaAs FinFET FinFET InGaAs TET 1x TFET 1/3x 0.1 TFET 1/10x TFET 0.01 1.E+07 1.E+08 1.E+09 1.E+10 1.E+11 Frequency (Hz) 50 TFET compact model [ P. Solomon, et al., 2011 DRC ] Assumes a cylindrical nanowire heterojunction TFET with wrap-around gate. Positive source Fermi level Drain characteristics, with degenerate source -EFS VDS < 0 VDS > 0 51 Screening Length Model: e − x /λ s Yu et al. IEEE TED 2008 Model assumes εi / εs =1. For εi / ε s = 1: λ= π ( R + ti ) J 0 (0) −1 and λ s =λ / π = ( R + ti ) 2.405 52 WKB Integral Function WKB Integral: ⎡ 2 xc ⎤ T ( E ) = exp ⎢ − ∫ dx 2me (Vc1 − E ) ⎥ ⎣ h0 ⎦ E x ln T ( E ) = − 2 λ h s Convert integral over distance to integral over energy. Vc ( x ) = E 2 meV c1 1− E / Vc1 (dx / λ s ) ∫ 0 = − h2 λ s 2 meV c1 G( u , Lr ) where u=(V −Vc1 +ϕ+Ve ) /Vc1, Lr =L/λs , and scaled channel length Lr as a parameter, where G(u , L) = ∫ 1+u − ln[e− Lr + (1− e− Lr ) w2 − u ]dw =∫ 1+u − ln[e− Lr + (1− e− Lr ) w2 − u ]dw 0 u Scaled WKB Integral function G(u,Lr), with ( ) (u < 0) (u ≤ > 0) through-barrier tunneling occurs for negative, and under barrier tunneling for positive u. 53 Material parameters for InGaAs / GaSb GaSb InAs 1.6 0.22 CB Edge Shift (eV) 1.2 EC 0.18 m/m0 0.16 0.14 1.0 0.12 0.8 0.10 0.6 0.08 0.06 0.4 Effective mass (m0) 0.20 1.4 radius 0.04 0.2 0.02 0.0 0.00 1 10 Radius (nm) Steven Laux program 54 Optimizing InAs/GaSb TFETs TFET FinFET Energy per transition (fJ) 10 •16 cores of 1.5x106 circuits each are assumed. •Total chip power is constrained. •Vdd, L, Vt , ve, widths, and φ were varied to maximize the clock frequency. •Nanowire diameter determines the InAs/GaSb bandgaps and band alignments. 1 0.1 0.01 1.E+08 1.E+09 1.E+10 Frequency (Hz) Optimized design shows 2-3x lower energy for TFETs. 55 Optimized InAs/GaSb Device Parameters TFET FinFET •Optimized TFET slope and supply are lower than FinFET, but degrade for higher performance. •Fin pitch can be tighter than TFET nanowire pitch, indicating lower integration density for TFETs. Supply Voltage (V) 1 FinFET TFET TFET diam 0.1 1.E+08 1.E+09 Dimension (nm) Subthreshold Slope (mV/dec) Frequency (Hz) TFET FinFET FinFET Fin height TFET diameter 10 Fin width TFET 1 1.E+08 10 1.E+08 FinFET h 100 1.E+10 100 FinFET tsi 1.E+09 1.E+10 Frequency (Hz) 1.E+09 Frequency (Hz) 1.E+10 56 Carbon Nanotube FET (CNFET) Geometry assumption: GAA (gate all around) in dense planar array of carbon nanotubes. Dielectric around CNT Gate Source LSD Doped CNT Drain S Lgate Substrate 57 Ballistic current flow model [ L. Wei, et al., IEDM 2009 ] Source transmitted carriers μS EC μD ϕS Source EV Reflected carriers ϕch Drain transmitted carriers ϕD EC Drain EV 58 Final comparison All devices hit the ‘power wall’. But, TFETs can go to lower energy. Energy per logic transition (fJ) 10 PDSOI 1 CNFET FinFET 0.1 III-V TFET ideal TFET (m*=.02, d=4nm) 0.01 1.E+06 1.E+07 1.E+08 1.E+09 1.E+10 1.E+11 Frequency (Hz) [General trends, optimistic assumptions.] 59 4. Summary • • CMOS scaling is limited by electrostatic, quantum mechanical, discreteness, thermodynamic and practical effects. Optimization can and should be used to find the best design points in the midst of these various constraints. – Processor level optimization can be very different from individual device performance optimization. – Example: low power needs somewhat less scaled devices. • Technology performance projections based on optimization have been presented for PDSOI and steep slope devices. – All devices have frequency versus energy trade-offs. – Steep slope devices need high drive current to be competitive at high performance, but may offer lower energy solutions at lower speeds. 60 The End of Scaling is Optimization log(System Performance) Stop when you get to the top. Miniaturization 61 Acknowledgements • • • • • • • Wilfried Haensch Leland Chang Paul Solomon Steve Koester Lan Wei Philip Wong Ghavam Shahidi • • • • • • • Mike Scheuermann Phillip Restle Omer Dokumaci Mary Wisniewski Steve Kosonocky Yuan Taur Bob Dennard 62