Interconnects
Mohammad Sharifkhani
With special thanks to Jan Rabaey

Reading
• Textbook II, Chapter 9
– You may have to look at previous chapters too

Impact of Interconnect Parasitics
• Reduce robustness
• Affect performance
• Increase delay
• Increase power dissipation

Classes of Parasitics
• Capacitive
• Resistive
• Inductive

INTERCONNECT

Capacitive Cross Talk
[Figure: wire X (voltage VX) coupled through CXY to node Y, which has capacitance CY to ground]
• Worst-case scenarios:
– Floating lines, e.g. tri-state wires
– Low-swing precharged nodes
– Bitlines in DRAM
– Low-swing busses
– Dynamic logic
• Cross-talk influence:
– High swing in adjacency
• To solve:
– Level-restoring device
– Keeper

Capacitive Cross Talk: Dynamic Node
[Figure: dynamic gate with clock CLK, inputs In1 to In3 and pull-down network (PDN); output node Y with CY = 6 fF is coupled through CXY = 0.5 fF to wire X, which swings between 0 V and 2.5 V]
• Assume the PDN is evaluated to be off.
• Combined with charge redistribution and CFT, this can cause an error.
• 3 x 1 µm overlap: 0.19 V disturbance

Capacitive Cross Talk: Driven Node
[Figure: node Y driven through RY and coupled to wire X via CXY; plot of the disturbance V (Volt, 0 to 0.5) versus t (nsec, 0 to 1) for rise times from tr = 5 ps to tr = 500 ps; the disturbance shrinks as tr increases]
• tXY = RY(CXY + CY)
• Rx = 10 kΩ, Cy = 20 fF, Cxy = 5 fF, ΔV = 2.5 V
• Key: keep the time constant smaller than the rise time

Dealing with Capacitive Cross Talk
• Avoid floating nodes
• Protect sensitive nodes
– No long sensitive node running in parallel with a high-swing wire
• Make rise and fall times as large as possible
– Watch short-circuit power
• Differential signaling
• Do not run wires together for a long distance
– At least not on the same layer
– At least far apart (busses)
• Use shielding wires
• Use shielding layers

Shielding
[Figure: shielding wire (GND) next to the signal wire; VDD and GND; substrate (GND)]
What is bad about it? C goes up!
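The 0.19 V disturbance quoted for the dynamic node is what a capacitive divider between CXY and CY gives, and the driven-node slide's tXY = RY(CXY + CY) can be evaluated the same way. A minimal Python sketch, assuming an ideal floating victim node and treating the 10 kΩ value on the driven-node slide as the victim's driver resistance (the function names are illustrative and do not come from the slides):

```python
def floating_node_disturbance(c_xy, c_y, delta_vx):
    """Voltage step coupled onto a floating victim node Y.

    Capacitive divider: dVy = Cxy / (Cxy + Cy) * dVx.
    Assumes Y is not driven at all (the worst case on the slides).
    """
    return c_xy / (c_xy + c_y) * delta_vx


def driven_node_time_constant(r_y, c_xy, c_y):
    """Recovery time constant of a driven victim: tau = Ry * (Cxy + Cy)."""
    return r_y * (c_xy + c_y)


# Dynamic-node slide: Cxy = 0.5 fF, Cy = 6 fF, 2.5 V swing on X.
dv = floating_node_disturbance(0.5e-15, 6e-15, 2.5)
print(f"floating-node disturbance: {dv:.2f} V")

# Driven-node slide: 10 kOhm (assumed to be the victim driver), Cxy = 5 fF, Cy = 20 fF.
tau = driven_node_time_constant(10e3, 5e-15, 20e-15)
print(f"driven-node time constant: {tau * 1e12:.0f} ps")
```

With the slide values this gives about 0.19 V and 250 ps. The time constant lies between the 5 ps and 500 ps rise times plotted on the driven-node slide, which is why the disturbance is large for the fast edge and small for the slow one.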
[Figure, continued: shielding layer]

Cross Talk and Performance
[Figure: victim line coupled to its neighbors through Cc]
• When neighboring lines switch in the opposite direction of the victim line, delay increases: delay depends on the activity in neighboring wires
• Miller effect
– Both terminals of the capacitor are switched in opposite directions (0 → Vdd, Vdd → 0)
– Effective voltage is doubled and additional charge is needed (from Q = CV)
• The wire delay with noise can be 80% larger than without it!
• The transition time of every single wire becomes more complicated to predict
• Verification cost, design overkill

Impact of Cross Talk on Delay
Tp = g CW (0.38 RW + 0.69 RD)
• CW = cw · L
• RW = rw · L
• RD: driver resistance
• r: ratio between the capacitance to the neighbor and the capacitance to GND, r = cinter_wire / cw
• g = 1 if all wires go the same way; in general g = 1 + x · r, where x depends on how the neighbors switch

Solutions to X-Talk
• Evaluate and improve:
– Post-sim. extraction: most often used, slow, many iterations
• Constructive layout generation
– CAD tools: can we really rely on them?
• Predictive structures
– FPGA interconnect
– Next slide

Structured Predictable Interconnect
[Figure: wire pattern V S G S V S S G S V S V]
Example: Dense Wire Fabric ([Sunil Khatri])
Trade-off:
• Cross-coupling capacitance 40x lower, 2% delay variation
• Increase in area and overall capacitance
Also: FPGAs, VPGAs

Interconnect Projections: Low-k Dielectrics
• Both delay and power are reduced by dropping interconnect capacitance
• Types of low-k materials include: inorganic (SiO2), organic (polyimides) and aerogels (ultra low-k)
• The numbers below are on the conservative side of the NTRS roadmap

Generation    Dielectric constant (ε)
0.25 µm       3.3
0.18 µm       2.7
0.13 µm       2.3
0.1 µm        2.0
0.07 µm       1.8
0.05 µm       1.5

Encoding Data Avoids Worst-Case Conditions
[Figure: In → Encoder → Bus → Decoder → Out]
Minimizing (opposite) transitions reduces delay by a factor of 2 (despite how it looks)

Driving Large Capacitances
[Figure: inverter between VDD and ground, input Vin, output Vout, load CL]
• Transistor sizing
• Cascaded buffers

Using Cascaded Buffers
[Figure: In → stage 1 → stage 2 → ... → stage N → Out, driving CL = 20 pF]
• Equal delay in every stage
• Fan-out of 4
• 0.25 µm process: Cin = 2.5 fF, tp0 = 30 ps
• F = CL/Cin = 8000
• fopt = 3.6; f^N = 8000 gives N = 7
• tp = 0.76 ns (see Chapter 5 of Rabaey's book)
• Off-chip load ~50 pF: fixed with technology, so it effectively increases as technology scales

Example
• Consider the case where an on-chip minimum-size inverter has to drive an off-chip capacitor CL of 20 pF.
• Ci equals approximately 2.5 fF for a standard gate in a 0.25 µm CMOS process. This corresponds to a tp0 of approximately 30 psec.
• The overall effective fanout F (the ratio between CL and Ci) equals 8000.
• γ is 1 if self-loading is equal to fanout
• f = 3.6, N = 7
• Too large T; we don't need maximum speed at the I/O

Output Driver Design
Trade off performance for area and energy: given tp,max, find N and f
• Area: $A_{driver} = (1 + f + f^2 + \dots + f^{N-1}) A_{min} = \frac{f^N - 1}{f - 1} A_{min} = \frac{F - 1}{f - 1} A_{min}$
• Energy: $E_{driver} = (1 + f + f^2 + \dots + f^{N-1}) C_i V_{DD}^2 = \frac{F - 1}{f - 1} C_i V_{DD}^2 \approx \frac{C_L}{f - 1} V_{DD}^2$
Let's find tp for a given F and N (i.e., for a given f)

Delay as a Function of F and N
[Figure: tp/tp0 (1 to 10,000, log scale) versus number of buffer stages N (1 to 11) for F = 100, F = 1000 and F = 10,000]
$f^N = F$, $f = \sqrt[N]{F}$

Output Driver Design
0.25 µm process, CL = 20 pF
• Transistor sizes for optimally-sized cascaded buffer: tp = 0.76 ns
• Transistor sizes of redesigned cascaded buffer: tp = 1.8 ns
Tremendous reduction in area (and power) is achievable through this optimization

How to Design Large Transistors
[Figure: layout with D(rain), S(ource), G(ate) and multiple contacts]
• Small transistors in parallel
• Reduces diffusion capacitance
• Reduces gate resistance

Bonding Pad Design
[Figure: bonding pad (100 µm) with GND, VDD, In and Out; guard rings; Example I and Example II]

INTERCONNECT

Impact of Resistance
• We have already learned how to drive RC interconnect
• Impact of resistance is commonly seen in power supply distribution:
– IR drop
– Voltage variations
• Power supply is distributed to minimize the IR drop and the change in current due to switching of gates

RI Introduced Noise
[Figure: IR drop. Current I through the rail resistances R and R′ lowers the local supply to VDD − ΔV′ and raises the local ground to ΔV instead of 0; precharge clock φpre, node X, transistor M1]
• ΔV instead of 0 on the local ground can turn on M1; if X is tri-state, X discharges to GND
• High current: bad
• Supply noise, performance drop

Power Dissipation Trends
[Figures: "Power Dissipation", Power (W) and Voltage (V) for EV4 to EV8; "Supply Current", Current (A) and Voltage (V) for EV4 to EV8. Source: ASP DAC 2000]
• Power consumption is increasing
– Better cooling technology needed
• Supply current is increasing faster!
– On-chip signal integrity will be a major issue
– Power and current distribution are critical
• Opportunities to slow power growth
– Accelerate Vdd scaling
– Low-κ dielectrics & thinner (Cu) interconnect
– SOI circuit innovations
– Clock system design
– Micro-architecture

Power Distribution
• Low-level distribution is in Metal 1
• Power has to be 'strapped' in higher layers of metal
• The spacing is set by IR drop, electromigration, inductive effects
• Always use multiple contacts on straps

Power and Ground Distribution
[Figure: (a) finger-shaped network; (b) network with multiple supply pins]

3 Metal Layer Approach (EV4)
• 3rd "coarse and thick" metal layer added to the technology for EV4 design
• Power supplied from two sides of the die via 3rd metal layer
• 2nd metal layer used to form power grid
• 90% of 3rd metal layer used for power/clock routing
[Figure: Metal 3, Metal 2, Metal 1. Courtesy Compaq]

4 Metal Layers Approach (EV5)
• 4th "coarse and thick" metal layer added to the technology for EV5 design
• Power supplied from four sides of the die
• Grid strapping done all in coarse metal
• 90% of 3rd and 4th metals used for power/clock routing
[Figure: Metal 4, Metal 3, Metal 2, Metal 1. Courtesy Compaq]

6 Metal Layer Approach (EV6)
• 2 reference plane metal layers added to the technology for EV6 design
• Solid planes dedicated to Vdd/Vss
• Significantly lowers resistance of grid
• Lowers on-chip inductance
• Low resistance, shielding + low inductance
[Figure: RP2/Vdd, Metal 4, Metal 3, RP1/Vss, Metal 2, Metal 1. Courtesy Compaq]

IR Drop
• IR drop is a dynamic problem
• Usually at peak right after the clock edge
• Or when driving large loads (busses, pads)
– Separate VDD/VSS for pads
• Worst-case analysis is overkill
• Changing one place affects other places
– CAD tools are needed that treat the whole chip as one entity (Cadence power rail-mill)

Resistance and the Power Distribution Problem
[Figure: IR-drop maps before and after; adding an extra rail solves the problem. Source: Cadence]
• Requires fast and accurate peak current prediction
• Heavily influenced by packaging technology

Resistivity and Performance
[Figure: the distributed rc-line (R1 ... RN, C1 ... CN) driven by Vin; plot of voltage (V, 0 to 2.5) versus time (nsec, 0 to 5) at x = L/10, x = L/4, x = L/2 and x = L]
• Diffused signal propagation
• Delay ~ L²

The Global Wire Problem
$T_d = 0.377 R_w C_w + 0.693 (R_d C_{out} + R_d C_w + R_w C_{out})$
Challenges
• Gate delay drops, interconnect delay goes up
• No further improvements to be expected after the introduction of copper (superconducting, optical?)
• Design solutions
– Use of fat wires
– Insert repeaters, but this might become prohibitive (power, area)
– Efficient chip floorplanning
• Towards "communication-based" design
– How to deal with latency?
– Is synchronicity an absolute necessity?

Interconnect Projections: Copper
• Copper is planned in full sub-0.25 µm process flows and large-scale designs (IBM, Motorola, IEDM97)
• With cladding and other effects, Cu ~ 2.2 µΩ-cm vs. 3.5 for Al(Cu): 40% reduction in resistance
• Electromigration improvement; 100X longer lifetime (IBM, IEDM97)
– Electromigration is a limiting factor beyond 0.18 µm if Al is used (HP, IEDM95)
[Figure: vias]

Interconnect: # of Wiring Layers
# of metal layers is steadily increasing due to:
• Increasing die size and device count: we need more wires and longer wires to connect everything
• Rising need for a hierarchical wiring network; local wires with high density and global wires with low RC
[Figure: 0.25 µm wiring stack (substrate, poly, M1 to M6; ρ = 2.2 µΩ-cm; dimensions W, S, H, Tins); bar charts of minimum widths (relative) and minimum spacing (relative) for poly and M1 to M5]

Resistivity
• Poly is much more resistive than metal (~10 ohm/sq)
• Sometimes it is hard to avoid polysilicon
– Address line in a memory, to keep the density

Using Bypasses
[Figure: driver, polysilicon word line (WL), metal word line, metal bypass every K cells]
• Driving a word line from both sides: factor 4
• Using a metal bypass: (k/2)^2
• For 1024 cells, a bypass every 16 cells: 4000 times faster

Diagonal Wiring
[Figure: routing from source to destination, Manhattan (x, y) versus diagonal]
• 20+% interconnect length reduction
• Clock speed, signal integrity, power integrity
• 15+% smaller chips plus 30+% via reduction
At the expense of complexity, modeling, mask making etc.
Courtesy Cadence X-initiative

Reducing RC-delay
[Figure: repeaters along a wire; per-length resistance and capacitance, repeater, length] (page 283 in chapter 6 of Rabaey)

Repeater Insertion (Revisited)
Taking the repeater loading into account:
For a given technology and a given interconnect layer, there exists an optimal length of the wire segments between repeaters. The delay of these wire segments is independent of the routing layer!

Pipelining
Fundamental problem in large designs: long wires often exhibit a delay that is longer than the clock period of the design.
For instance, the 10 cm long Al1 wire comes with a minimum delay of 3.9 nsec, even after optimal buffer insertion and sizing, while the 0.25 µm CMOS process featured in this text can sustain clock speeds in excess of 1 GHz (a 1 nsec period). Architectural innovation is needed.

INTERCONNECT

Inductance
• Important source of parasitics:
– Bond wires, chip packages
• High current and long inductive wires
• Change on the on-chip supply
– Especially output pads
• Reduced logic levels, noise margin

L di/dt
Impact of inductance on supply voltages:
• Change in current induces a change in voltage
• Longer supply lines have larger L
[Figure: inverter driving CL = 10 pF; the supply and ground connections each go through L = 2.5 nH carrying iL(t), giving internal rails V'DD and GND'; Vin is a 2.5 V step with 1 nsec rise time]
• At first, the PMOS delivers the whole current within a few tens of psec
• Huge voltage drop over L:
$V_L = L \frac{di_L}{dt} = 2.5\,\mathrm{nH} \times \frac{20\,\mathrm{mA}}{50\,\mathrm{psec}} = 1\,\mathrm{V}$

L di/dt: Simulation
[Figure: Vout (V), iL (A) and VL (V) versus time (0 to 2 nsec), without inductors, with inductors, and decoupled; VL peaks at 0.95 V for an input rise/fall time of 50 psec and at 100 mV for an input rise/fall time of 800 psec]

Dealing with Ldi/dt
• Separate power pins for I/O pads and chip core.
• Multiple power and ground pins.
– Simultaneous switching of the 16 output drivers of an output bus would cause a voltage drop of at least 1.1 V if the supply connections of the buffers were connected to the same pin on the package
– 5-10 drivers/pin
• Careful selection of the positions of the power and ground pins on the package.
– Corners are bad (next slide)
• Increase the rise and fall times of the off-chip signals to the maximum extent allowable.
– When noise is considered, the best driver is the one that achieves a specified delay with the maximum allowable rise and fall times at the output.
– Short-circuit current might be an issue!
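The 1 V figure on the L di/dt slide is a direct evaluation of V = L·di/dt. A minimal Python sketch of that arithmetic, assuming a linear current ramp; the function name and the sweep over slower edges are illustrative, and the sweep assumes the same 20 mA current step at every edge rate, which the simulation on the slides does not guarantee:

```python
def inductive_drop(l_henry, delta_i, delta_t):
    """Voltage across a supply inductance, V = L * di/dt.

    Assumes the current ramps linearly by delta_i over delta_t.
    """
    return l_henry * delta_i / delta_t


# Slide example: 2.5 nH bond wire, 20 mA delivered within 50 psec.
v_fast = inductive_drop(2.5e-9, 20e-3, 50e-12)
print(f"50 ps edge: {v_fast:.2f} V")

# Illustrative sweep: the same (assumed) 20 mA step spread over slower edges.
for edge in (100e-12, 200e-12, 800e-12):
    v = inductive_drop(2.5e-9, 20e-3, edge)
    print(f"{edge * 1e12:.0f} ps edge: {v * 1e3:.1f} mV")
```

The drop scales inversely with the transition time, which is the reasoning behind the advice to make off-chip rise and fall times as long as the delay budget allows.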
Choosing the Right Pin
[Figure: chip in the mounting cavity; bonding wire (L), lead frame (L´), pin]

Dealing with Ldi/dt
• Schedule current-consuming transitions.
• Use advanced packaging technologies.
– The bonding inductance of a chip mounted in flip-chip style on a substrate using solder-bump techniques is reduced to 0.1 nH, which is 50 to 100 times smaller
• Add decoupling capacitances on the board, for every supply pin.
– Separate the bonding-wire inductance from the inductance of the board interconnect
– Creates a low-pass network
– 200 pF was added between the supply connections of the buffer circuit examined in the example three slides back

Decoupling Capacitors
[Figure: SUPPLY, board wiring, decoupling capacitor Cd, bonding wire, CHIP]
Decoupling capacitors are added:
• on the board (right under the supply pins)
• on the chip (under the supply straps, near large buffers)

Dealing with Ldi/dt
• Add decoupling capacitances on the chip.
– To limit the voltage ripple to 0.25 V, a capacitance of around 12.5 nF must be provided for every 50 Kgate module in a 0.25 µm CMOS process
– How to build them?

Example
• Compaq's Alpha processors

De-coupling Capacitor Ratios
• EV4
– Total effective switching capacitance = 12.5 nF
– 128 nF of de-coupling capacitance
– De-coupling/switching capacitance ~ 10x
• EV5
– 13.9 nF of switching capacitance
– 160 nF of de-coupling capacitance
• EV6
– 34 nF of effective switching capacitance
– 320 nF of de-coupling capacitance: not enough!
Source: B. Herrick (Compaq)

EV6 De-coupling Capacitance
Design for Idd = 25 A @ Vdd = 2.2 V, f = 600 MHz
– 0.32 µF of on-chip de-coupling capacitance was added
• Under major busses and around major gridded clock drivers
• Occupies 15-20% of die area
– 1 µF, 2 cm² Wirebond Attached Chip Capacitor (WACC) significantly increases "near-chip" decoupling
• 160 Vdd/Vss bondwire pairs on the WACC minimize inductance
Source: B. Herrick (Compaq)

EV6 WACC
[Figure: 587 IPGA package with 389 signal and 198 VDD/VSS pins; 389 signal bondwires; 395 VDD/VSS bondwires; 320 VDD/VSS bondwires; WACC; microprocessor; heat slug. Source: B. Herrick (Compaq)]

Resonance
• Ringing may create a dangerous VDD variation
• Resonance frequency: $F_{resonance} = \frac{1}{2\pi\sqrt{L\,C_{decoupling}}}$
– Used to be too high vs. clock
– Cd increased, Fclock increased: Fresonance is about or even less than Fclock
– Controlled series resistance with the Cd: IR drop

Question
• What if the rise/fall time is close to the time that it takes for the signal to travel from one side of a wire to the other side?
– E.g. long enough wires
– Fast switches

The Transmission Line
[Figure: Vin to Vout through a ladder of segments along x, each with series r and l and shunt g and c]
The Wave Equation

Design Rules of Thumb
• Transmission line effects should be considered when the rise or fall time of the input signal (tr, tf) is smaller than the time-of-flight of the transmission line (tflight): tr (tf) << 2.5 tflight
• Transmission line effects should only be considered when the total resistance of the wire is limited: R < 5 Z0
• The transmission line is considered lossless when the total resistance is substantially smaller than the characteristic impedance: R < Z0/2

Should we be worried?
• Transmission line effects cause overshooting and nonmonotonic behavior
[Figure: clock signals in a 400 MHz IBM microprocessor (measured using e-beam prober) [Restle98]]

Solution: Matching
• No harsh overshoots
• Practically, predictable behavior
– Faster operation
– Design rule: to avoid the negative effects of transmission-line behaviour such as ringing or slow propagation delays, the line should be terminated, either at the source (series termination) or at the destination (parallel termination), with a resistance matched to its characteristic impedance Z0.
– Good for many CMOS designs, where the destination load is purely capacitive

Matched Termination
[Figure: series source termination (source impedance ZS = Z0 driving a line Z0 into ZL); parallel destination termination (line Z0 terminated at the destination in Z0)]

Matching
• To drive a 50-ohm line requires a 53-µm-long NFET and a 135-µm-long PFET (in a 0.25 µm CMOS) to give a nominal output impedance of 50 ohm.
• Process variation of 100%!

Segmented Matched Line Driver
[Figure: In drives binary-sized gate segments (s0, s1/c1, s2/c2, ..., sn/cn) between VDD and GND; the output drives a line Z0 into ZL]

Parallel Termination: Transistors as Resistors
• Sometimes a 50 ohm grounded resistor next to the input pin or on the chip
– DC current
– On-chip resistance is not accurate
• Controlled MOS; PMOS looks more linear
[Figure: termination options between Out and Vdd: NMOS only (Mr), PMOS only (Mr), NMOS-PMOS (Mrn, Mrp), PMOS with -1 V bias (Vbb); annotations "not practical" and "sharp rise"; plot of normalized resistance (1 to 1.7) versus VR (Volt, 0 to 2.5)]

Output Driver with Varying Terminations
• Initial design: rise and fall times of 0.33 nsec, load capacitance of 10 pF; 60 mA; 120 µm and 275 µm for NMOS and PMOS
• Redesign: sizes of the driver transistors reduced so that their impedances match the characteristic impedance of the transmission line (to 65 µm and 155 µm for NMOS and PMOS, respectively). A decoupling capacitance of 200 pF was added to the supplies of the drivers.
[Figure: driver (NMOS 120, PMOS 275) with L = 2.5 nH in the supply and ground connections, line Z0 = 50 Ω, load CL = 5 pF, clamping diodes; waveforms Vin, Vs and Vd (0 to 4 V) versus time (0 to 8) for the initial design and for the revised design with matched driver impedance]

Reducing the swing
tpHL = CL Vswing / (2 Iav)
• Reducing the swing potentially yields linear reduction in delay
• Also results in reduction in power dissipation
• Delay penalty is paid by the receiver
• Requires use of "sense amplifier" to restore signal level
• Frequently designed differentially (e.g. LVDS)
• Static vs. dynamic (precharged), single-ended vs. differential

Single-Ended Static Driver and Receiver
[Figure: driver inverter on the reduced supply VDDL, wire with CL, receiver on VDD with In and Out]
• Just an inverter doesn't work at the receiver
– Small pull-down current: very slow
– VDDL is too high to turn off the PMOS
• Static power
• Diff. Cascode Voltage Switch Logic
– The receiver: diff2se
– No static power
– Positive feedback acceleration
– Two-supply driver
– Still too slow

Reduced swing
• Transmitter: inverse inverter, PMOS at the bottom
– Limited swing: Vtp -> VDD-Vtn
– Initially: In2 @ Vtp, A @ Vtp, B @ GND
– In2 -> VDD-Vtn: N3, P3 conduct; A, B pull up to VDD-Vtn
– N2 on: Out drops; A pulls up to VDD; P2 off
• P1, N1: weak level restorers

Reduced swing Diff.
• Very good noise immunity
– X-talk, supply
• Signals as small as 200 mV
• Bulky, extra clock, two wires

Dynamic Reduced Swing Network
[Figure: precharged bus (M2 clocked by φ; pull-downs In1.φ with M1 and In2.φ; Cbus) feeding a receiver inverter M3/M4 with Cout; waveforms V (Volt, 0 to 2.5) versus time (ns, 0 to 12) for φ, Vbus, Vsym and Vasym]
• M2: large
• M4, M3 skewed: faster transition
• Slow discharge
• 2.5 nsec advantage
• 18% lower energy
• Lower noise margin
• Sensitive to parasitics
• Sensitive to noise (x-talk, supply, …)
– Level-restoring devices are typically used

Dynamic reduced swing
• Controlling the pulse-width-limited discharge time of CL
• REF: VDD/2
• SA: static power; EN2 is needed
• Floating CL: noise immunity, PW
• Trade-offs: power, speed vs. noise immunity

Current mode transmission
• In voltage mode, signal swing is bounded by noise considerations on supply and threshold
• The current-mode approach holds a definite edge in terms of (dynamic) power dissipation, because of operation under lower NM (swing ~100 mV is good)
• A differential amplifier is used to detect the voltage changes over RT. Observe that the signal and its return path are both isolated from the supply rails and the associated noise, making all supply noise common-mode to the differential receiver.
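The reduced-swing signaling schemes above all build on the relation tpHL = CL·Vswing/(2·Iav) from the "Reducing the swing" slide. A minimal Python sketch of how the delay scales with the swing, assuming the average drive current stays constant as the swing shrinks; the 1 pF load and 1 mA current are made-up illustration values, not numbers from the slides:

```python
def t_phl(c_load, v_swing, i_avg):
    """High-to-low propagation delay, tpHL = CL * Vswing / (2 * Iav)."""
    return c_load * v_swing / (2.0 * i_avg)


C_LOAD = 1e-12  # hypothetical 1 pF wire load (assumption)
I_AVG = 1e-3    # hypothetical 1 mA average discharge current (assumption)

# Full 2.5 V swing, half swing, and a 200 mV differential-style swing.
for v_swing in (2.5, 1.25, 0.2):
    delay = t_phl(C_LOAD, v_swing, I_AVG)
    print(f"Vswing = {v_swing:4.2f} V -> tpHL = {delay * 1e9:.3f} ns")
```

Under the constant-current assumption the delay is linear in the swing. In a real driver Iav also changes with the swing and the receiver adds its own delay, which is why the slides describe the reduction only as potential.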
The “Network-on-a-Chip”
• Communication problem
• Right protocols between blocks
– QoS is needed
• Let errors happen; we fix them later in a separate block
– Better energy, performance, etc.
• Shared channels; routers serve ‘packets of data on a chip’
• Dynamic routing of data to the right block

The “Network-on-a-Chip”
[Figure: embedded processors, memory sub-system, accelerators, configurable accelerators and peripherals connected through an interconnect backplane]