Interconnect Centric VLSI Design Automation
Chung-Kuan Cheng
CSE Department, UC San Diego
La Jolla, CA 92093-0404

Interconnect Dominance
[Figure: delay (ps, 0 to 200) versus process technology node (180, 150, 130, 100, 90, 80, 70, 65, 57, 50 nm). Curves: 1 mm global interconnect with scattering (source: ITRS Roadmap 2004); FO4 inverter delay (estimated by 0.36*Ldraw); 1 mm distortionless transmission line (speed of light).]
Goals: Speed, Power, Cost
Constraints: Area, Current, Skew
Challenges: PVT Variations, Signal Integrity

Outlines
• Interconnect Technologies
  • Buffers, Pitches, Circuit Styles
• Geometric Planning
  • Wire Orientations, Chip Shapes
• Interconnect Networks
  • Topologies, Wire Styles
• Power and Ground Distributions
• Clock Networks
• Functional Modules
  • Adders, Shifters
• Packaging
• Conclusion

Interconnect Technologies
• RC Wires
  • Wire Pitch, Width, Separation
  • Buffer Size, Buffer Interval
• Transmission Line
  • RLCG

Interconnect Technologies
Source: 2005 SRC Roadmap

Year (on-chip)                             2005             2010              2015
r_n c_n (ps) (T40)                         0.870            0.400             0.180
r_w c_w (ps/mm) (T80), metal 1 / global    440 / 110        1792 / 523        5951 / 1601
interval (mm), metal 1 / global            0.136 / 0.272    0.0457 / 0.0846   0.0168 / 0.0324
delay (ps/mm), metal 1 / global            120 / 60         164 / 89          200 / 104

Interval: $l = \sqrt{\dfrac{2(1+f)\, r_n c_g}{r_w c_w}}$
Delay: $\mathrm{Delay}(l_{tr}) / l_{tr} = \left(2 + \sqrt{2(1+f)}\right) \sqrt{r_n r_w c_g c_w}$

Interconnect Tech.: Transmission Line
• Speed-of-the-light on-chip communication
  • < 1/5 Delay of Traditional Wires
• Low Power Consumption
  • < 1/5 Power Consumption
• Robust against process variations
• Short Latency
• Insensitive to Feature Size

Interconnect Tech.: Transmission Line
Differential Transmission Line
[Figure: ladder model of a differential line with current i(z,t), built from series RΔl and LΔl elements and shunt CΔl elements.]
Serial resistance causes voltage loss. Speed and attenuation are frequency dependent.
Surfliner
[Figure: the same ladder model with added shunt GΔl elements.]
Shunt conductance compensates voltage loss: R/G = L/C. Flat from DC mode to GHz.
Telegraph Cable: O. Heaviside in 1887.
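The Heaviside condition stated above, R/G = L/C, can be checked numerically. The sketch below evaluates the standard propagation constant $\gamma = \sqrt{(R + j\omega L)(G + j\omega C)} = \alpha + j\beta$ at a few frequencies, once with no shunt conductance and once with G = RC/L. The per-metre values of R, L and C and the helper names `propagation_constant` and `sweep` are assumptions chosen for illustration only; they do not come from the slides.

```python
import cmath
import math

# Hypothetical per-metre line parameters (illustrative assumptions, not from the slides).
R = 1.0e4    # series resistance, ohm/m
L = 4.0e-7   # series inductance, H/m
C = 1.0e-10  # shunt capacitance, F/m
G_MATCHED = R * C / L  # shunt conductance satisfying R/G = L/C


def propagation_constant(R, L, G, C, freq_hz):
    """Return (alpha, beta) with gamma = sqrt((R + jwL)(G + jwC)) = alpha + j*beta."""
    w = 2.0 * math.pi * freq_hz
    gamma = cmath.sqrt(complex(R, w * L) * complex(G, w * C))
    return gamma.real, gamma.imag


def sweep(G, freqs=(1e6, 1e8, 1e10)):
    """List (frequency, attenuation in Np/m, phase velocity in m/s) for each frequency."""
    rows = []
    for f in freqs:
        alpha, beta = propagation_constant(R, L, G, C, f)
        rows.append((f, alpha, 2.0 * math.pi * f / beta))
    return rows


if __name__ == "__main__":
    for label, G in (("no shunt (G = 0)", 0.0), ("matched (G = RC/L)", G_MATCHED)):
        print(label)
        for f, alpha, v in sweep(G):
            print(f"  f = {f:8.1e} Hz  alpha = {alpha:10.3f} Np/m  v = {v:.3e} m/s")
```

With G = 0 both the attenuation and the phase velocity change with frequency; with the matched G they stay at $R\sqrt{C/L}$ and $1/\sqrt{LC}$ across the sweep, which is the "flat" behaviour the slide attributes to the shunt conductance.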
Theory (Telegrapher's Equation)
• Telegrapher's equation:
  $\dfrac{\partial V(z,t)}{\partial z} = -R\, I(z,t) - L\, \dfrac{\partial I(z,t)}{\partial t}$
  $\dfrac{\partial I(z,t)}{\partial z} = -G\, V(z,t) - C\, \dfrac{\partial V(z,t)}{\partial t}$
• Propagation constant: $\gamma = \sqrt{(R + j\omega L)(G + j\omega C)} = \alpha + j\beta$
• Wave propagation: $V(z) = V_0 e^{-\gamma z} = V_0 e^{-\alpha z - j\beta z}$
• $\alpha$ and $\beta$ correspond to attenuation and phase velocity. Both are frequency dependent.

Theory (Distortionless Line)
• Set $G = RC/L$
• Frequency-independent speed and attenuation: $\alpha = R / \sqrt{L/C}$, $\beta = \omega \sqrt{LC}$
• Characteristic impedance (purely resistive): $Z_0 = \sqrt{L/C}$
• Phase velocity (speed of light in the medium): $v = 1/\sqrt{LC} = c$
• Attenuation: $A(z) = e^{-R z / Z_0}$

Digital Signal Response
[Figure]

Interconnect Tech.: Transmission Line
• Add shunt conductance between differential wires
• Resistors realized by serpentine unsilicided poly, diffusion resistors, or high resistive metal

Geometrical Planning
• Wire Orientations
  • Manhattan, Hexagonal, Octagonal, Euclidean
• Die Shapes
  • Rectangle, Diamond, Hexagon, Octagon, Circle

Average Radius of Unit-Circle Area

Shape \ λ-geometry   Man.    Y-Arch   X-Arch   Euclid.
Square               1.329   1.122    1.070    1.017
Diamond              1.253   1.121    1.070    1.017
Hexagon              1.276   1.100    1.058    1.003
Octagon              1.272   1.104    1.054    1.001
Circle               1.273   1.103    1.055    1.000

Throughput: concurrent flow demand
λ-geometries: Man., Y-Arch, X-Arch*
Shapes: M: Square, M: Diamond, Y: Hexagon, X: Octagon*
Values: 1.000, 1.225, 1.346, 1.195, 1.315, 1.420
*ratio of 0-90 planes and 45-135 planes is not fixed

Flow congestion map for uniform 90 Degree meshes
[Figure]

Congestion map of square chip using X-architecture
[Figure: 12 by 12 and 13 by 13 meshes]

Y-architecture + Square Chip
[Figure: 12 by 12 and 13 by 13 meshes]

Y Architecture + Hexagonal Chip
[Figure]

X-Architecture + Octagonal Chip
[Figure]

Manhattan Architecture + Diamond Chip
[Figure]

Routing Grids
[Figure: X-Architecture and Y-Architecture routing grids (http://www.xinitiative.org/img/062102forum.pdf)]

Interconnect Networks
• Optimized Interconnect Architecture
  • Data Bus, Control Signals
• Shared Interconnect
  • Packet Switching
  • Circuit Switching
• RTL Level Partition

Interconnect Networks
Physical Implementation
• On-chip transmission line for long distance communication
• Well spaced RC wire with buffer insertion for local connections

Interconnect Networks
• Obj: Power, Latency
• Constraints:
  • Routing Area, Bandwidth
• Design Space:
  • Topology
  • Wire Styles, Switches
• Model:
  • Traffic Demand
  • Data Bus, Control Signals

Interconnect Networks: Design Flow
[Figure: flow diagram. A Power & Delay Lib and a Topology Lib feed a multi-commodity network flow (MCF) formulation; power evaluation (MCF solver) yields a latency-aware low-power NoC topology with wire style optimization.]

Power and Latency Tradeoffs for Optimal 8x8 Topologies
[Figure: power consumption (W, 48 to 73) versus average latency (ns, 2.3 to 3.3), one curve per routing area from area=3000um to area=11000um in steps of 1000um.]

Topology Selection (Latency, Power, BW)
[Figure: power consumption (W, 48 to 98) versus average latency (ns, 2.3 to 4.3) for topo=optimal, topo=mesh, topo=torus, topo=hypercube.]

Topology Selection (Latency, Power, BW)
[Figure: optimal 8-node topologies vs area resources. (a) Optimal topology when area = 3000um. (b) Optimal topology when area = 7000um. (c) Optimal topology when area = 11000um.]

Power Ground Networks
• Power network on chip and package
• Decoupling capacitor
• Conducting transistors connected to the P/G grids

Power Ground Analysis
• Constraints: Voltage Drops and Current Density
• IR Drop: Static Analysis
  • Resistive Networks
  • Static Current Sources
• dI/dt: Dynamic Analysis
  • RLC Networks
  • Power on and off
  • Sleep mode on and off
  • Gated clocks
  • Various operation modes

Power/Ground Networks: Natural Frequency
• RLC Network Characteristics: Natural frequencies (Quality Factor)
• Operation Modes: Excite the resonance
• Decoupling Capacitance: Shift the natural frequencies.

[Figure; credit: H. Chen, IBM]

Clock Distributions
[Figure]

Clock: Linear Variations Model
• Process variation model
  • Transistor length
  • Wire width
• Linear variation model: $d = d_0 + k_x x + k_y y$
• Power variation model
  • Supply voltage varies randomly (10%)

Clock: RC Model
Input: n-level meshes and H-trees
Constraint: routing area, parameter variations
Objective: skew

Simplified Circuit Model
[Figure: two drivers $V_{s1} = u(t)$ and $V_{s2} = u(t-T)$, each with source resistance $R_s$ and load capacitance $C$ at nodes 1 and 2; the two nodes are connected by a shunt resistance $R$.]

Skew Expression
Assumptions:
1. $T \ll R_s C$
   $t_0 = \ln 2 \cdot R_1 C_1$
2. $R_s / R \ll R_s C / T$
Using the first-order Taylor expansion $e^x \approx 1 + x$: [intermediate expression in $V_1$, $V_2$ and $T$]
Skew function: $T_{\mathrm{skew}} = T \exp\left(-2 \ln 2 \cdot \dfrac{R_s}{R}\right)$

Optimal Routing Resources Allocation
[Figure: routing area assigned to level-1, level-2, level-3 and level-4 versus total area.]

Inductance Diminishes Shunt Effects
[Figure: the simplified circuit model with an inductance $L$ in series with the shunt resistance $R$.]
• 0.5um wide, 1.2 cm long copper wire
• Input skew 20ps

f (GHz)     0.5   1     1.5   2     3     3.5   4    5
skew (ps)   3.9   4.2   5.8   7.5   9.9   13    17   26

Clock Distributions
Surfliner
[Figure]

Functional Modules (Data Path)
• Arithmetic
• Algorithms
• Logic
• Logic Styles
• Placement

Cyclic Shifter
Delay: $n \log n$, Power: $\frac{1}{4} n^2$
[Figure: 8-bit mux-based cyclic shifter. Inputs D0 to D7, select signals S0 to S2, 2:1 mux cells (i,0) to (i,2) feeding cells (i,3), outputs Z0 to Z7.]

Fan-out Splitting
Delay: $n$, Power: $\frac{3}{16} n^2$
[Figure: 8-bit demux-based shifter with the same inputs D0 to D7, select signals S0 to S2, cells (i,j) and outputs Z0 to Z7.]

Delay Comparison (delay in units of τ)

Width    MuxShifter   DemuxShifter   Impr. Ratio
8-bit    50.7         45.3           10.7%
16-bit   102.7        76             26.0%
32-bit   218.7        128            41.5%
64-bit   484          224            53.7%

Power Comparison (total switched capacitance in units of Cg)

Width    MuxShifter   DemuxShifter   Impr. Ratio
8-bit    150.7        146.9          2.5%
16-bit   484          453.7          6.3%
32-bit   1561.3       1397.6         10.5%
64-bit   5220         4458.9         14.6%

Cell Permutation
• Fix the input/output stage, permute cells of intermediate stages to further improve delay.
• Formulate as an ILP problem and solve by CPLEX.
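The shifter whose intermediate cells are permuted here can be modelled behaviourally as a cascade of 2:1 mux stages, one per select bit. The following is a minimal sketch under stated assumptions: the function name `cyclic_shift_right`, the list-based representation, and the rotate-right direction are illustrative choices, and the model ignores cell placement, wiring delay and power entirely.

```python
def cyclic_shift_right(bits, amount):
    """Rotate `bits` right through log2(n) stages of 2:1 muxes.

    bits[i] stands for data input D_i; the result Z satisfies
    Z[i] = D[(i + amount) % n]. Only the low log2(n) bits of `amount`
    are used, one per stage (S0, S1, S2 for n = 8).
    """
    n = len(bits)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    stage = list(bits)
    k = 0
    while (1 << k) < n:
        select = (amount >> k) & 1   # select signal S_k
        step = 1 << k                # stage k rotates by 2**k positions
        # Each cell is a 2:1 mux: pass the bit through, or take it from `step` away.
        stage = [stage[(i + step) % n] if select else stage[i] for i in range(n)]
        k += 1
    return stage


if __name__ == "__main__":
    data = [f"D{i}" for i in range(8)]
    print(cyclic_shift_right(data, 3))
```

Each stage is a full column of n cells, so an n-bit shifter has log2(n) mux columns between the input and output stages; those intermediate columns are the ones the permutation step rearranges.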
[Figure: optimal timing solution for 8-bit, showing the cell order in the >> 1-bit, >> 2-bit and >> 4-bit stages; delay in units of columns spanned.]
• Optimal solution in terms of delay
• Delay/Power tradeoff

Additional Delay Reduction by Cell Permutation

Width    w/o permutation   w/ permutation
8-bit    13                8
16-bit   29                18
32-bit   61                38
64-bit   125               78

Packaging: Pin Breakaway
• Row by Row Escape: Escape interconnect row by row from outside toward inside.

Packaging: Escape Sequence Strategies
• Parallel triangular sequence: This method divides the objects into groups and escapes each group with a triangular outline.

Packaging: Escape Sequence Strategies
• Central triangular sequence: Escape objects from the center of the outside row and expand the indent with a single triangular outline. In this method, the outline capacity increases continuously layer by layer, while the capacity of the first several layers is small.

Packaging: Escape Sequence Strategies
• Two-sided sequence: Escape objects from the inside as well as from the outside. The outline shrinks slowly and also follows a zigzag shape.

Packaging: Escape Sequence Strategies
[Figure]

Packaging: Experimental Results
40 x 40

Layer   Row by row   Parallel triangular   Central triangular   Two sided
1       304          276                   100                  312
2       272          340                   156                  328
3       240          292                   228                  308
4       208          240                   300                  324
5       176          164                   372                  328
6       144          124                   444
7       112          164
8       80
9       48
10      16

V.3. Experimental Results
20 x 20

Layer   Row by row   Parallel triangular   Central triangular   Two sided
1       144          132                   92                   140
2       112          144                   116                  160
3       80           96                    140                  100
4       48           28                    52
5       16

Conclusion
• Interconnect Technologies
• Geometric Planning
• Interconnect Planning
• Power and Ground Distribution
• Clock Networks
• Data Path
• Packaging

Interconnect Technologies
Coupling capacitance $C_c$: components $C_{ca}$ and $C_{cfr}$ [closed-form expressions in w, s, t, h]
Self capacitance $C_s$: components $C_{sa}$ and $C_{sfr}$ [closed-form expressions in w, s, t, h]
Total wire capacitance $C_w$: [expression in $C_{ca}$, $C_{cfr}$, $C_{sa}$, $C_{sfr}$]
[Figure: wire cross-section above the p/g plane, with coupling capacitance $C_c$ between neighbouring wires and self capacitance $C_s$ to the plane.]
w: wire width
s: wire spacing
t: wire thickness
h: distance to p/g plane
d: wire pitch
Cs: self capacitance
Cc: coupling capacitance

Interconnect Technologies
Design metrics:
• $delay_n$
• bandwidth
• bandwidth / power
Objective functions:
• $delay_n$
• $delay_n \cdot power_n$
• $delay_n^2 \cdot power_n$
For each pitch,
  For each objective function
    Find wire width $w$, buffer size $s_{inv}$ and buffer interval $l_{inv}$
    Calculate the metrics

Experimental Results: normalized delay
[Figure: normalized delay (s/m) versus pitch (m) and technology node (180nm, 130nm, 100nm, 70nm) for min-d, min-ddp and min-dp. Top left: overview. Top right: at different pitches (solid lines: min-pitch; dashed lines: saturating pitch). Bottom right: at 70nm technology.]

Interconnect Tech.: bandwidth
[Figure: bandwidth (bits/s) versus pitch (m) and technology node (180nm, 130nm, 100nm, 70nm) for min-d, min-ddp and min-dp. Top left: overview. Top right: at min-pitches. Bottom right: at 70nm technology.]

Interconnect Tech.: bandwidth/power
[Figure: bandwidth/power (bits*m/Js) versus pitch (m) and technology node (180nm, 130nm, 100nm, 70nm) for min-d, min-ddp and min-dp. Top left: overview. Top right: at different pitches (solid lines: min-pitch; dashed lines: optimal pitch). Bottom right: at 70nm technology.]

Interconnect Technologies
Example: w = 85nm, t = 145nm
$r_n$ = 10Kohm, $c_n$ = 0.25fF, $c_g$ = 2.34 x $c_n$ = 0.585fF
$r_w$ = 2ohm/um, $c_w$ = 0.2fF/um
Optimal interval: $l = \sqrt{\dfrac{2(1+f)\, r_n c_g}{r_w c_w}} = 242\,\mu m$
Optimal buffer size: $s = \sqrt{\dfrac{r_n c_w}{r_w c_g}} = 41$
Optimal delay: $\mathrm{Delay}(l_{tr}) / l_{tr} = \left(2 + \sqrt{2(1+f)}\right) \sqrt{r_n r_w c_g c_w} = 194\ fs/\mu m = 194\ ps/mm$

Experiments: Optimized Skew

total area   skew, s-mesh (s)   skew, m-mesh (s)   ratio
0.00         2.92E-11           2.92E-11           100.0%
0.25         2.79E-11           2.60E-11           93.2%
0.40         2.71E-11           2.45E-11           90.4%
1.00         2.42E-11           1.98E-11           81.8%
3.00         1.70E-11           1.24E-11           73.2%
5.00         1.24E-11           8.72E-12           70.5%

Robustness Against Supply Voltage Variations

             multi-level mesh        single-level mesh
total area   ave        worst        ave        worst
0.00         2.10E-11   2.91E-11     2.10E-11   2.91E-11
1.00         8.38E-12   1.14E-11     8.26E-12   1.43E-11
2.00         2.71E-12   4.42E-12     6.18E-12   1.11E-11
3.00         1.89E-12   3.33E-12     4.83E-12   8.73E-12
4.00         1.45E-12   2.48E-12     3.88E-12   6.96E-12
5.00         1.16E-12   2.02E-12     3.18E-12   5.64E-12
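The buffered-wire example (w = 85nm, t = 145nm) can be reproduced with a few lines of arithmetic. The sketch below plugs the listed $r_n$, $c_g$, $r_w$ and $c_w$ into the interval, buffer size and delay-per-length expressions. The slides do not state the value of $f$; `F_ASSUMED = 1.0` is an assumption, chosen because it reproduces the quoted 242 um and 194 ps/mm. The function name `optimal_buffering` is likewise illustrative.

```python
import math

# Device and wire parameters from the example (wire quantities are per micrometre).
R_N = 10e3        # buffer output resistance r_n, ohm
C_G = 0.585e-15   # buffer input capacitance c_g, F
R_W = 2.0         # wire resistance r_w, ohm/um
C_W = 0.2e-15     # wire capacitance c_w, F/um
F_ASSUMED = 1.0   # assumption: f is not given in the slides; 1.0 matches their numbers


def optimal_buffering(r_n, c_g, r_w, c_w, f):
    """Return (interval in um, buffer size, delay per unit length in s/um)."""
    interval = math.sqrt(2.0 * (1.0 + f) * r_n * c_g / (r_w * c_w))
    size = math.sqrt(r_n * c_w / (r_w * c_g))
    delay_per_um = (2.0 + math.sqrt(2.0 * (1.0 + f))) * math.sqrt(r_n * r_w * c_g * c_w)
    return interval, size, delay_per_um


if __name__ == "__main__":
    interval, size, delay = optimal_buffering(R_N, C_G, R_W, C_W, F_ASSUMED)
    print(f"interval    = {interval:.1f} um")
    print(f"buffer size = {size:.1f}")
    print(f"delay       = {delay * 1e15:.1f} fs/um")
```

Because 1 fs/um equals 1 ps/mm, the delay figure carries over unchanged between the two units quoted in the example.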