Physical Effects: Delay

6.375 Complex Digital Systems
Krste Asanovic
March 5, 2007

6.375 Standard Cell Design Flow
[Figure: tool flow from Bluespec SystemVerilog source through the Bluespec Compiler to Verilog 95 RTL, then RTL synthesis to gates; side paths include Blueview, Bluespec C simulation (cycle accurate), and Verilog simulation with VCD output viewed in Debussy. Legend: files, Bluespec tools, 3rd-party tools.]
How do RTL choices affect the resulting physical design?

6.375 Spring 2006 • L04 CMOS Transistors, Gates, and Wires • 2

Measuring Chip "Quality"
Most important metrics for a chip design:
• Area
– Size affects manufacturing and packaging costs
• Performance
– Does the chip meet market performance goals?
• Power
– Peak power affects packaging cost (current supply, heat removal)
– Energy usage affects battery life

Iron Law of Performance
$\text{Performance} = \frac{\text{Operations}}{\text{Clock Cycle}} \times \frac{\text{Clock Cycles}}{\text{Second}}$
• Operations per clock cycle: set by concurrency in the RTL design
• Clock cycles per second: set by the clock frequency of the physical design
These are not independent parameters! Clock frequency is set by the delay of the circuit components on the critical path.

Basic CMOS Components
Transistors, gates, and wires.
[Figure: a two-input gate (input0, input1, output) built from transistors and connected by wires]

Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET)
[Figure: MOSFET cross-section showing the gate, source and drain diffusions, and bulk, with the vertical field Ev and horizontal field Eh; inversion happens at the surface under the gate]
INVERSION: A sufficiently strong vertical field attracts enough electrons to the surface to create a conducting n-type channel between the source and drain.
CONDUCTION: If a channel exists, a horizontal field causes a drift current from the drain to the source.
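As a rough illustration of why the two Iron Law factors are not independent, the Python sketch below compares two hypothetical design points. The function names and all numbers (1 operation/cycle with a 1 ns critical path versus 2 operations/cycle with a 1.6 ns critical path) are illustrative assumptions, not data from the lecture.

```python
def max_clock_hz(critical_path_s: float) -> float:
    """The clock period cannot be shorter than the critical-path delay."""
    return 1.0 / critical_path_s


def performance(ops_per_cycle: float, clock_hz: float) -> float:
    """Iron Law: ops/second = (ops/clock cycle) x (clock cycles/second)."""
    return ops_per_cycle * clock_hz


# Hypothetical design points (assumed numbers, for illustration only):
# more concurrency in the RTL lengthens the critical path of the physical design.
narrow = performance(ops_per_cycle=1.0, clock_hz=max_clock_hz(1.0e-9))
wide = performance(ops_per_cycle=2.0, clock_hz=max_clock_hz(1.6e-9))

print(f"narrow: {narrow:.3e} ops/s, wide: {wide:.3e} ops/s")
```

With these assumed numbers, doubling the operations per cycle gains only 25%, because the wider design's longer critical path lowers the clock frequency.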
Key qualitative characteristics of MOSFET transistors affecting delay
[Figure: inverter (Vin, Vout) annotated with transistor width, length, Cgate, Cdrain, and Reff]
• Increase width (W): increases current, decreases Reff
• Increase length (L): decreases current, increases Reff
• Cgate is proportional to W × L, and Cdrain is proportional to W

The most basic CMOS gate is an inverter
Let's make the following assumptions:
1. All transistors are minimum length.
2. All gates should have equal rise and fall times. Since PMOS transistors are ~twice as slow as NMOS, they must be twice as wide to have the same effective resistance.
3. Normalize all transistor widths to the minimum-width NMOS.
[Figure: inverter with input A (Vin) and output Y (Vout); PMOS with WP/LP = 2α to VDD, NMOS with WN/LN = 1α to GND]

A simple RC model for the inverter can provide significant insight
[Figure: each transistor modeled as a switch with resistance Reff; Cg at the input, Cd and the load CL at the output]
$R_{eff} = R_{eff,N} = R_{eff,P}$, $C_g = C_{g,N} + C_{g,P}$, $C_d = C_{d,N} + C_{d,P}$
• Vin = "0": the PMOS charges the output. Charge RC time constant: $T_{PLH} = R_{eff} \times (C_d + C_L)$
• Vin = "1": the NMOS discharges the output. Discharge RC time constant: $T_{PHL} = R_{eff} \times (C_d + C_L)$

Larger gates are faster since they decrease Reff (but they also increase Cd!)
Process gen = 0.25 μm, supply voltage = 5 V, min width NMOS = 0.5 μm

Param          Value    Units
Cd,N / μm      1.42     fF/μm
Cd,P / μm      2.40     fF/μm
Cg,N / μm      1.55     fF/μm
Cg,P / μm      1.48     fF/μm
Reff,N × μm    4.93     kΩ·μm
Reff,P × μm    10.83    kΩ·μm

Minimum inverter (PMOS 2, NMOS 1) driving an identical inverter:
Cd = (0.5 × 1.42) + (1 × 2.40) = 3.11 fF
CL = (0.5 × 1.55) + (1 × 1.48) = 2.26 fF
Cd + CL = 5.37 fF
TPLH = 2.2 × (10.83/1) × 5.37 = 128 ps
TPHL = 2.2 × (4.93/0.5) × 5.37 = 116 ps

Double the size of the driver (PMOS 4, NMOS 2), same load:
Cd = (1 × 1.42) + (2 × 2.40) = 6.22 fF
CL = (0.5 × 1.55) + (1 × 1.48) = 2.26 fF
Cd + CL = 8.48 fF
TPLH = 2.2 × (10.83/2) × 8.48 = 101 ps
TPHL = 2.2 × (4.93/1) × 8.48 = 92 ps
Halving Reff more than pays for the larger Cd, but the speedup is well under 2× because the driver's own drain capacitance doubles. This also ignores the fact that the previous gate now must drive a bigger gate capacitance!
(kΩ × fF = ps; the factor 2.2 ≈ ln 9 converts the RC time constant into a 10%-90% transition time.)

More complicated gates use more transistors in pullup/pulldown networks
[Figure: pullup network between VDD and VOUT, pulldown network between VOUT and GND, driven by Input 0 through Input N]
• Pullup network connects the output to VDD and contains only PMOS
• Pulldown network connects the output to GND and contains only NMOS
For every set of input logic values, either the pullup or the pulldown network makes a connection to VDD or GND:
– If both were connected, the power rails would be shorted together
– If neither were connected, the output would float (tristate logic)

Series and parallel MOSFET networks provide natural duals of each other
PMOS:
– Single transistor A conducts if A=0
– A and B in parallel conduct if A=0 OR B=0
– A and B in series conduct if A=0 AND B=0
NMOS:
– Single transistor A conducts if A=1
– A and B in series conduct if A=1 AND B=1
– A and B in parallel conduct if A=1 OR B=1

NAND and NOR gates illustrate the dual nature of the pullup/pulldown networks
• NAND gate, output $\overline{A \cdot B}$: series NMOS A and B in the pulldown, parallel PMOS A and B in the pullup
• NOR gate, output $\overline{A + B}$: parallel NMOS A and B in the pulldown, series PMOS A and B in the pullup

A methodical approach to build more complex gates
• Goal is to create a logic function $f(x_1, x_2, \ldots)$
– We can only implement inverting logic with one CMOS stage
• Implement the pulldown network
– Write $PD = \overline{f(x_1, x_2, \ldots)}$
– Use parallel NMOS for OR of inputs
– Use series NMOS for AND of inputs
• Implement the pullup network
– Write $PU = f(x_1, x_2, \ldots) = g(\overline{x_1}, \overline{x_2}, \ldots)$
– Use parallel PMOS for OR of complemented inputs
– Use series PMOS for AND of complemented inputs

Example of complex gate
$f = \overline{(A + B) \cdot C}$
• $PD = (A + B) \cdot C$: NMOS A and B in parallel, in series with NMOS C
• $PU = \overline{(A + B) \cdot C} = \overline{A}\,\overline{B} + \overline{C}$: PMOS A and B in series, in parallel with PMOS C
• Should we map every function into a single complex gate?
• What gates should we put into a standard cell library?

Examples illustrating unit-less delay (d) of gates with equal drive strength (Reff)
[Figure: transistor sizings for an inverter, a NAND, and a NOR with equal Reff]
Inverter delay = 2.67, NAND delay = 3.67, NOR delay = 3.67
Annotation: less parasitic drain capacitance (Cd) loading the output

Examples illustrating unit-less delay (d) of gates with similar area
[Figure: transistor sizings for an inverter, a NAND, and a NOR with similar area]
Inverter delay = 2.11, NAND delay = 4.67, NOR delay = 5.33
Annotation: PMOS is worse than NMOS, so the series path is the limiter

Which gate topology and transistor sizing is optimal?
Given a logic function, there are many possible logic gate topologies and transistor sizings.
1. What is the optimal transistor sizing?
2. What is the optimal number of logic stages?
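To check the "Larger gates are faster" arithmetic, here is a small Python sketch using the slide's 0.25 μm parameters. The helper name inverter_delays and its default load (one inverter with PMOS size 2 and NMOS size 1) are assumptions for illustration; sizes are in units of the 0.5 μm minimum-width NMOS, and kΩ × fF gives ps.

```python
# Slide parameters for the 0.25 um process (per um of transistor width).
CD_N, CD_P = 1.42, 2.40       # drain capacitance, fF/um
CG_N, CG_P = 1.55, 1.48       # gate capacitance, fF/um
REFF_N, REFF_P = 4.93, 10.83  # effective resistance x width, kOhm*um
UNIT_W_UM = 0.5               # width of a size-1 transistor (minimum NMOS)


def inverter_delays(p_size, n_size, load_p_size=2, load_n_size=1):
    """Return (TPLH, TPHL) in ps for an inverter driving one inverter.

    Simple RC model: T = 2.2 * Reff * (Cd + CL), with kOhm * fF = ps.
    """
    wp, wn = p_size * UNIT_W_UM, n_size * UNIT_W_UM
    cd = wn * CD_N + wp * CD_P                                 # driver drain cap, fF
    cl = (load_n_size * CG_N + load_p_size * CG_P) * UNIT_W_UM  # load gate cap, fF
    tplh = 2.2 * (REFF_P / wp) * (cd + cl)   # PMOS charges the output
    tphl = 2.2 * (REFF_N / wn) * (cd + cl)   # NMOS discharges the output
    return tplh, tphl


for p, n in [(2, 1), (4, 2)]:
    tplh, tphl = inverter_delays(p, n)
    print(f"driver {p}/{n}: TPLH = {tplh:.0f} ps, TPHL = {tphl:.0f} ps")
```

The doubled driver is faster, but by well under 2×, since its own drain capacitance doubles while the load stays fixed.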
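The pullup/pulldown recipe can be checked mechanically. This Python sketch, with hypothetical helper names conducts and check_complementary, models each network as nested series/parallel groups of switches, builds both networks for the example f = NOT((A + B) · C), and verifies that for every input combination exactly one network conducts (no shorted rails, no floating output).

```python
from itertools import product


def conducts(net, inputs, on_value):
    """Does a switch network connect its two terminals?

    net is an input name (one transistor) or a tuple ("series" | "parallel", *subnets).
    on_value is 1 for NMOS (on when the gate is high) and 0 for PMOS (on when low).
    """
    if isinstance(net, str):
        return inputs[net] == on_value
    kind, *subnets = net
    results = [conducts(s, inputs, on_value) for s in subnets]
    return all(results) if kind == "series" else any(results)


def check_complementary(pulldown, pullup, names):
    """Return the gate's truth table, or raise if the networks are not complementary."""
    table = {}
    for values in product((0, 1), repeat=len(names)):
        inputs = dict(zip(names, values))
        pd = conducts(pulldown, inputs, on_value=1)  # NMOS network to GND
        pu = conducts(pullup, inputs, on_value=0)    # PMOS network to VDD
        if pd and pu:
            raise ValueError(f"rails shorted for {inputs}")
        if not (pd or pu):
            raise ValueError(f"output floats for {inputs}")
        table[values] = 1 if pu else 0
    return table


# f = NOT((A + B) . C)
pulldown = ("series", ("parallel", "A", "B"), "C")  # PD = (A + B) . C
pullup = ("parallel", ("series", "A", "B"), "C")    # PU = A'B' + C'
truth = check_complementary(pulldown, pullup, ["A", "B", "C"])
print(truth)
```

For instance, giving a two-input gate series transistors in both networks raises the "output floats" error at A=0, B=1, which is exactly the failure the series/parallel duality rules out.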
Optimal sizing and delays for example topologies
[Figure: topologies A, B, and C with their optimal transistor sizings]
$D_{OPT} = N (G H)^{1/N} + P$, where G is the path logical effort, N the number of stages, P the parasitic delay, and H the output loading (electrical effort).

Topology   G      N   P   D_OPT                 D (H=1)   D (H=12)
A          2.96   4   7   4(2.96H)^(1/4) + 7    12.25     16.77
B          3.33   2   6   2(3.33H)^(1/2) + 6    9.65      18.64
C          3.33   2   9   2(3.33H)^(1/2) + 9    12.65     21.64

[For more explanation of how these numbers were derived, see the Logical Effort link on the website.]

Wires are an old problem
[Figure: Cray-1 (1976) and Cray-3 (1993), with photographs of their wiring]

Modern interconnect stacks have six to nine or more metal layers
[Figure © IBM: cross-section of the IBM CMOS7 process, Metal 1 through Metal 6 connected by vias (Via 1-2 through Via 5-6): 6 layers of copper wiring and 1 layer of tungsten local interconnect]

Wire resistance is a function of height, width, and length
$\text{resistance} = \text{resistivity} \times \frac{\text{length}}{\text{height} \times \text{width}}$
Bulk resistivity: aluminum 2.8×10⁻⁸ Ω·m, copper 1.7×10⁻⁸ Ω·m, silver 1.6×10⁻⁸ Ω·m
• Height (thickness) is fixed in a given manufacturing process
• Resistances are therefore quoted in Ω/square (resistivity/height); a wire's resistance is Ω/square × length/width
• TSMC 0.18 µm, 6 aluminum metal layers
– M1-5: 0.08 Ω/square (0.5 µm × 1 mm wire = 2000 squares = 160 Ω)
– M6: 0.03 Ω/square (0.5 µm × 1 mm wire = 60 Ω)

Wire capacitance is relative to the substrate and to neighboring wires
[Figure: wires on two layers with widths W1, W2, heights H1, H2, spacing S1, and dielectric thicknesses DD1, D12]
• Capacitance depends on the geometry of surrounding wires and the relative permittivity (εr) of the insulating dielectric
– silicon dioxide (SiO2): εr = 3.9
– silicon fluoride (SiF4): εr = 3.1
– SiLK™ polymer: εr = 2.6
Capacitive coupling to neighbors is becoming a serious problem!
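The D_OPT column follows the logical-effort result that an N-stage path with path effort GH and parasitic delay P has minimum delay N(GH)^(1/N) + P, reached when every stage carries the same effort f = (GH)^(1/N). A minimal Python sketch (hypothetical function names, values copied from the table) recomputes the H=1 and H=12 columns.

```python
# (G, N, P) for each topology, copied from the table.
TOPOLOGIES = {
    "A": (2.96, 4, 7),
    "B": (3.33, 2, 6),
    "C": (3.33, 2, 9),
}


def stage_effort(G, N, H):
    """Each stage carries an equal share of the path effort G*H."""
    return (G * H) ** (1.0 / N)


def optimal_delay(G, N, P, H):
    """Minimum path delay in units of tau: N * (G*H)**(1/N) + P."""
    return N * stage_effort(G, N, H) + P


for name, (G, N, P) in TOPOLOGIES.items():
    delays = [round(optimal_delay(G, N, P, H), 2) for H in (1, 12)]
    print(name, delays, "stage effort at H=12:", round(stage_effort(G, N, 12), 2))
```

Topology B is fastest at H = 1, while the four-stage topology A overtakes it at H = 12, where spreading a larger effort over more stages pays off.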
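A minimal Python sketch of the Ω/square bookkeeping, with hypothetical helper names, reproduces the TSMC 0.18 µm numbers from the wire-resistance slide. The last helper converts a bulk resistivity into Ω/square for a given thickness; the 0.5 µm copper thickness used with it is purely an assumed, illustrative value, not a process number from the slides.

```python
def squares(length_um, width_um):
    """Number of squares in a wire: length / width."""
    return length_um / width_um


def wire_resistance(ohms_per_square, length_um, width_um):
    """R = (Ohms/square) x (number of squares)."""
    return ohms_per_square * squares(length_um, width_um)


def sheet_resistance(resistivity_ohm_m, thickness_m):
    """Ohms/square = resistivity / thickness (height is fixed by the process)."""
    return resistivity_ohm_m / thickness_m


# TSMC 0.18um values from the slide: a 0.5um x 1mm wire is 2000 squares.
print(wire_resistance(0.08, 1000.0, 0.5))  # M1-5
print(wire_resistance(0.03, 1000.0, 0.5))  # M6
# Assumed, illustrative: bulk copper at a hypothetical 0.5um thickness.
print(sheet_resistance(1.7e-8, 0.5e-6))
```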
• Can have different dielectric materials between wires and between layers, and also different materials on higher layers

This IBM experimental 130 nm process includes two metals and two dielectrics
[Figure: process cross-section with an Al layer labeled; E. Barth, IBM Microelectronics]

Distributed RC wire model gives accurate results but is computationally expensive
[Figure: driver resistance Rdriver followed by an RC ladder R1, C1 through RN, CN, terminated by Cload]
Use the Penfield-Rubinstein equation to find the delay:
$\text{Delay} = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j$
How does the delay scale with longer wires?
– Wire delay increases quadratically
– Edge rate also degrades quadratically

Lumped model can provide a quick reasonable approximation
[Figure: Rdriver, then lumped wire resistance Rw with Cw/2 at each end, then Cload]
$\text{Delay} = R_{driver} \frac{C_w}{2} + (R_{driver} + R_w)\left(\frac{C_w}{2} + C_{load}\right)$
• Rw is the lumped resistance of the wire
• Cw is the lumped capacitance of the wire; half of Cw is placed at each end

Estimate the rise time of node A using an RC delay model
Process gen = 0.25 μm, supply voltage = 5 V, min width NMOS = 0.5 μm

Param          Value    Units
Cd,N / μm      1.42     fF/μm
Cd,P / μm      2.40     fF/μm
Cg,N / μm      1.55     fF/μm
Cg,P / μm      1.48     fF/μm
CA,M2 / μm²    0.016    fF/μm²
CL,M2 / μm     0.084    fF/μm
Reff,N × μm    4.93     kΩ·μm
Reff,P × μm    10.83    kΩ·μm
RM2 / sq       0.07     Ω/sq

[Figure: an inverter (PMOS 16, NMOS 8) drives an inverter (PMOS 2, NMOS 1) through a Metal 2 wire (250 µm × 0.25 µm) at node A; RC model: RP, Cd, RW with CW/2 at each end, Cg]
Cg = (0.5 × 1.55) + (1 × 1.48) = 2.26 fF
Cd = (4 × 1.42) + (8 × 2.40) = 24.88 fF
Rp = 10.83/8 = 1.35 kΩ
Rw = (250/0.25) × 0.07 = 70 Ω
Cw = ((250 × 0.25) × 0.016) + (250 × 0.084) = 1.0 + 21.0 = 22.0 fF
TPLH = 2.2 × (1350 × (22.0/2 + 24.88) + (1350 + 70) × (22.0/2 + 2.26)) = 2.2 × (48.4 ps + 18.8 ps) ≈ 148 ps

How should we buffer up this signal? Should we have a few big stages or many small stages?
[Figure: alternative buffer chains ending in the 16/8 driver, one with a few large stages and one with many small stages]

How many stages of inverters are required to drive a large load?
[Figure: chain of inverters from input capacitance Cin to load Cout]

A good rule of thumb is to target a stage effort around four
Minimum delay when:
– Stage effort = logical effort × electrical effort ≈ 3.4-3.8
– Some derivations give e ≈ 2.718, but this ignores parasitics
– Broad optimum: stage efforts of 2.4-6.0 are within 15-20% of the minimum
Fan-out-of-four (FO4) is a convenient design size (~5τ).
FO4 delay: delay of an inverter driving four copies of itself.

Large RC makes long wires slow
[Figure: the distributed RC ladder from Rdriver to Cload]
$\text{Delay} = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j$
Wire delay increases quadratically with length, because both the total resistance and the total capacitance grow linearly with length.

Adding repeaters gives linear growth in delay
[Figure: the RC ladder with repeaters inserted between segments]
$\text{Delay} = \sum_{i=1}^{N} R_i C_i$

Several issues with repeater insertion
• Repeaters must connect down to the transistor layers
• They block other routes with vias that connect down
• They require space on active layers for buffer transistors
• Repeaters are often grouped in preallocated repeater boxes spread around the chip, so repeater locations might not give ideal spacing

Wire delay in standard-cell flow
• Front-end tools include approximate wire-load models
– Usually statistical in nature, quite inaccurate
– Help the synthesis tool with technology mapping
• Back-end tools include better wire-load models
– After trial placement, can use Manhattan distance
– Tool will automatically insert repeaters where necessary
– Note: tools cannot add extra pipeline stages if wires are too long, so cycle time will suffer if you have global combinational logic paths

Wire Delay: Impact on RTL
• Need extra pipeline stages for wire delay
– Pentium-4 included stages just for driving signals
– Requires very early physical prototyping
– RTL changes if communication latency changes!
• Use a latency-insensitive methodology to avoid reworking the RTL design at a late stage
– Create macroblocks with FIFO interfaces
– Use rules that don't depend on the number of cycles to propagate data through FIFOs
– Can change the effective latency through a FIFO after physical layout without changing the RTL design in macroblocks
[Figure: Pentium 4 20-stage pipeline: 1-2 TC Next IP, 3-4 TC Fetch, 5 Drive, 6 Alloc, 7-8 Rename, 9 Queue, 10-12 Schedule 1-3, 13-14 Dispatch 1-2, 15-16 Register File 1-2, 17 Execute, 18 Flags, 19 Branch Check, 20 Drive]

In deep submicron technologies, many predicted an interconnect doomsday
[Figure: National Technology Roadmap for Semiconductors, SIA, 1997]

Is there really an interconnect doomsday looming?
Local wire delay tracks the improvement in gate delay.

Scaling   Impact       Effect on Resistance   Effect on Capacitance
Length    Decreases    Decreases              Decreases
Width     Decreases    Increases              Decreases
Height    ~ Constant   --                     --
R. Ho, K. Mai, M. Horowitz, Proc. of the IEEE, Apr 2001

Is there really an interconnect doomsday looming? (global wires)

Scaling   Impact       Effect on Resistance   Effect on Capacitance
Length    ~ Constant   --                     --
Width     Decreases    Increases              Decreases
Height    ~ Constant   --                     --
R. Ho, K. Mai, M. Horowitz, Proc. of the IEEE, Apr 2001
Global wire delay increases relative to gate delay!

No doomsday, just one more physical design issue to carefully manage
[Figure: National Technology Roadmap for Semiconductors, SIA, 2005]
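To see the quadratic-versus-linear scaling claimed on the distributed-RC and repeater slides, the Python sketch below evaluates the Penfield-Rubinstein (Elmore) sum for a uniform ladder. It assumes normalized per-unit-length R and C, ignores the driver and load, and models each repeater as an idealized stage with a fixed, hypothetical delay of 0.2; the function names are illustrative.

```python
def elmore_ladder_delay(r_segments, c_segments):
    """Delay = sum_i C_i * (R_1 + ... + R_i) for an RC ladder."""
    delay, upstream_r = 0.0, 0.0
    for r, c in zip(r_segments, c_segments):
        upstream_r += r          # resistance between the source and this node
        delay += upstream_r * c
    return delay


def wire_delay(length, r_per_len=1.0, c_per_len=1.0, n_segments=100):
    """Unrepeated wire, modeled as n equal distributed-RC segments."""
    r = r_per_len * length / n_segments
    c = c_per_len * length / n_segments
    return elmore_ladder_delay([r] * n_segments, [c] * n_segments)


def repeated_wire_delay(length, segment_length=1.0, repeater_delay=0.2):
    """Break the wire into pieces, each driven by an idealized repeater."""
    n_pieces = max(1, round(length / segment_length))
    return n_pieces * (wire_delay(length / n_pieces) + repeater_delay)


for length in (1, 2, 4, 8):
    print(length, round(wire_delay(length), 3), round(repeated_wire_delay(length), 3))
```

Doubling the length quadruples the unrepeated delay but only doubles the repeated one; the fixed repeater delay also means repeaters only pay off once a wire is long enough.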
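The node-A rise-time estimate can be recomputed with a short Python sketch of the lumped model. The helper name lumped_delay is hypothetical, and it adds the driver's own drain capacitance Cd at the near end, as the worked example does (setting it to zero recovers the lumped-model slide's formula). Resistances are in kΩ and capacitances in fF, so products come out in ps.

```python
def lumped_delay(r_drv, c_drv, r_w, c_w, c_load):
    """RC time constant with half of the wire capacitance lumped at each end."""
    near = r_drv * (c_drv + c_w / 2)          # charged through the driver only
    far = (r_drv + r_w) * (c_w / 2 + c_load)  # charged through driver + wire
    return near + far


# Driver: PMOS size 16 (8um), NMOS size 8 (4um); load: PMOS 1um, NMOS 0.5um.
r_p = 10.83 / 8.0              # kOhm, driver pull-up resistance
c_d = 4.0 * 1.42 + 8.0 * 2.40  # fF, driver drain capacitance
c_g = 0.5 * 1.55 + 1.0 * 1.48  # fF, load gate capacitance

# Metal 2 wire, 250um long and 0.25um wide.
length_um, width_um = 250.0, 0.25
r_w = (length_um / width_um) * 0.07 / 1000.0            # kOhm (0.07 Ohm/sq)
c_w = length_um * width_um * 0.016 + length_um * 0.084  # fF (area + fringe)

tau = lumped_delay(r_p, c_d, r_w, c_w, c_g)  # ps
print(f"Cw = {c_w:.2f} fF, tau = {tau:.1f} ps, rise time = {2.2 * tau:.0f} ps")
```

Using the unrounded Rp = 1.354 kΩ still gives about 148 ps, matching the hand calculation.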
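The stage-effort rule of thumb turns into a simple sizing procedure for an inverter chain. Since an inverter has logical effort 1, the path effort is F = Cout/Cin; choose N ≈ log4 F stages and scale each stage by f = F^(1/N). The Python sketch below assumes a target stage effort of 4, uses a hypothetical function name, and ignores wire capacitance between stages.

```python
import math


def size_buffer_chain(c_in, c_out, target_effort=4.0):
    """Return (number of stages, stage effort, input capacitance of each stage)."""
    path_effort = c_out / c_in
    n = max(1, round(math.log(path_effort) / math.log(target_effort)))
    f = path_effort ** (1.0 / n)               # equal effort per stage
    sizes = [c_in * f ** i for i in range(n)]  # each stage f times the previous
    return n, f, sizes


for c_out in (64.0, 1000.0):
    n, f, sizes = size_buffer_chain(1.0, c_out)
    print(f"Cout/Cin = {c_out:g}: {n} stages, stage effort {f:.2f}, "
          f"sizes {[round(s, 1) for s in sizes]}")
```

Because the optimum is broad, rounding N to the nearest integer (which lands at a stage effort of about 3.98 for Cout/Cin = 1000) costs little.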