Download Physical Implementation - Inst.eecs.berkeley.edu

CS250 VLSI Systems Design, Fall 2009
John Wawrzynek, Krste Asanović, with John Lazzaro
Physical Implementation
Lecture 08, Layout 1 CS250, UC Berkeley Fall ‘09

Outline
Standard cell “back-end” place and route tools make layout mostly automatic. However, you should understand some of the more difficult concerns for physical implementation:
‣ Power distribution
‣ Chip input/output
‣ Clock distribution
‣ Floor-planning
‣ Not covered here: place & route

Routing PWR and GND
‣ Main design concerns:
1. IR drops. Recall Ohm's law, V = IR. Voltage drops due to resistance in power/GND wires => slower circuits, false switching.
2. Metal migration (electromigration). Accidental fusing => chip failure.
3. Inductive effects. Bounce and oscillation on power nodes => false switching.

Metal Migration
‣ A current flux through a metal conductor exceeding a certain limit causes metal atoms to move in the direction of the electron flow.
‣ If there is a small constriction (from a defect, or by design), the current density there will be higher and more metal atoms will be carried away, creating a still higher current density, ..., until the wire becomes an open circuit, like a fuse.
‣ Metal migration begins at ∼2×10^5 A/cm^2.
‣ Design guideline: size the width of both Vdd and GND wires so that they carry no more than ∼0.5 mA per um of width (aluminum).

Other Migration Related Failures
[Figures only]

IR Drops
‣ Switching-induced noise on power and GND rails can couple through to logic signals and cause accidental switching.
‣ Particularly troublesome around circuits with low noise immunity. Ex: RAM block sense amplifiers.

Layout Strategy for Power Rails (power and ground wires)
1. Keep the distance from the source of power as short as possible.
2. Use wide/thick metal.
3. Isolate “noisy” sections.
4. Use multiple sources.
‣ With many layers of metal, the upper few layers are typically thicker than the lower layers.
‣ Upper layers get allocated for global power rails.
‣ Via connections to lower layers for local distribution.

Inductive Bounce
Inductance in the power and ground paths results in voltage glitches (noise) on Vdd and GND nodes. On-chip L values of wires are small => usually not significant, except for:
1. Very large currents: clock drivers, off-chip drivers
2. Package pins, bonding wires (1 nH/mm). Package pins can have 2-40 nH of inductance, depending on package type.
Strategy:
1. Use multiple bonding pads (wires) for Vdd & GND
2. If necessary, use on-chip bypass capacitors

On-chip Bypass Capacitors
‣ When transistors switch, current is drawn from the bypass capacitor C_D rather than through package pins and bonding wires, which smooths out dI/dt. (Think of it as a “charge cache”.)
‣ Gate oxide leads to the highest capacitance, but poor yield. Thicker oxides are used.

On-chip Bypass Capacitance
‣ Distributed bypass capacitors also smooth out noise from IR drops by locally supplying charge when needed.
‣ Generally, capacitance on Vdd and GND is a good thing. (Fat wires help, larger transistor source regions help, extra capacitors when needed.)
‣ Capacitive bypassing continues all the way up the packaging chain.

Bypass Capacitors
‣ On-chip bypass capacitors are not effective for off-chip drivers:
‣ On-chip capacitors keep the voltage difference across the power lines stable but cannot prevent the on-chip power-supply levels from moving up and down w.r.t. the board power-supply levels.
‣ Therefore, high-speed chip outputs are surrounded by many power and ground connections (pins). Examples: GTPs on Xilinx FPGAs, memory interfaces on processors and bridge chips.

Package Connections
‣ “Pads”: special cells in the “design kit” for the layout generator.
[Figure: pad cell with dimension labels 200um and 100um. Pad (metal layer build-up) for wire bonding or solder ball, with circuitry and Vdd and GND rails.]
‣ Circuitry provides ESD protection, drive strength for output, input buffering, registering of signals, etc.
‣ Classic example of large capacitive fan-out. Internal capacitances are on the order of femtofarads, external ones picofarads. Uses staged drivers.

Electro-static Discharge (ESD) and Over-voltage Protection
[Figures only]

I/O Floorplan
“Perimeter” pads are the classic arrangement.
‣ Signals get routed from chip core to periphery.
‣ Perimeter ∝ sqrt(area)
‣ At 200um pitch => 100 pads max per side (2cm die)
‣ Most commonly used for wire bond attachment.

I/O Floorplan
Area pads allow a higher number of connections.
‣ Allow up to thousands of connections.
‣ Wire-bonding no longer possible => “flip-chip” technologies.

Flip-chip Bonding
‣ In addition to higher density, lower resistance and inductance.
‣ Eases layout of circuits and power/clk distribution (signals come in/go out closer to where needed).
[Figure 2, from “Implementing Xilinx Flip-Chip BGA Packages”: Package Construction with Type II Lid. Labeled parts: copper heatspreader, adhesive epoxy*, thermal interface material, underfill epoxy, flip chip solder bump, silicon die, solder ball, organic build-up substrate.]
*Xilinx flip-chip packages are not hermetically sealed, and exposure to cleaning solvents/chemicals or excessive moisture during board assembly could pose serious package reliability concerns.
Small vents are kept by design between the heatspreader (lid) and the organic substrate to allow for outgassing and moisture evaporation. Solvents or other corrosive chemicals could seep through these vents and attack the organic materials and components ...

Clock Distribution
A challenging consequence of the synchronous design methodology is the need to distribute a clock signal in synchrony to every state element, and to do so with low power (this challenge has driven some to asynchronous circuit design).
[Figure: CLK and CLK’ waveforms showing clock skew, the delay in distribution]
Clock skew can create both setup and hold problems.

Clock Skew Related Failure
[Figure: two state elements clocked by CLK and CLK’, with combinational logic (CL) between them; clock skew is the delay in distribution]
‣ If the clock period T < T_CL + T_setup + T_clk→Q + T_skew, the circuit will fail with a setup time violation.
‣ This problem can be fixed by increasing the clock period.
‣ If T_CL < T_skew (to first order, neglecting T_clk→Q and the hold time), the circuit will fail with a hold time violation.
‣ Increasing the clock period will not help. Need to add delay.

Clock Buffering Strategies
Centralized Buffer
RC delay of each path must be controlled. Load must be balanced.
‣ “Leaf” wires end up being long, and therefore small variations in length create large variations in delay (L^2 effect).
‣ One approach is to use wide wires to minimize R.
‣ Used successfully in Alpha microprocessors (and others), but with high power consumption.

Clock Buffering Strategies
Distributed Buffer
Load must be balanced. Match the strength of all drivers within a level. Now standard practice.

Clock Tree Layout
Standard synthesis tools take “clock uncertainty” into account. Layout tools will automatically lay out a low-skew clock tree. Advanced tools analyze and take into account clock skew between pairs of state elements.
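
The setup and hold conditions from the Clock Skew Related Failure slide can be checked numerically. The following is a minimal sketch, not a tool flow: the `Path` type and function names are hypothetical, all times are in picoseconds, `t_skew` is treated as a worst-case magnitude, and the hold check uses the fuller textbook inequality (including clock-to-Q and hold time) rather than the slide's first-order form.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One register-to-register path; all times in picoseconds (illustrative)."""
    t_clk_to_q: float  # clock-to-Q delay of the launching state element
    t_cl: float        # combinational logic delay between the two elements
    t_setup: float     # setup time of the capturing state element
    t_hold: float      # hold time of the capturing state element


def min_clock_period(p: Path, t_skew: float) -> float:
    """Smallest period that avoids a setup violation under worst-case skew."""
    return p.t_clk_to_q + p.t_cl + p.t_setup + t_skew


def hold_ok(p: Path, t_skew: float) -> bool:
    """Hold check: new data must not reach the capturing element before its
    (late) clock edge plus hold time. The clock period does not appear here,
    which is why slowing the clock cannot fix a hold violation."""
    return p.t_clk_to_q + p.t_cl >= t_skew + p.t_hold


def hold_padding(p: Path, t_skew: float) -> float:
    """Extra path delay needed to fix a hold violation (0 if there is none)."""
    return max(0.0, t_skew + p.t_hold - (p.t_clk_to_q + p.t_cl))


if __name__ == "__main__":
    long_path = Path(t_clk_to_q=50, t_cl=400, t_setup=30, t_hold=20)
    short_path = Path(t_clk_to_q=50, t_cl=0, t_setup=30, t_hold=20)
    print("min period (ps):", min_clock_period(long_path, t_skew=40))
    print("short path hold ok:", hold_ok(short_path, t_skew=100))
    print("delay to add (ps):", hold_padding(short_path, t_skew=100))
```

The numbers in the usage section are made up for illustration. The short path shows the case the slide warns about: a direct register-to-register connection with more skew than path delay needs added delay, not a slower clock.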
Clock Distribution Examples
Xilinx Virtex FPGA
[Excerpt shown on the slide, from the Xilinx data sheet “Spartan-3 FPGA Family: Functional Description”:]
A global clock input is placed in a design using either a BUFGMUX element or the BUFG (Global Clock Buffer) element. For the purpose of minimizing the dynamic power dissipation of the clock network, the Xilinx development software automatically disables all clock line segments that a design does not use.
[...] width of the die. In turn, the horizontal spine branches out into a subsidiary clock interconnect that accesses the CLBs.
2. The clock input of either DCM on the same side of the die (top or bottom) as the BUFGMUX element in use.
[Figure 18: Spartan-3 Clock Network (Top View). Global clock inputs GCLK0-GCLK7 feed BUFGMUX elements and DCMs at the top and bottom of the die; the top and bottom spines connect to a horizontal spine (array dependent). DS099-2_18_070203]

Clock Tree Delays, IBM “Power” CPU
[Figure 7: 3D visualization of the entire global clock network. The x and y coordinates are chip x, y, while the z axis is used to represent delay, so the lowest point corresponds to the beginning of the clock distribution and the final clock grid is at the top. Widths are proportional to tuned wire width, and the three levels of buffers appear as vertical lines. Labels: grid, tuned sector trees, sector buffers, buffer level 2, buffer level 1.]
[...] clock grid was completed with a tool run at the chip level, connecting unit-level pins to the grid. At this point, the clock tuning and the bottom-up clock routing process still [...] a great deal of flexibility to respond rapidly to even [...] changes. Repeated practice routing and tuning were performed by a small, focused global clock team as the clock pins and buffer placements evolved to guarantee [...]ibility and speed the design process.
Measurements of jitter and skew can be carried out using the I/Os on the chip. In addition, approximately 100 [...] metal probe pads were included for direct probing of the global clock grid and buffers. Results on actual POWER4 microprocessor chips show long-distance skews ranging from 20 ps to 40 ps (cf. Figure 9). This is improved from early test-chip hardware, which showed as much as 70 ps skew from across-chip channel-length variations [19]. Detailed waveforms at the input and output of each global clock buffer were also measured and compared with simulation to verify the specialized modeling used to design the clock grid. Good agreement was found. Thus, we have achieved a “correct-by-design” clock-distribution methodology. It is based on our design experience and measurements from a series of increasingly [...], complex server microprocessors. This method results in a high-quality global clock without having to use feedback or adjustment circuitry to control skews.

Clock Tree Delays, IBM Power
[...] the total wire delay is similar to the total buffer delay. A patented tuning algorithm [16] was required to tune the more than 2000 tunable transmission lines in these sector trees to achieve low skew, visualized as the flatness of the grid in the 3D visualizations. Figure 8 visualizes four of the 64 sector trees containing about 125 tuned wires driving 1/16th of the clock grid. While symmetric H-trees [...]
[Figure 8: visualization of tuned sector trees (caption truncated). Labels: multiple-fingered transmission line, delay, x, y.]
[Figure 9: Global clock waveforms showing 20 ps of measured skew. Axes: Volts (V), 0.0 to 1.5; Time (ps), 0 to 2500.]
[...] including uncertainties associated with the modeling of the floating-body effect [21–23] and its impact on noise immunity [22, 24–27] and overall chip decoupling capacitance requirements [26], was another factor behind the choice of a primarily static design style.
Finally, the size and logical complexity of the chip posed risks to meeting the schedule; choosing a simple, robust circuit style helped to minimize overall risk to the project.

System Level Skew
PLLs or DLLs are often built on-chip to deskew the chip core relative to the PCB environment. They also get used for clock frequency multiplication.

Floorplanning Strategies
Pay attention to communication: data- and control-flow
‣ Wiring can account for the majority of the power consumption and area.
‣ Automatic layout tools do this locally.
‣ Global floorplanning (placement of large blocks) may need to be specified.

Floorplanning Strategies
Exploit regularity
‣ Simplifies layout and verification.
‣ Create a subblock and instantiate it many times.
‣ Helps in manufacturability and yield enhancement.
Examples: FPGAs, memory blocks, bit-slice processor datapaths, systolic arrays, ...

Next Time
‣ Chip layout examples ...
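
As a recap, the rules of thumb from the power-distribution and I/O slides earlier in this lecture can be turned into quick arithmetic. The sketch below is only a back-of-envelope aid under stated assumptions: the function names are hypothetical, the sheet resistance is an input the caller must supply (the lecture gives no value), and parallel bond wires are modeled as ideal parallel inductors with no mutual coupling. The 0.5 mA/um and 1 nH/mm constants are the lecture's figures.

```python
EM_LIMIT_MA_PER_UM = 0.5   # lecture guideline: max current per um of rail width (aluminum)
BOND_WIRE_NH_PER_MM = 1.0  # lecture figure: bonding wire inductance per mm


def min_rail_width_um(current_ma: float) -> float:
    """Narrowest Vdd/GND wire that respects the electromigration guideline."""
    return current_ma / EM_LIMIT_MA_PER_UM


def rail_ir_drop_v(current_ma: float, length_um: float, width_um: float,
                   sheet_res_ohm_per_sq: float) -> float:
    """IR drop along a rail, V = I * R, with R = sheet resistance * squares.
    The sheet resistance is an assumed, process-dependent input."""
    squares = length_um / width_um
    return (current_ma * 1e-3) * sheet_res_ohm_per_sq * squares


def inductive_bounce_v(di_ma: float, dt_ns: float, wire_mm: float,
                       n_wires: int = 1) -> float:
    """Supply bounce V = L * dI/dt through n parallel bond wires.
    Assumes ideal parallel inductors (mutual inductance ignored)."""
    l_henry = BOND_WIRE_NH_PER_MM * wire_mm / n_wires * 1e-9
    return l_henry * (di_ma * 1e-3) / (dt_ns * 1e-9)


def max_perimeter_pads_per_side(die_side_mm: float, pad_pitch_um: float) -> int:
    """Upper bound on pads along one die edge (corner clearance ignored)."""
    return int(die_side_mm * 1000 // pad_pitch_um)


if __name__ == "__main__":
    width = min_rail_width_um(current_ma=100)
    print("rail width (um):", width)
    print("IR drop (V):", rail_ir_drop_v(100, 1000, width, sheet_res_ohm_per_sq=0.05))
    print("bounce, 1 wire (V):", inductive_bounce_v(100, 1, 2, n_wires=1))
    print("bounce, 4 wires (V):", inductive_bounce_v(100, 1, 2, n_wires=4))
    print("pads per side:", max_perimeter_pads_per_side(20, 200))
```

The usage values (100 mA, 0.05 ohm/square, 2 mm wires) are illustrative, not from the lecture. The last call reproduces the slide's figure of 100 pads per side for a 2cm die at 200um pitch, and the two bounce calls show why the lecture recommends multiple Vdd and GND bonding wires.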
