CS250 VLSI Systems Design
Fall 2009
John Wawrzynek, Krste Asanović, with John Lazzaro
Physical Implementation
Lecture 08, Layout 1
CS250, UC Berkeley Fall ‘09
Outline
Standard cell “back-end” place and route tools make layout
mostly automatic. However, you should understand some of
the more difficult concerns for physical implementation:
‣ Power distribution
‣ Chip input/output
‣ Clock distribution
‣ Floor-planning
‣ Not covered: place & route
Routing PWR and GND
‣ Main Design Concerns:
1. IR drops. Recall Ohm's law, V = IR.
Voltage drops due to resistance in power/GND wires => slower circuits, false switching
2. Metal migration (electromigration).
Accidental fusing => chip failure
3. Inductive effects.
Bounce and oscillation on power nodes => false switching
Metal Migration
‣ A current flux through a metal conductor exceeding a certain limit causes metal atoms to move in the direction of electron flow (opposite to the conventional current).
‣ If there is a small constriction (from a defect or from the design) ...
‣ the current density there will be higher, so more metal atoms will be carried away, narrowing the wire further and creating an even higher current density, ..., => open circuit, like a fuse.
‣ Metal migration begins at ∼2×10⁵ A/cm²
‣ Design guideline: size the width of both Vdd and GND wires so that they carry no more than ∼0.5 mA per µm of width (aluminum).
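As a quick check of the guideline, the sketch below sizes a rail from its current and converts the result back into a current density for comparison with the ∼2×10⁵ A/cm² onset. The function names, the 20 mA rail current and the 1 µm metal thickness are illustrative assumptions, not values from the lecture.

```python
# Sketch: size a power rail against the ~0.5 mA/um electromigration guideline.
# Thresholds come from the slide; current and metal thickness are assumed inputs.

MAX_MA_PER_UM = 0.5          # guideline: <= ~0.5 mA per um of width (aluminum)
EM_ONSET_A_PER_CM2 = 2e5     # metal migration begins at ~2x10^5 A/cm^2


def min_rail_width_um(current_ma: float) -> float:
    """Narrowest rail width (um) that keeps current per unit width within the guideline."""
    return current_ma / MAX_MA_PER_UM


def current_density_a_per_cm2(current_ma: float, width_um: float, thickness_um: float) -> float:
    """Average current density (A/cm^2) in a rectangular wire cross-section."""
    area_cm2 = (width_um * 1e-4) * (thickness_um * 1e-4)  # 1 um = 1e-4 cm
    return (current_ma * 1e-3) / area_cm2


if __name__ == "__main__":
    i_ma = 20.0                                   # hypothetical rail current
    w = min_rail_width_um(i_ma)
    j = current_density_a_per_cm2(i_ma, w, thickness_um=1.0)  # assumed 1 um thick metal
    print(f"width >= {w:.1f} um, J = {j:.2e} A/cm^2, below onset: {j < EM_ONSET_A_PER_CM2}")
```

For 1 µm thick metal, 0.5 mA/µm works out to 5×10⁴ A/cm², a 4× margin below the quoted onset; thinner metal shrinks that margin, which is why the guideline is tied to a particular metal.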
Other Migration Related Failures
IR Drops
‣ Switching-induced noise on the power and GND rails can couple through to logic signals and cause accidental switching.
‣ This is particularly troublesome around circuits with low noise immunity, e.g., RAM block sense amplifiers.
Layout Strategy for Power Rails
(power and ground wires)
1. Keep distance from source of
power as short as possible.
2. Use wide/thick metal.
3. Isolate “noisy” sections.
4. Use multiple sources.
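The strategies above can be made concrete with a small resistor-chain model of a rail. This sketch assumes equal taps, equal segment resistance and ideal supplies (simplifications, not from the lecture); the tap count, currents and resistances in the example are hypothetical.

```python
# Sketch: worst-case IR drop along a power rail with n equal current taps.
# Assumed model: each segment between taps has resistance r_seg, every tap draws
# i_tap, and supply connections are ideal voltage sources.


def rail_drops(n_taps: int, i_tap: float, r_seg: float, both_ends: bool = False) -> list[float]:
    """Return the IR drop (V) at taps 1..n_taps.

    One-end feed: source at node 0, n_taps segments.
    Both-end feed: a second source at node n_taps + 1, so m = n_taps + 1 segments.
    By superposition, a tap current at node j adds i_tap * r_seg * min(i, j) to the
    drop at node i (one-end feed), or i_tap * r_seg * min(i, j) * (m - max(i, j)) / m
    when both ends are held at the supply.
    """
    m = n_taps + 1
    drops = []
    for i in range(1, n_taps + 1):
        v = 0.0
        for j in range(1, n_taps + 1):
            if both_ends:
                v += i_tap * r_seg * min(i, j) * (m - max(i, j)) / m
            else:
                v += i_tap * r_seg * min(i, j)
        drops.append(v)
    return drops


if __name__ == "__main__":
    # Hypothetical numbers: 10 taps of 1 mA, 0.1 ohm per segment.
    one = max(rail_drops(10, 1e-3, 0.1))
    two = max(rail_drops(10, 1e-3, 0.1, both_ends=True))
    wide = max(rail_drops(10, 1e-3, 0.05))  # 2x wider metal halves r_seg
    print(f"one source: {one*1e3:.2f} mV, two sources: {two*1e3:.2f} mV, wide: {wide*1e3:.2f} mV")
```

With a one-end feed the far-end drop is i_tap · r_seg · N(N+1)/2, so it grows roughly with the square of the distance from the source; feeding from both ends, widening the metal, or shortening the run all cut it directly.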
Layout Strategy for Power Rails
(power and ground wires)
‣ With many layers of metal, the upper few layers are typically thicker than the lower layers.
‣ Upper layers get allocated for global power rails.
‣ Via connections to lower layers handle local distribution.
Inductive Bounce
Inductance in the power and ground paths results in voltage glitches
(noise) on Vdd and GND nodes.
On-chip wire inductance is small => usually not significant, except for:
1. Very large currents: clock drivers, off-chip drivers
2. Package pins and bonding wires (~1 nH/mm)
Package pins can have from 2 to 40 nH of inductance, depending on the package type.
Strategy:
1. Use multiple bonding pads (wires) for Vdd & GND
2. If necessary, use on-chip bypass capacitors
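The first strategy follows from the first-order bounce relation V = L · dI/dt. The sketch below assumes identical pins in parallel with no mutual inductance (a simplification); the 5 nH per pin, 100 mA step and 1 ns switching time are hypothetical values chosen inside the 2 to 40 nH range quoted above.

```python
# Sketch: first-order supply bounce V = L * dI/dt through package pins/bond wires.
# Assumes n identical pins in parallel with no mutual inductance.


def supply_bounce_v(l_per_pin_h: float, delta_i_a: float, t_switch_s: float, n_pins: int = 1) -> float:
    """Voltage glitch (V) when the supply current changes by delta_i_a over t_switch_s."""
    l_eff = l_per_pin_h / n_pins          # parallel pins divide the inductance
    return l_eff * delta_i_a / t_switch_s


if __name__ == "__main__":
    # Hypothetical: 5 nH per pin, 100 mA step in 1 ns.
    for n in (1, 2, 4, 8):
        print(n, "pins:", supply_bounce_v(5e-9, 0.1, 1e-9, n), "V")
```

In this idealized model each doubling of Vdd/GND pins halves the bounce; real pins couple magnetically, so the benefit is somewhat smaller.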
On-chip Bypass Capacitors
‣ When transistors switch, current is drawn from the bypass capacitance C_D rather than through the package pins and bonding wires, which smooths out dI/dt. (Think of it as a “charge cache”.)
‣ Thin gate oxide gives the highest capacitance per unit area but poor yield, so thicker oxides are used.
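The “charge cache” view gives a simple sizing rule: the capacitor must supply Q = I · t while its voltage sags by no more than ΔV, so C ≥ Q / ΔV. The sketch below assumes the package supplies nothing during the interval t; the 0.5 A, 200 ps and 50 mV figures are hypothetical.

```python
# Sketch: size an on-chip bypass capacitor as a "charge cache".
# Assumption: for t_s seconds the block's current i_a comes entirely from the decap,
# and the local supply may droop by at most dv_v volts.


def min_decap_f(i_a: float, t_s: float, dv_v: float) -> float:
    """Minimum bypass capacitance (F) to supply i_a for t_s with droop <= dv_v."""
    charge_c = i_a * t_s      # Q = I * t
    return charge_c / dv_v    # C = Q / dV


if __name__ == "__main__":
    c = min_decap_f(0.5, 200e-12, 0.05)   # hypothetical: 0.5 A for 200 ps, 50 mV droop
    print(f"{c * 1e9:.1f} nF")
```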
On-chip Bypass Capacitance
‣ Distributed bypass capacitors also smooth out noise from IR drops by locally supplying charge when needed.
‣ Generally, capacitance on Vdd and GND is a good thing (fat wires help, larger transistor source regions help, extra capacitors when needed).
‣ Capacitive bypassing continues all the way up the packaging chain.
Bypass Capacitors
‣ On-chip bypass capacitors are not effective for off-chip drivers:
  ‣ On-chip capacitors keep the voltage difference across the power lines stable but cannot prevent the on-chip power-supply levels from moving up and down w.r.t. the board power-supply levels.
‣ Therefore, high-speed chip outputs are surrounded by many power and ground connections (pins). Examples: GTP transceivers on Xilinx FPGAs, memory interfaces on processors and bridge chips.
Package Connections
‣ “Pads”: Special cells in the “design kit” for the layout generator.
[Figure: pad cell, with dimensions labeled 200 µm and 100 µm. The pad (a metal-layer build-up) is the landing site for a wire bond or solder ball; the pad circuitry connects to Vdd and GND.]
Circuitry provides ESD protection, drive strength for outputs, input buffering, registering of signals, etc.
‣ Output pads are a classic example of large capacitive fan-out: internal capacitances are on the order of femtofarads, while external ones are picofarads. Hence the pads use staged drivers.
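A common way to size such staged drivers (a standard tapered-buffer technique, not something this slide spells out) is to give every stage the same fan-out, chosen near an assumed target of about 4. The 10 fF input and 10 pF load below are hypothetical values matching the femtofarad/picofarad contrast above.

```python
import math

# Sketch: size a chain of staged (tapered) drivers from an internal gate (~fF)
# to an off-chip load (~pF). The target per-stage fan-out of 4 is an assumed rule of thumb.


def tapered_buffer(c_in_f: float, c_load_f: float, target_fanout: float = 4.0):
    """Return (number of stages, per-stage fan-out, stage sizes relative to the first)."""
    ratio = c_load_f / c_in_f
    stages = max(1, round(math.log(ratio) / math.log(target_fanout)))
    fanout = ratio ** (1.0 / stages)              # equal fan-out for every stage
    sizes = [fanout ** k for k in range(stages)]  # stage k is fanout**k times stage 0
    return stages, fanout, sizes


if __name__ == "__main__":
    n, f, sizes = tapered_buffer(10e-15, 10e-12)  # hypothetical 10 fF -> 10 pF
    print(n, round(f, 2), [round(s, 1) for s in sizes])
```

A 1000× capacitance ratio ends up as five stages of roughly 4× each, instead of one small gate trying to drive the pad directly.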
Electrostatic Discharge (ESD) and Over-voltage Protection
I/O Floorplan
“Perimeter” pads are the classic arrangement.
‣ Signals get routed from the chip core to the periphery.
‣ Perimeter ∝ sqrt(area)
‣ At 200 µm pitch => 100 pads max per side (2 cm die)
‣ Most commonly used for wire-bond attachment.
I/O Floorplan
Area pads allow a higher number of connections.
‣ Allow up to thousands of connections.
‣ Wire bonding is no longer possible => “flip-chip” technologies.
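The perimeter-versus-area trade-off is simple arithmetic. This sketch uses the 200 µm pitch and 2 cm die from the perimeter-pad discussion; applying the same pitch to the area array is an assumption made only for comparison.

```python
# Sketch: compare perimeter (wire-bond) and area (flip-chip) pad counts on a square die.
# Pitch and die size follow the slide; using the same pitch for the area array is assumed.


def perimeter_pads(die_side_um: int, pitch_um: int) -> int:
    """Pads in one row along each of the four edges (corner effects ignored)."""
    return 4 * (die_side_um // pitch_um)


def area_pads(die_side_um: int, pitch_um: int) -> int:
    """Pads on a full grid covering the die surface."""
    per_row = die_side_um // pitch_um
    return per_row * per_row


if __name__ == "__main__":
    for side_um in (10_000, 20_000):
        print(side_um / 1e4, "cm die:", perimeter_pads(side_um, 200), "perimeter vs",
              area_pads(side_um, 200), "area pads")
```

Doubling the die side doubles the perimeter count but quadruples the area count, which is the Perimeter ∝ sqrt(area) scaling noted for perimeter pads.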
Flip-chip Bonding
‣ In addition to higher density, lower resistance and inductance.
‣ Eases layout of circuits and power/clk distribution (signals come in/go out closer to where needed).
[Figure from Xilinx, “Implementing Xilinx Flip-Chip BGA Packages”, Figure 2: Package Construction with Type II Lid. Labeled parts: copper heatspreader, adhesive epoxy*, thermal interface material, underfill epoxy, flip-chip solder bump, silicon die, solder ball, organic build-up substrate.]
*Xilinx flip-chip packages are not hermetically sealed, and exposure to cleaning solvents/chemicals or excessive moisture during board assembly could pose serious package reliability concerns. Small vents are kept by design between the heatspreader (lid) and the organic substrate to allow for outgassing and moisture evaporation. Solvents or other corrosive chemicals could seep through these vents and attack the organic materials and components ...
Clock Distribution
A challenging consequence of the synchronous design methodology is the need to distribute a clock signal in synchrony to every state element, and to do so with low power (this challenge has driven some designers to asynchronous circuit design).
[Figure: CLK and a delayed copy CLK’; the offset between them is clock skew, caused by delay in distribution.]
Clock skew can create both setup and hold problems.
Clock Skew Related Failure
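[Figure: two registers connected through combinational logic CL, clocked by CLK and its skewed copy CLK’.]
‣ If the clock period T < TCL + Tsetup + Tclk→Q + Tskew, the circuit will fail with a setup-time violation.
‣ This problem can be fixed by increasing the clock period.
[Figure: the same two-register circuit (CLK, CLK’, CL), for the hold case.]
‣ If TCL < Tskew (more precisely, if Tclk→Q + TCL < Tskew + Thold), the circuit will fail with a hold-time violation.
‣ Increasing the clock period will not help; delay must be added to the short path.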
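Both failure modes can be checked with one signed skew convention. In this sketch skew is the capture-clock arrival minus the launch-clock arrival (an assumed convention): negative skew eats into the setup budget, positive skew into the hold budget. All times are in ps and the example values are hypothetical.

```python
# Sketch: setup and hold slack for one register-to-register path with clock skew.
# skew = (capture clock arrival) - (launch clock arrival), in ps.


def setup_slack(t_period, t_clk_q, t_cl, t_setup, skew):
    """>= 0 means data arrives at least t_setup before the (skewed) capture edge."""
    return (t_period + skew) - (t_clk_q + t_cl + t_setup)


def hold_slack(t_clk_q, t_cl, t_hold, skew):
    """>= 0 means data stays stable until t_hold after the capture edge.
    The clock period does not appear, so a longer period cannot fix a hold violation."""
    return (t_clk_q + t_cl) - (skew + t_hold)


if __name__ == "__main__":
    # Setup: capture clock 200 ps early; fails at T = 1000, passes at T = 1100.
    print(setup_slack(1000, 100, 700, 50, -200), setup_slack(1100, 100, 700, 50, -200))
    # Hold: capture clock 100 ps late on a short path; 80 ps of added delay fixes it.
    print(hold_slack(50, 20, 30, 100), hold_slack(50, 20 + 80, 30, 100))
```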
Clock Buffering Strategies
Centralized Buffer
‣ RC delay of each path must be controlled.
‣ Load must be balanced.
‣ “Leaf” wires end up being long, and therefore small variations in length create large variations in delay (L² effect).
‣ One approach is to use wide wires to minimize R.
‣ Used successfully in Alpha microprocessors (and others), but with high power consumption.
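The L² effect comes from the delay of a distributed RC wire. This sketch uses the Elmore delay r·c·L²/2 with per-unit-length r and c; the values are hypothetical, and widening a wire is modeled as dividing r only (its added capacitance is ignored, an optimistic assumption).

```python
# Sketch: length sensitivity of an unbuffered clock leaf wire.
# Elmore delay of a distributed RC line: 0.5 * r * c * L^2 (r, c per unit length).


def wire_delay_s(r_ohm_per_um: float, c_f_per_um: float, length_um: float) -> float:
    """Elmore delay (s) of an unbuffered distributed RC line."""
    return 0.5 * r_ohm_per_um * c_f_per_um * length_um ** 2


if __name__ == "__main__":
    r, c = 0.1, 0.2e-15                   # hypothetical: 0.1 ohm/um, 0.2 fF/um
    base = wire_delay_s(r, c, 5000)
    longer = wire_delay_s(r, c, 5500)     # 10% longer leaf wire
    wide = wire_delay_s(r / 2, c, 5000)   # 2x wider wire: r halved, c assumed unchanged
    print(f"{base*1e12:.1f} ps, +10% length -> x{longer/base:.2f}, 2x width -> x{wide/base:.2f}")
```

A 10% length mismatch becomes a 21% delay mismatch, while halving r halves every leaf delay and therefore the absolute mismatch as well.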
Clock Buffering Strategies
Distributed Buffer
‣ Load must be balanced.
‣ Match the strength of all drivers within a level.
‣ Now standard practice.
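A distributed buffer tree can be sketched by capping how many loads each buffer drives, so that all drivers in a level see the same load and can be matched. The sink count and the fan-out of 4 are hypothetical, and real clock-tree tools also balance wire lengths, which this ignores.

```python
# Sketch: buffer count per level of a balanced distributed clock tree.
# Each buffer drives at most `fanout` loads (flops at the leaves, buffers above).


def buffers_per_level(n_sinks: int, fanout: int) -> list[int]:
    """Buffer count at each level, root first, for a tree driving n_sinks flops."""
    levels = []
    loads = n_sinks
    while True:
        buffers = -(-loads // fanout)   # ceiling division
        levels.append(buffers)
        if buffers == 1:
            break
        loads = buffers
    return levels[::-1]


if __name__ == "__main__":
    tree = buffers_per_level(100_000, 4)   # hypothetical 100k flops, fan-out 4
    print(len(tree), "levels:", tree)
```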
Clock Tree Layout
Standard synthesis tools take “clock uncertainty” into account. Layout tools will automatically lay out a low-skew clock tree. Advanced tools analyze, and take into account, the clock skew between pairs of state elements.
Clock Distribution Examples
[Figure from the Xilinx data sheet “Spartan-3 FPGA Family: Functional Description”, Figure 18: Spartan-3 Clock Network (Top View), figure ID DS099-2_18_070203. Labeled elements: global clock inputs GCLK0 to GCLK7, BUFGMUX elements and DCMs at the top and bottom of the die, top spine, bottom spine, horizontal spine, and array-dependent regions.]
Data-sheet excerpt: A global clock input is placed in a design using either a BUFGMUX element or the BUFG (Global Clock Buffer) element. For the purpose of minimizing the dynamic power dissipation of the clock network, the Xilinx development software automatically disables all clock line segments that a design does not use. ... In turn, the horizontal spine branches out into a subsidiary clock interconnect that accesses the CLBs.
Clock Tree Delays, IBM “Power” CPU
(From EECS150, Lec16-timing, Spring 2009.)
[Figure 7: 3D visualization of the entire global clock network. The x and y coordinates are chip x, y, while the z axis is used to represent delay, so the lowest point corresponds to the beginning of the clock distribution and the final clock grid is at the top. Widths are proportional to tuned wire width, and the three levels of buffers appear as vertical lines.]
[Figure 8: delay over chip x, y for part of the network; labeled, from top to bottom: grid, tuned sector trees, sector buffers, buffer level 2, buffer level 1.]
Clock Tree Delays, IBM Power
Excerpt: A patented tuning algorithm [16] was required to tune the more than 2000 tunable transmission lines in these sector trees to achieve low skew, visualized as the flatness of the grid in the 3D visualizations. Figure 8 visualizes four of the 64 sector trees containing about 125 tuned wires driving 1/16th of the clock grid. ... Results on actual POWER4 microprocessor chips show long-distance skews ranging from 20 ps to 40 ps (cf. Figure 9). This is improved from early test-chip hardware, which showed as much as 70 ps skew from across-chip channel-length variations [19]. ... This method results in a high-quality global clock without having to use feedback or adjustment circuitry to control skews.
[Figure 9: Global clock waveforms showing 20 ps of measured skew (voltage, 0 to 1.5 V, vs. time, 0 to 2500 ps).]
System-Level Skew
‣ PLLs or DLLs are often built on-chip to deskew the chip core relative to the PCB environment.
‣ They are also used for clock frequency multiplication.
Floorplanning Strategies
Pay attention to communication: data flow and control flow.
‣ Wiring can account for the majority of the power consumption and area.
‣ Automatic layout tools do this locally.
‣ Global floorplanning (placement of large blocks) may need to be specified.
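One way to compare candidate global floorplans is to weight each inter-block connection by its bus width and estimate its length with the half-perimeter wirelength (HPWL), a standard placement metric. The block names, coordinates (in mm) and bus widths below are hypothetical.

```python
# Sketch: score a block-level floorplan by communication cost.
# Cost = sum over nets of (bus width) x (half-perimeter wirelength of the blocks it joins).


def hpwl(points):
    """Half-perimeter of the bounding box around a net's block positions."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def comm_cost(placement, nets):
    """Bus-width-weighted wirelength estimate for a {block: (x, y)} placement."""
    return sum(width * hpwl([placement[b] for b in blocks]) for blocks, width in nets)


# Hypothetical nets: (blocks connected, bus width in bits).
NETS = [(("regfile", "alu"), 64), (("alu", "dcache"), 32), (("icache", "regfile"), 32)]

if __name__ == "__main__":
    a = {"regfile": (0, 0), "alu": (1, 0), "dcache": (2, 0), "icache": (3, 0)}
    b = {"icache": (0, 0), "regfile": (1, 0), "alu": (2, 0), "dcache": (3, 0)}
    print("A:", comm_cost(a, NETS), "B:", comm_cost(b, NETS))
```

Placement B follows the data flow (icache to regfile to alu to dcache) and scores lower, which is the kind of global decision automatic tools may not make on their own.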
Floorplanning Strategies
Exploit Regularity
‣ Simplifies layout and verification.
‣ Create a subblock and instantiate it many times.
‣ Helps in manufacturability and yield enhancement.
‣ Examples: FPGAs, memory blocks, bit-slice processor datapaths, systolic arrays, ...
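A bit-slice datapath is the regularity idea in miniature: design one slice and instantiate it once per bit. This sketch shows the idea behaviorally (not as layout) with a ripple-carry adder built by repeating a single full-adder slice; the function names are illustrative.

```python
# Sketch: a bit-slice datapath modeled behaviorally. One full-adder slice is
# "instantiated" once per bit, with the carry rippling from slice to slice.


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One bit slice: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(x: int, y: int, width: int) -> tuple[int, int]:
    """Add two width-bit numbers using `width` copies of the same slice."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(ripple_add(5, 7, 8), ripple_add(255, 1, 8))
```

Only the one-bit slice needs detailed design and verification; widening the datapath just means more copies.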
Next Time
‣ Chip layout examples ...