Clock Generation and Distribution
Clock Generation
Single-phase clock system
• The simplest clocking methodology is to use a
single clock in conjunction with a register.
• Clocks are generated with global clock buffers.
• CLK and CLK bar are generated locally.
On-Chip Clock Insertion Delay
[Figure: clock driver between ext. Clk and int. Clk, with the insertion delay marked; labels: D, Clk]
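Clock Generation using DLLs
[Figure: DLL block diagram. Blocks: phase detector, charge pump, filter, DL. Signals: fREF, fO, U, D.]
PLL
[Figure: PLL block diagram. Blocks: phase detector, charge pump, loop filter, VCO, divide by N. Signals: reference clock, local clock, up, down, vcont, system clock.]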
• Phase-locked loops (PLLs) are used to generate
internal clocks on chips for two main reasons:
– to synchronize the internal clock of a chip
with an external clock
– to operate the internal clock at a higher rate
than the external clock input
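The second reason, running the internal clock faster than the external one, follows from the divide-by-N block in the PLL diagram. A minimal sketch, assuming the divider sits in the feedback path so that lock requires f_vco / N = f_ref; the function name and the example numbers are hypothetical.

```python
def pll_output_frequency(f_ref_hz: float, n_divider: int) -> float:
    """Frequency the VCO settles to once the loop is locked.

    Assumption: the divide-by-N block is in the feedback path, so the
    phase detector compares f_ref against f_vco / N.
    """
    if n_divider < 1:
        raise ValueError("divider ratio N must be a positive integer")
    return f_ref_hz * n_divider


if __name__ == "__main__":
    # Hypothetical numbers: 100 MHz external reference, N = 8.
    f_out = pll_output_frequency(100e6, 8)
    print(f"internal clock: {f_out / 1e6:.0f} MHz")
```

With N = 1 the same loop only aligns the internal clock to the external one, which is the first use listed above.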
Phase-Locked Loop Block Diagram and
Operation
[Figure: ext. Clk feeds a PLL (PD, LP, VCO) and clock driver that drive int. Clk into loads Cload; waveforms of ext. Clk and int. Clk are annotated "135° out of phase" and "45° out of phase"]
Ring-Oscillator-Based VCO
with CMOS Inverters as Delay Elements
[Figure: ring of n CMOS inverters supplied from Vreg]
f_osc = 1 / (2 n T_inv), with n = 2k + 1, k ≥ 1
Nov. 14, 2003
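The ring-oscillator relation f_osc = 1 / (2 n T_inv), with an odd stage count n = 2k + 1 and k ≥ 1, can be evaluated directly. A short sketch; the function name and the 20 ps inverter delay are illustrative assumptions, not values from the slides.

```python
def ring_oscillator_frequency(k: int, t_inv_s: float) -> float:
    """Oscillation frequency of a ring of n = 2k + 1 inverters.

    k       -- integer >= 1, so the ring always has an odd stage count
    t_inv_s -- delay of one inverter stage, in seconds
    """
    if k < 1:
        raise ValueError("k must be >= 1 (at least three inverters)")
    if t_inv_s <= 0:
        raise ValueError("inverter delay must be positive")
    n = 2 * k + 1
    # A full period needs the edge to travel around the ring twice.
    return 1.0 / (2 * n * t_inv_s)


if __name__ == "__main__":
    # Hypothetical: 5 stages (k = 2), 20 ps per inverter.
    f = ring_oscillator_frequency(2, 20e-12)
    print(f"{f / 1e9:.2f} GHz")
```

In a VCO, the control voltage changes T_inv, which is how the loop tunes the frequency.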
System clocking schemes: (a) single-phase clock;
(b) two-phase clock; (c) multiple-phase clock
[Figure: clock waveforms for (a) Clk; (b) phases 1 and 2; (c) phases 1, 2, 3, 4 through k]
Function of clock distribution network
• Synchronize millions (or billions) of separate
elements
• Within a time scale on the order of ~10 ps
Clock Parameters: Period (T), Width (W), Rise and
Fall Times
[Figure: clock waveform annotated with trise, tfall, width W, and period T]
Duty cycle = W / T
Clock Parameters: Period, Width, Clock Skew and Clock
Jitter
[Figure: Ref_Clock and Received Clock waveforms annotated with tDRVCLK, tRCVCLK, tskew, tjit, and period T]
Clock Uncertainties
[Figure: Ref_Clock and Received Clock waveforms annotated with tDRV_CLK, tRCV_CLK, tskew, tjit, and period T]
Clock uncertainty = jitter + skew
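Clock uncertainty is the sum of jitter and skew. A small sketch of how that sum might be subtracted from the period to see what is left for logic; the "usable period" reading is a worst-case assumption, and the function names and numbers are hypothetical. Times are integers in picoseconds to keep the arithmetic exact.

```python
def clock_uncertainty_ps(t_skew_ps: int, t_jit_ps: int) -> int:
    """Clock uncertainty = jitter + skew."""
    return t_skew_ps + t_jit_ps


def usable_period_ps(period_ps: int, t_skew_ps: int, t_jit_ps: int) -> int:
    """Worst-case time left in one cycle after removing the uncertainty.

    Assumption: skew and jitter both act against the path at once.
    """
    return period_ps - clock_uncertainty_ps(t_skew_ps, t_jit_ps)


if __name__ == "__main__":
    # Hypothetical: 5 ns period, 100 ps skew, 50 ps jitter.
    print(clock_uncertainty_ps(100, 50), "ps of uncertainty")
    print(usable_period_ps(5000, 100, 50), "ps usable")
```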
Timing Constraint
Static and dynamic timing analysis
• Timing analysis is the methodical analysis of a
digital circuit to determine whether the timing
constraints imposed by components or
interfaces are met.
• Typically, this means proving that all setup,
hold, and pulse-width times are met.
• A chip must meet its timing constraints in
order to operate at the intended clock rate.
• Devices perform their operations during an active
clock cycle.
Setup time:
• The time during which input data is available and
stable before the clock pulse is applied is called the
setup time.
• Setup time is the minimum amount of time the data
signal must be held steady before the clock event so
that the data is reliably sampled by the clock. This
applies to synchronous circuits such as the flip-flop.
• In short, the amount of time the synchronous input (D)
must be stable before the active edge of the clock.
Hold time:
• The time after the clock pulse during which the data
input is held stable is called the hold time.
• Hold time is the minimum amount of time the
data signal must be held steady after the clock
event so that the data is reliably sampled. This
applies to synchronous circuits such as the flip-flop.
• In short, the amount of time the synchronous
input (D) must be stable after the active edge of
the clock.
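The setup and hold definitions together describe a window around the active clock edge in which D must not change. A minimal sketch that flags data transitions falling inside that window; the function name, argument names, and the example times (in picoseconds) are hypothetical.

```python
def check_setup_hold(clock_edge_ps, data_changes_ps, t_setup_ps, t_hold_ps):
    """Return (time, kind) for every data transition inside the window.

    D must be stable from (edge - t_setup) to (edge + t_hold). A change
    exactly on the window boundary is treated as just meeting the limit.
    """
    window_start = clock_edge_ps - t_setup_ps
    window_end = clock_edge_ps + t_hold_ps
    violations = []
    for t in data_changes_ps:
        if window_start < t <= clock_edge_ps:
            violations.append((t, "setup"))   # changed too late before the edge
        elif clock_edge_ps < t < window_end:
            violations.append((t, "hold"))    # changed too soon after the edge
    return violations


if __name__ == "__main__":
    # Hypothetical flip-flop: 50 ps setup, 30 ps hold, active edge at 1000 ps.
    for t, kind in check_setup_hold(1000, [900, 970, 1010, 1100], 50, 30):
        print(f"{kind} violation: D changed at {t} ps")
```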
Metrics/Goals
• Besides basic connectivity, what makes a
clock network good or bad?
– Skew
– Jitter
– Power
– Area
– Slew rates
Clock Skew
• Defined as: the maximum difference in arrival times of
the clock signal at any 2 latches/FFs fed by the network
Skew = max | t1 – t2 |
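The definition Skew = max | t1 – t2 | over all pairs of clocked elements can be computed from a table of arrival times. A sketch with hypothetical sink names and arrival times in picoseconds.

```python
from itertools import combinations


def clock_skew(arrival_times_ps):
    """Skew = max |t1 - t2| over every pair of sinks.

    arrival_times_ps maps a sink name (latch/FF) to the clock arrival time.
    """
    if len(arrival_times_ps) < 2:
        raise ValueError("skew needs at least two sinks")
    times = arrival_times_ps.values()
    return max(abs(t1 - t2) for t1, t2 in combinations(times, 2))


if __name__ == "__main__":
    # Hypothetical arrival times at three flip-flops.
    arrivals = {"ff_a": 120, "ff_b": 135, "ff_c": 110}
    print("skew:", clock_skew(arrivals), "ps")
```

The pairwise form mirrors the definition; the same value is the latest arrival minus the earliest, which is cheaper for large networks.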
Clock Skew
• Causes:
– Designed (unavoidable) variations – mismatch in
buffer load sizes, interconnect lengths
– Temperature gradients – changes MOSFET
performance across die
– IR voltage drop in power supply – changes MOSFET
performance across die
• Note: Delay from clock generator to fan-out
points (clock latency) is not important by itself
– BUT: increased latency leads to larger skew for the
same amount of relative variation
Clock Skew
• Effect:
– Eats into timing budget
– Needs to be considered for maximum (setup) and
minimum (hold) path timings
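How skew eats into the timing budget can be shown with the usual worst-case textbook constraints, which are an assumption here rather than something the slides spell out: setup requires T ≥ t_clk-q + t_logic,max + t_setup + t_skew, and hold requires t_clk-q + t_logic,min ≥ t_hold + t_skew. All parameter names and numbers below are hypothetical, in picoseconds.

```python
def setup_slack_ps(period, t_clk_q, t_logic_max, t_setup, t_skew):
    """Maximum-path (setup) slack; negative means the path fails.

    Assumes the skew acts against the path (capture clock arrives early).
    """
    return period - (t_clk_q + t_logic_max + t_setup + t_skew)


def hold_slack_ps(t_clk_q, t_logic_min, t_hold, t_skew):
    """Minimum-path (hold) slack; negative means the path fails.

    Assumes the skew acts against the path (capture clock arrives late).
    """
    return (t_clk_q + t_logic_min) - (t_hold + t_skew)


if __name__ == "__main__":
    # Hypothetical path at a 5 ns cycle with 100 ps of skew.
    print("setup slack:", setup_slack_ps(5000, 200, 4300, 150, 100), "ps")
    print("hold slack:", hold_slack_ps(200, 50, 100, 100), "ps")
```

Note that the hold check does not involve the period, so a hold failure caused by skew cannot be fixed by slowing the clock.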
Cycle time
Requirements
• Clock waveforms must be particularly
clean and sharp.
• No skew: special attention must be paid to the
design of the clock tree.
• CAD tools are able to design balanced clock trees.
Jitter
• From one clock cycle to the next, the period is
not exactly the same each time.
• Jitter is the maximum difference in the phase of
the clock between any two periods.
Notes: jitter J1 = t2 – t1
jitter J2 = t3 – t2
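The notes J1 = t2 – t1 and J2 = t3 – t2 can be applied to a list of measurements. A sketch, assuming t1, t2, t3 are successive measured clock periods (the slides do not say so explicitly); the function names and sample values, in picoseconds, are hypothetical.

```python
def cycle_to_cycle_jitter(periods_ps):
    """J_i = t_(i+1) - t_i for each pair of successive periods."""
    return [later - earlier for earlier, later in zip(periods_ps, periods_ps[1:])]


def peak_jitter(periods_ps):
    """Largest magnitude among the cycle-to-cycle differences."""
    diffs = cycle_to_cycle_jitter(periods_ps)
    if not diffs:
        raise ValueError("need at least two periods")
    return max(abs(j) for j in diffs)


if __name__ == "__main__":
    # Hypothetical measured periods t1, t2, t3 around a 1 ns nominal clock.
    periods = [1000, 1012, 995]
    print("J1, J2 =", cycle_to_cycle_jitter(periods))
    print("peak =", peak_jitter(periods), "ps")
```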
Jitter
• Caused by variations in clock period that result
from:
– Phase-locked loop (PLL) oscillation frequency
– Various noise sources affecting clock
generation and distribution
• Ex.: power supply noise, which
dynamically alters the drive
strength of intermediate
buffer stages
• Jitter can be reduced by minimizing power
supply noise (IR and L*di/dt)
Jitter Impact on Timing Budget
• Needs to be considered in maximum path timing (setup)
• Typically on the order of 50ps in high-end microprocessors
Clock Power
• Power consumption in clocks due to:
– Clock drivers
– Long interconnections
– Large clock loads – all clocked elements (latches,
FF’s) are driven
• Different components dominate
– Depending on type of clock network used
– Ex. Grid – huge pre-drivers & wire cap. drown out
load cap.
Clocks: Power-Hungry
P = α C Vdd² f
Not only is the clock capacitance large, it switches
every cycle!
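The dynamic power relation P = α C Vdd² f is easy to evaluate for a clock net. A sketch with hypothetical values; α = 1 for an ungated clock follows the remark that the clock switches every cycle, and the lower α stands for an assumed gated region.

```python
def dynamic_power_w(alpha: float, c_farads: float, vdd_volts: float, f_hz: float) -> float:
    """Dynamic power P = alpha * C * Vdd^2 * f, in watts."""
    return alpha * c_farads * vdd_volts ** 2 * f_hz


if __name__ == "__main__":
    # Hypothetical clock network: 1 nF total capacitance, 2 V, 500 MHz.
    ungated = dynamic_power_w(1.0, 1e-9, 2.0, 500e6)
    # Assumed effect of gating: the same net toggles in 25% of cycles.
    gated = dynamic_power_w(0.25, 1e-9, 2.0, 500e6)
    print(f"ungated: {ungated:.2f} W, gated: {gated:.2f} W")
```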
Low Power Clocking Techniques
• Gated clocks
– Prevent switching in areas of the chip not being
used
– Easier in static designs
• Edge-triggered flip-flops in ARM rather than
transparent latches in Alpha
– Reduced load on clock for each flip-flop
Clock Distribution Metric: Area
• Clock networks consume silicon area (clock
drivers, PLL, etc.) and routing area
• Top-level metals are used to reduce RC delays
– These levels are precious resources (unscaled)
• By minimizing area used, we also reduce
wiring capacitance & power
Slew Rates
• To maintain signal integrity and latch
performance, minimum slew rates are
required
• Too slow – clock is more susceptible to
noise, latches are slowed down, eats into
timing budget
• Too fast – burning too much power,
overdesigned network, enhanced ground
bounce
Slew Rates
• Latch set-up times are dependent on clock
input slew rates (eats into timing budget)
• Short-circuit power grows with larger slew rates
– This can be significant for large clock drivers
Ref: IBM website, Carring
Technology Trends: Power
• Heavily pipelined design → more latches,
more capacitive load for clock
• Larger chips → more wire length needed to
cover the entire die
• Complexity → more functionality and devices
means more clocked elements
• Dynamic logic → more clocked elements
Clock Distribution
Clock tree style
1 H-Tree Model
Less flexible
Not applicable to placement
2 Binary Tree Model
Easy to construct
Weak for blest latch distribution
3 Fanout Balance Tree Model
4 Spine and Trunk Model (Fishbone)
Easy to adjust net loading
Many dummy cells are needed
Skew hardly influenced by process scattering
Die size increase
Clock Distribution Example
Alpha 21264 clock distribution: grid + H-tree approach
• Power = 32% of total
• Wire usage = 3% of metals 3 & 4
• 4 major clock quadrants, each with a large driver connected
to local grid structures
Network Types: Grid
• Gridded clock distribution was
common on earlier DEC Alpha
microprocessors
• Advantages:
– Skew determined by grid density
and not overly sensitive to load
position
– Clock signals are available
everywhere
– Tolerant to process variations
– Usually yields extremely low skew
values
[Figure: pre-drivers feeding a global grid]
Grid Disadvantages
• Huge amounts of wiring & power
– Wire cap large
– Strong drivers needed – pre-driver cap large
– Routing area large
• To minimize all these penalties, make grid pitch coarser
– Skew gets worse
– Losing the main advantage
• Don’t overdesign
– Let the skew be as large as tolerable
• Still, grids seem infeasible for SoCs
Network Types: Tree
• Original H-tree
– One large central driver
– Recursive H-style structure to
match wire-lengths
– Halve wire width at
branching points to reduce
reflections
[Figure: H-tree with points labeled A and B]
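The recursive H-style structure can be sketched as a short routine that places one "H" per level and recurses at its four corners. It only models geometry (no wire widths, buffers, or RC), and the function name, coordinates, and sizes are hypothetical. The point it illustrates is that every leaf ends up at the same wire length from the centre.

```python
def h_tree(cx, cy, half, levels):
    """Build an H-tree centred on (cx, cy).

    Returns (segments, sinks): segments are ((x0, y0), (x1, y1)) wires,
    sinks are ((x, y), wire_length_from_this_centre) leaf points.
    """
    if levels == 0:
        return [], [((cx, cy), 0.0)]
    segments = [((cx - half, cy), (cx + half, cy))]          # horizontal bar
    sinks = []
    for sx in (-1, 1):
        x = cx + sx * half
        segments.append(((x, cy - half), (x, cy + half)))    # vertical bar
        for sy in (-1, 1):
            sub_segments, sub_sinks = h_tree(x, cy + sy * half, half / 2, levels - 1)
            segments += sub_segments
            # Centre to corner: 'half' along the bar plus 'half' up or down.
            sinks += [(pos, dist + 2 * half) for pos, dist in sub_sinks]
    return segments, sinks


if __name__ == "__main__":
    segments, sinks = h_tree(0.0, 0.0, 4.0, 2)
    lengths = {dist for _, dist in sinks}
    print(len(sinks), "sinks, wire lengths:", lengths)
```

Matched lengths are what make the unbuffered tree ideally zero-skew; the problems listed for H-trees arise once real loads and wire resistance are taken into account.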
H-Tree Problems
– slew degradation along long RC paths
– unrealistically large central driver
• Clock drivers can create large temperature gradients (e.g.,
Alpha 21064: ~30 °C)
– non-uniform load distribution
• Inherently non-scalable (wire resistance skyrockets)
• Solution to some problems
– Introduce intermediate buffers along the way
– Specifically at branching points
Buffered Clock Tree
Buffered H-tree
• Advantages
– Ideally zero-skew
– Can be low power (depending on skew
requirements)
– Low area (silicon and wiring)
– CAD tool friendly (regular)
• Disadvantages
– Sensitive to process variations
– Local clocking loads are inherently non-uniform
Clock Skew and Clock balancing
• Clock skew
– Hold time violation is critical to working silicon
– Aggressive skew budget for high speed operation
– Large turn-around-time for clock tree synthesis at P&R
stage
– Skew source: process + voltage + temp + load + jitter
– Skew budget = (target cycle time) / 20, min clk->Q
• Solution
– CTS (Clock Tree Synthesis)
– Insert dummy delay at synthesis → over-design
Practical problem in Clock tree synthesis
• Problems
– Large chip size due to SOC integration
– Unbalanced FF distribution
– Top-level : Interconnect RC dominant
– Iteration cost
– Test clock
– Multiple clock frequency
• Solution
– Plan from the early design stage
– Skew budgeting : 100ps @ 200MHz
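The two budgeting figures in this deck, a skew budget of (target cycle time) / 20 and 100 ps at 200 MHz, can be compared with a few lines of arithmetic. A sketch; the function names are hypothetical, and the divisor of 20 is simply the rule of thumb quoted earlier.

```python
def cycle_time_ps(f_hz: float) -> float:
    """Clock period in picoseconds for a frequency in hertz."""
    return 1e12 / f_hz


def skew_budget_ps(f_hz: float, divisor: float = 20.0) -> float:
    """Rule-of-thumb budget: target cycle time divided by 20."""
    return cycle_time_ps(f_hz) / divisor


def skew_fraction(skew_ps: float, f_hz: float) -> float:
    """Share of the cycle consumed by a given skew."""
    return skew_ps / cycle_time_ps(f_hz)


if __name__ == "__main__":
    f = 200e6
    print(f"cycle time: {cycle_time_ps(f):.0f} ps")
    print(f"T/20 budget: {skew_budget_ps(f):.0f} ps")
    print(f"100 ps is {skew_fraction(100, f):.1%} of the cycle")
```

At 200 MHz the T/20 rule allows 250 ps, so the 100 ps figure is the tighter of the two targets.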
Block level clock tree
• Block-level clock skew
– Driver-limited
– Optimization of the buffer strength and number
• Clock tree synthesis
– Commercial tool
– Many iterations
– Long turn-around-time
• Clock tree planning
– Virtual clock tree generation
– Need engineering approximation
Real Clock Tree
[Figure: clock tree with nets clk.3.1, clk.4.1, clk.5.1]
Clock tree style: Trunk-and-Branch
Top Level Clock Distribution
[Figure: PLL distributing the clock to blocks NW, NE, L2, SS, SW, SE, and system]
Real Example