Download Lecture 8 - The University of Texas at Austin

EE 382M VLSI–II
Global Clocking
Héctor Sánchez
Mark McDermott
[Figure: ClkA and ClkD waveforms showing Tcycle and Tjitter]
EE 382M Class Notes
Foil # 1
The University of Texas at Austin
Class Agenda
• Clocking Overview
• Clock Generation
• Clock Distribution
• Clock Uncertainty
  – Clock Skew
  – Clock Jitter
• Clock Regeneration
  – Tunable global clock buffers
  – Local clock buffers
• Clocking with SOI Technology
• Key Learnings
• Future Clocking Issues
Clocking Overview
• Most high-speed digital systems use clocks to synchronize data transactions.
• The maximum clock frequency determines the rate at which data can be processed.
• The clocking style depends on the circuits used to implement the logic elements and storage elements.
• There are three main components to clocking:
  – Generation: crystal oscillators, PLLs or DLLs.
  – Distribution: trees, grids, etc.
  – Regeneration: local clock buffers (LCB), global clock buffers (GCB).
Clocking Overview
• There are a number of issues with clocking a high-performance digital system. These include:
  – Clock uncertainty: skew and jitter.
  – Frequency-dependent failures.
  – Frequency-independent failures.
  – Clock distribution power.
Logic Transactions and CLK Dependence
[Figure: a logic block sits between a clocking element in the ClkA domain and a clocking element in the ClkD domain; the ClkA and ClkD waveforms show Tcycle and Tuncertainty]
Skew and Jitter: Definition
• Skew and jitter are the enemies of a clocking system.
• Skew is the “static” time difference between any two electrical nodes.
  – Typically with reference to clock signals that should, in theory, switch simultaneously.
  – There are techniques to reduce or eliminate skew.
  – Skew can be “managed” to your advantage (i.e., intentional skew can be used to provide “cycle-stealing” capability).
• Jitter is a “dynamic” time difference of a signal with respect to an ideal signal.
  – Jitter typically cannot be “designed out”.
  – Jitter can result from various time-dependent noise events.
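The static/dynamic distinction can be illustrated numerically. The sketch below is only an illustration: it assumes a 1 GHz ideal clock, a hypothetical 30 ps static offset and Gaussian jitter of 5 ps rms, none of which come from the lecture. It recovers the skew as the mean edge error and the jitter as the spread around that mean.

```python
import random
import statistics

def edge_times(n_cycles, t_cycle_ps, skew_ps, jitter_rms_ps, rng):
    """Rising-edge times at one node: ideal edge + static skew + per-edge jitter."""
    return [k * t_cycle_ps + skew_ps + rng.gauss(0.0, jitter_rms_ps)
            for k in range(n_cycles)]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
T_CYCLE_PS = 1000.0      # assumed 1 GHz clock (hypothetical value)
N = 10_000

ideal = [k * T_CYCLE_PS for k in range(N)]
node = edge_times(N, T_CYCLE_PS, skew_ps=30.0, jitter_rms_ps=5.0, rng=rng)

# Error of every edge relative to the ideal clock.
errors = [t - t0 for t, t0 in zip(node, ideal)]
static_part = statistics.mean(errors)     # skew: the offset that never changes
dynamic_part = statistics.pstdev(errors)  # jitter: the part that varies edge to edge

print(f"estimated skew   ~ {static_part:.1f} ps")
print(f"estimated jitter ~ {dynamic_part:.1f} ps rms")
```

Because the skew term is constant it can be measured once and compensated, while the jitter term changes on every edge and has to be budgeted.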
Clocking Overview
• Clocking overhead (skew and jitter) is growing as we move to nanometer processes. Careful design of the clock generation and distribution circuits is now required for all high-performance processor designs.
Clock Generation
• Two techniques are used to synchronize the clocks in a high-performance system: a Phase Locked Loop (PLL) or a Delay Locked Loop (DLL).
• The PLL is used to “phase” synchronize (and typically multiply) the system clock with respect to a reference clock (internal or external).
  – PLL features:
    • Frequency multiplication, to run the processor at a faster speed than the memory interface.
    • Skew reduction. The reference clock is “aligned” to the feedback clock.
    • Possible “stability” issues due to 2nd- or 3rd-order loop behavior.
• The DLL is used to “delay” synchronize the system clock to a reference clock.
  – DLL features:
    • DLLs typically do not multiply an input clock, although high-performance DLLs have been designed to implement limited frequency multiplication.
    • DLLs are inherently stable. Think of “only” trying to align delay, not delay and frequency as in a PLL.
• Some high-performance systems use a combination of both to generate the various clocks in a multiple-clock-domain design.
  – SoC designs can have many clock domains running at different frequencies.
High Performance Processor Clock Network
[Figure: processor clock network block diagram with clock frequencies labeled 4*XX, 4*XX/N and 4*XX*F/N]
Clock Distribution
• Clock distribution is one of the most critical areas in the design of high-performance VLSI chips.
  – Poor clock distribution can result in excessive clock skew between clusters on the chip, reducing the maximum operating frequency.
• In general we need to reduce the effect of clock skew on the chip. This requires:
  – Reducing the wire delay and RC effects by making the effective delay small and balancing the delays of all the paths. (This changes a total-delay problem into a delay-matching problem.)
  – Matching the clock buffer delays.
  – Reducing sensitivity to process variations by careful placement and design of the clock buffers.
• A side effect of long clock distribution delays is increased jitter due to supply voltage variations. This adversely affects the PLL used to generate the chip clock frequency.
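The remark that balancing turns a total-delay problem into a matching problem can be sketched as follows. Skew is taken here as the latest clock arrival minus the earliest; the cluster names and delays are hypothetical values chosen for illustration.

```python
def clock_skew_ps(arrival_ps):
    """Skew of a clock network: latest arrival minus earliest arrival."""
    return max(arrival_ps.values()) - min(arrival_ps.values())

# Hypothetical insertion delays (ps) from the PLL to four clusters.
unbalanced = {"cluster_a": 310, "cluster_b": 455, "cluster_c": 390, "cluster_d": 520}

# Same longest path (520 ps), but the shorter paths have been padded to match it.
balanced = {"cluster_a": 498, "cluster_b": 505, "cluster_c": 502, "cluster_d": 520}

print("unbalanced skew:", clock_skew_ps(unbalanced), "ps")
print("balanced skew:  ", clock_skew_ps(balanced), "ps")
```

The total delay is 520 ps in both cases; only the mismatch between paths changes. The slide's caveat still applies: a long total delay exposes the clock to more supply noise, so matching alone does not remove the incentive to keep the delay small.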
Schematic of a typical clock distribution tree
[Figure: the reference (system) clock drives the PLL, whose core clock output feeds the global clock buffers; each global clock (Gclk) feeds local clock buffers, whose Clk outputs drive the latches/flip-flops]
Clock Distribution
• There are four basic types of clock distribution networks used in high-performance processor designs:
  – Tree: IBM and Motorola PowerPC, HP PA-RISC
  – Grid: SPARC, Alpha
  – Serpentine: Pentium-III
  – Spine: Alpha, Pentium-4
• Each technique has advantages and disadvantages:

Network      Wire Cap           Delay                Skew
Grid         High – 15x         Low – sub-100 ps     Low-Med
Trees        Low – 1x           High – 100’s ps      Low
Serpentine   Very High – 30x    High – 100’s ps      Low
Spine        High – 10x         Low – sub-100 ps     Med
Variations of tree distribution networks
Target: Metallization and Gate topology uniformity
Tapered H-Tree
HP PA-RISC Clock Distribution
IBM PowerPC Clock Distribution
Source: A 1GHz single-issue 64b PowerPC processor, ISSCC’2000
Sun Microsystems UltraSparc III
DEC Alpha 21164 Clock Distribution
DEC Alpha 21264 Clock Distribution
Pentium® 4 Processor Clock Network
• 2GHz triple-spine clock distribution
IBM POWER-4 in 0.18µm SOI
• H-Tree and Grid Distribution
• Clock skew: <25ps
• ~70% power in clock and latches
• Dual core, shared L2
• 174M transistors
• 115W at 1.1GHz, 1.5V
ISSCC 2007: Multi-Core Clocking Approach
Asynchronous Communication and Independent Core Frequencies
Source: An Integrated Quad-Core Opteron™ Processor, ISSCC 2007
Source: A 4320MIPS Four-Processor Core SMP/AMP with Individually Managed Clock Frequency for Low Power Consumption, ISSCC 2007
ISSCC 2007: Multi-Core Clocking Approach
Asynchronous Communication
Source: An 80-Tile 1.28TFLOPS Network-on-Chip
in 65nm CMOS, ISSCC 2007
Clock Uncertainty
• Clock uncertainty is the uncertainty in the time at which a clock edge will appear. It is determined by clock skew, clock jitter and clock overhead.
• A number of sources cause clock uncertainty. A typical breakdown of the sources is:
Clock Uncertainty
• An example of static clock uncertainty is clock skew. Clock skew represents the difference in delay of two identical clock signals arriving at two different locations on the chip (spatial separation).
• Clock jitter is the clock edge inaccuracy introduced by the clock signal generation circuitry.
• Clock overhead refers to the time a sequential storage element needs to positively store (or resolve) the incoming data. This time is directly related to the metastability properties of the sequential storage element.
• Clock uncertainty can have detrimental effects on the viability of a design:
  – Min-delay (hold) failures are frequency independent
    • Chip must be discarded
  – Max-delay (setup) failures are frequency dependent
    • Chip can be sold at a lower frequency
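Why one failure type depends on frequency and the other does not can be seen from the usual textbook flip-flop timing constraints. The slides do not state these equations; the sketch below assumes the standard form, and every delay value is hypothetical.

```python
def setup_slack_ps(t_cycle, t_clk_to_q, t_logic_max, t_setup, t_uncertainty):
    """Max-delay check: data must arrive a setup time before the next edge."""
    return t_cycle - (t_clk_to_q + t_logic_max + t_setup + t_uncertainty)

def hold_slack_ps(t_clk_to_q, t_logic_min, t_hold, t_uncertainty):
    """Min-delay check: new data must not arrive before the hold window closes.
    The cycle time does not appear in this expression."""
    return (t_clk_to_q + t_logic_min) - (t_hold + t_uncertainty)

# Hypothetical path: all values in ps.
path = dict(t_clk_to_q=50, t_logic_max=800, t_setup=40, t_uncertainty=100)

for t_cycle in (1000, 900):
    slack = setup_slack_ps(t_cycle, **path)
    print(f"Tcycle = {t_cycle} ps: setup slack = {slack} ps")

# A short path on the same chip: negative hold slack at every frequency.
print("hold slack =", hold_slack_ps(t_clk_to_q=50, t_logic_min=60,
                                    t_hold=30, t_uncertainty=100), "ps")
```

Lengthening the cycle restores the setup slack, so a part that fails setup can still be sold at a lower frequency. The hold slack has no cycle-time term, so no frequency change can repair it.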
Clock Skew
• Clock skew can be intentional or unintentional. For example, intentional clock skew may be injected in order to fix a race condition in a block of logic. This is typically achieved by the use of a variable-delay clock regeneration buffer.
• Clock skew can be positive or negative depending on how the reference clock is chosen.
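Intentional skew can also be used for cycle stealing. The sketch below assumes a hypothetical two-stage pipeline (flop A, logic 1, flop B, logic 2, flop C) in which the clock of the middle flop B is delayed by `skew_b`. Stage 1 then gets `skew_b` more time and stage 2 gets `skew_b` less. All numbers are illustrative.

```python
def min_cycle_ps(logic1, logic2, overhead, skew_b):
    """Smallest cycle time that satisfies both stages.
    overhead = clock-to-Q + setup of the flops (assumed equal for both stages)."""
    stage1 = logic1 + overhead - skew_b   # capture clock at B arrives later
    stage2 = logic2 + overhead + skew_b   # launch clock at B arrives later
    return max(stage1, stage2)

LOGIC1, LOGIC2, OVERHEAD = 950, 600, 100   # ps, hypothetical

for skew_b in (0, 50, 100, 175):
    print(f"skew_b = {skew_b:3d} ps -> min cycle = "
          f"{min_cycle_ps(LOGIC1, LOGIC2, OVERHEAD, skew_b)} ps")
```

The best result is reached when the two stages are balanced (175 ps here). The same delay reduces the hold margin of the path into flop B, so the min-delay check has to be repeated after skewing.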
Clock Skew
• Clock skew accounts on average for about 5% of the cycle time and is trending higher as frequency increases.
• In general we strive to minimize clock skew; however, we must design our circuits to be clock-skew tolerant.
  – Good reading on this topic: Harris, Skew-Tolerant Circuit Design, Morgan Kaufmann Publishers
Clock Skew
[Figure: clock distribution tree (system clock, PLL, core clock, global clock buffers, local clock buffers) delivering ClkA, ClkB, ClkC and ClkD to latches/flip-flops; the ClkA and ClkD waveforms are offset by Tskew]
A single transition of the core clock does not arrive at all latches or flip-flops at the same time.
Sources of Clock Skew
• In-die process, voltage, temperature (PVT) variation
  – Different clock buffers with different channel lengths
  – Local drop in voltage leads to increased buffer delay
  – Hot spots lead to increased gate and wire delay
  – Device mismatch across die
• Wire coupling
  – Coupling will be different on different clock routes
• RC mismatch
  – Clock routes not all of equal length
  – Latches not all equal distance from LCB (local clock buffer)
• Inductance of high-speed, low-resistance lines changes edge rates. Unequal buffering can cause additional skew due to rise-time-dependent delay in buffers.
Industry clock skew data
Processor        Frequency (MHz)   Clock Skew (ps)              Process
Itanium™         800               110ps w/o deskew,            .18µ
                                   28ps w/ deskew
PowerPC          1000              15ps with Cu wires           .22µ
UltraSPARC III   800               80ps Al wires, no deskew     .18µ
Alpha            600               72ps Al wires, no deskew     .13µ

Source: ISSCC Papers
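To compare these figures with a cycle-time budget, each skew can be expressed as a fraction of the clock period. The short sketch below does only that arithmetic on the table values; the function name is chosen here for illustration.

```python
def skew_percent_of_cycle(freq_mhz, skew_ps):
    """Skew as a percentage of the clock period (period in ps = 1e6 / MHz)."""
    t_cycle_ps = 1e6 / freq_mhz
    return 100.0 * skew_ps / t_cycle_ps

# (processor, frequency in MHz, skew in ps), taken from the table above.
rows = [
    ("Itanium, no deskew",   800, 110),
    ("Itanium, with deskew", 800, 28),
    ("PowerPC",              1000, 15),
    ("UltraSPARC III",       800, 80),
    ("Alpha",                600, 72),
]

for name, freq_mhz, skew_ps in rows:
    pct = skew_percent_of_cycle(freq_mhz, skew_ps)
    print(f"{name:22s} {pct:5.2f}% of cycle")
```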
Itanium Clock Distribution Trend
J. Stinson and S. Rusu, ISSCC 2003
IBM POWER-4 3D Skew Visualization
Clock Deskewing
• Active clock deskewing is accomplished by dynamically delaying the global clock signals.
• This can result in clock jitter. Careful analysis is required to validate the benefits.
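A minimal sketch of the idea, assuming each region has a programmable delay line with a fixed step size. The region names, arrival times, step size and code range are all hypothetical, and real deskew circuits use phase detectors rather than known arrival times. Each region is delayed toward the latest arrival, and the residual skew is limited by the delay step.

```python
def deskew_codes(arrival_ps, step_ps, max_code):
    """Pick a delay code per region so every clock lines up with the latest one."""
    target = max(arrival_ps.values())
    codes = {}
    for region, t in arrival_ps.items():
        code = round((target - t) / step_ps)        # nearest available delay step
        codes[region] = min(max(code, 0), max_code) # clamp to the delay-line range
    return codes

def residual_skew_ps(arrival_ps, codes, step_ps):
    """Skew that remains after the programmed delays are applied."""
    adjusted = [t + codes[r] * step_ps for r, t in arrival_ps.items()]
    return max(adjusted) - min(adjusted)

arrivals = {"north": 402, "south": 437, "east": 418, "west": 460}  # ps, hypothetical
STEP_PS, MAX_CODE = 8, 15

codes = deskew_codes(arrivals, STEP_PS, MAX_CODE)
before = max(arrivals.values()) - min(arrivals.values())
after = residual_skew_ps(arrivals, codes, STEP_PS)
print(codes)
print(f"skew before = {before} ps, after = {after} ps")
```

When the codes are updated during operation, each update moves a clock edge by one step, which is one way deskewing can add jitter.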
Dual Zone Clock De-skew
Clock Jitter
• Clock jitter is the clock edge inaccuracy introduced by the clock signal generation circuitry. Clock jitter may be viewed as a statistical variation of the clock period or duty cycle.
• Sources of clock jitter:
  – Temporal power supply variations
    • Changing activity can alter the supply voltage in different cycles, affecting either the global or regional (local) clock buffers.
  – PLL jitter
    • Supply variation at the PLL can affect the oscillator frequency
    • PLL components do not have zero response time
    • Reference clock jitter is multiplied by the PLL
    • Global clock distribution may add jitter to the PLL, because supply noise makes the feedback clock signal appear to jitter.
  – Wire coupling
    • Changing data can alter coupling in different cycles
  – Dynamic deskewing circuitry
Clock Jitter
[Figure: clock distribution tree (system clock, PLL, core clock, global clock buffers, local clock buffers) delivering ClkA, ClkB, ClkC and ClkD to latches/flip-flops; the ClkA and ClkD waveforms show Tcycle and Tjitter]
Clock frequency at any point in the clock tree is not constant. The worst-case jitter determines the usable clock cycle time.
Pentium® 4 Processor Jitter Reduction
Power Supply Noise Influences Jitter
Voltage comparison: MIM vs. no MIM, on a 1.2V 90nm SOI microprocessor
No MIM => 250mV noise => -20.6%
8fF/µm² MIM => 156mV noise => -13.0%
Measured core Vdd, no-MIM vs. with-MIM, while running the same code
H. Sanchez, ISSCC 2006
Supply Noise Clocking Issue: Instantaneous Phase Shifts
• Clock distribution buffers are exposed to Vdd noise.
• Large di/dt events of the microprocessor core can cause substantial Vdd noise, which in turn affects the instantaneous phase relationships of clocks.
• Addition of MIM decap shows a measured 40% reduction in peak-to-peak phase change due to large transient currents related to microprocessor core activity.
Measured data from a highly integrated SoC. TIE (time interval error) of the global clock: the non-MIM case shifts 592ps, compared to the MIM case where it is reduced to 388ps.
H. Sanchez, ISSCC 2006
Clock Regeneration
• In the tree, grid and serpentine clock networks it is necessary to buffer (regenerate) the clock signals to ensure satisfactory edge rates and reduce skew.
• The global clock buffers (GCB) are used to regenerate the clock signal(s) to a region or cluster in the chip. They are typically designed with skew adjustment control.
• The local clock buffers (LCB) are used to regenerate the clock signal(s) to functional blocks in each cluster. The LCB usually contains logic that allows the clock signals to be gated on or off to reduce power.
Global Clock Buffer
[Figure: global clock buffer driving Gclk from the core clock, with SpeedUp1, SpeedUp2 and SpeedUp3 control inputs]
Global clock buffers can use variable delay to compensate for RC mismatches.
Local Clock Buffers
[Figure: local clock buffers driven by Gclk, with outputs Clk, Clk#, ClkDelayed and ClkDelayed# and enable inputs En1, En2, En3 and En4]
Local clock buffers use enable signals to reduce average power.
Clocking in PD SOI Technology
• Features of Partially Depleted SOI transistors relative to Bulk transistors:
  – Higher Idsat
  – Lower Junction Capacitance
  – Floating Body
  – Worse Self Heating
  – No body effect for stacked transistors
  – Dynamic Id increase due to Gate-Body Coupling
  – Faster switching delays
  – History Effect
    • Delay variation as a function of logic gate activity
    • Bad for clocking
Bulk NMOS and PD SOI NMOS
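[Figure: cross-sections of a bulk NMOS and a PD SOI NMOS. Labels: gate, gate oxide thickness tOX, N+ source and drain, oxide isolation, floating P body with silicon thickness tSi ≅ 150 nm, buried oxide (BOX) of thickness tBOX, P substrate]
C. T. Chuang et al., SOI Circuit Design for High-Performance CMOS Microprocessors, 2001 IEEE ISSCC Microprocessor Design Workshop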
FB Effect : History-dependent Delay in PD SOI
Non-50% duty cycle clocks at startup!
[Figure: two plots versus time (10^-9 to 10^-5 sec), each for initial input at “Low” (L-H) and “High” (H-L): gate delay (ps) with curves TDfall and TDrise, and body-to-source voltage VBS (V) for the nFET and pFET with curves VBS-fall and VBS-rise. Conditions: 1.8 V, Leff = 0.145 µm, Wp/Wn = 2, 1.0 ns period, 50% duty cycle, 100 ps input slew]
Igate reduces the history effect due to control of the floating body! (not shown above)
M. M. Pelella et al., VLSI-TSA, 1999 via C. T. Chuang et al., SOI Circuit Design for High-Performance CMOS Microprocessors, 2001 IEEE ISSCC Microprocessor Design Workshop
PD SOI: Differential Amplifiers
• Diff amps behave differently …
  – Floating-body diff amps suffer from inherent Vth mismatch due to the FB effect.
  – Body-tied, low-frequency diff amps behave as in bulk.
  – Body-tied, high-frequency diff amps can be hard to use.
H. Sanchez, “Design Challenges of PLL, I/O, and Mixed Signal Circuits in Advanced CMOS/SOI Technologies”, IEEE Ckt and Sys Sept 2004 Workshop Dallas
Clocking Issues: Body Ties in PD SOI
– Can body ties help?
  • Some … if the switching signal rate is slow relative to the time constant of the body contact (i.e., DC or 100’s of MHz signals).
    – Charge pump can use BT devices
  • Not much … if the switching signal rate is in the GHz range.
    – VCOs or high-speed buffers suffer history effect with BT devices
  • BT transistors have their own process sensitivities that make them cumbersome to design with (poly head orientation, limited device width for effective body tie, etc.).
– PLLs have been designed using both BT and FB PD SOI transistors.
– Designing with BT transistors increases simulation time tremendously.
  • Accurate BT transistor models use multiple segmented transistors to model a single BT transistor; thus for each schematic BT device there may be effectively 3-5 SPICE transistors.
– The decision whether to body tie or not may be driven more by faith in the ability to accurately model the device and deliver silicon that matches the model.
Technology Issues: PD SOI
• The BAD thing about PD SOI:
  – Everything varies due to the floating body
    • Vth, Cgate, Id
• The GOOD thing about PD SOI:
  – Everything varies due to the floating body
    • Vth, Cgate, Id
• WHAT???
  – Yes, GOOD and BAD.
    • BAD if you are used to thinking that transistors behave like the “lumped” empirical models they are based on.
    • GOOD because it prepares you as a designer to accept variability and learn to deal with it. Ultimately, if it’s not PD SOI, it will be some other non-classical device that will require you to learn to “adapt”.
Future Directions: Distributed VCOs
Future Clocking Issues
• Reducing feature size and increasing frequency is the trend for future high-performance processors.
  – Reducing feature size means a bigger impact of cross-die variations.
  – Interconnect delays (RC) do NOT scale well with feature size.
  – RLC effects of the GCDN may require extensive analysis.
  – Reducing cycle time means the clock skew budget is a larger percentage of the cycle time.
• Die sizes are getting bigger
  – Longer clock distribution networks, resulting in increased skew and jitter.
  – Larger clock loads
• Increasing power consumption, coupled with decreasing Vdd and increasing noise, poses challenges to the multi-GHz clocking schemes that are currently industry standards.
Future Directions: Distributed PLL
Future Directions:
Tuned LC tanks, Standing Wave Oscillators
Frank O’Mahony et al., ISSCC 2003
ISSCC 2005: Itanium2 Clock Distribution
Combine all tricks into 1 distribution: Differential, Single Ended,
Global De-Skew, Local De-Skew, Self-Determination of Frequency
Based on localized Vdd/Power/Temperature.
ISSCC 2005: Itanium2 Clock Distribution
Digest of papers ISSCC 2005
10.1 The Implementation of a 2-core Multi-Threaded
Itanium(R) Family Processor
16.1 Clock Distribution on a Dual-core Multi-threaded
Itanium(R)-Family Processor
16.2 A 90nm Variable-Frequency Clock System for a
Power-Managed Itanium(R)-Family Processor
ISSCC 2005: Itanium2 Clock Distribution
ISSCC 2005: Clock Routing on Package
Digest of Papers ISSCC 2005
28.3 A Chip-Package Hybrid DLL and Clock Distribution Network for Low-Jitter Clock Delivery
Daehyun Chung1, Chunghyun Ryu1, Hyungsoo Kim1, Choonheung
Lee2, Jaedong Kim2, Jinyoung Kim2, Kicheol Bae2, Jiheon Yu2,
Seungjae Lee2, Hoijun Yoo1, and Joungho Kim1
1KAIST, Daejeon, Korea
2Amkor Technology Korea, Seoul, Korea
ISSCC 2005: Variable Frequency Resonant Clock
Distribution
Digest of Papers ISSCC 2005
28.5 1.1 to 1.6GHz Distributed Differential
Oscillator Global Clock Network
Steven C. Chan1, Kenneth L. Shepard1, Phillip J. Restle2
1Columbia University, New York, NY
2IBM, Yorktown Heights, NY
ISSCC 2005: Clock Skew across CELL Processor
Global Clock Distribution
The Design and Implementation of a First-Generation CELL Processor
D. Pham1, S. Asano3, M. Bolliger1, M. N. Day1, H. P. Hofstee1,
C. Johns1, J. Kahle1, A. Kameyama3, J. Keaty1, Y. Masubuchi3,
M. Riley1, D. Shippy1, D. Stasiak1, M. Suzuoki2, M. Wang1,
J. Warnock1, S. Weitzel1, D. Wendel1, T. Yamazaki2, K. Yazawa2
1IBM, Austin, TX
2Sony, Tokyo, Japan
3Toshiba, Austin, TX
ISSCC 2005: Optical Clocking
< 1ps rms jitter
Opportunities for Optics in Integrated Circuits Applications
By D.A.B. Miller, A. Bhatnagar, S. Palermo, A. Emami-Neyestanak,
M.A. Horowitz
Digest of Technical Papers ISSCC 2005
Diodes are off-chip, receiving the laser pulse and driving into a mux selector on-chip … integrated in the future …
References
• G. Geannopoulos, X. Dai, “An adaptive digital deskewing circuit for clock distribution networks”, ISSCC Digest of Technical Papers, 1998, Pg 400-401
• S. Tam, et al., “Clock generation and distribution for the first IA-64 microprocessor”, IEEE Journal of Solid-State Circuits, Nov. 2000, Pg 1545-1552
• D. Harris, S. Naffziger, “Statistical clock skew modeling with data delay variations”, IEEE Transactions on VLSI Systems, Dec. 2001, Pg 888-898
• E. Friedman, “Clock distribution networks in synchronous digital integrated circuits”, Proceedings of the IEEE, May 2001, Pg 665-692
• N. Kurd, et al., “A multi-GHz clocking scheme for the Pentium® 4 microprocessor”, IEEE Journal of Solid-State Circuits, Nov. 2001, Pg 1647-1653
• P. Restle, et al., “The clock distribution of the Power4 microprocessor”, ISSCC Digest of Technical Papers, 2002, Pg 144-145
• D. Bailey, B. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor”, IEEE Journal of Solid-State Circuits, Nov. 1998, Pg 1627-1632
Key Learnings
• Choose your skew battles wisely
  – Look at the overall skew and fix it globally if possible.
  – It is okay to be loose in one area if you can gain it back in effort, productivity and a better design.
  – Getting the design out a quarter earlier at a slightly lower speed wins. There is a 1% “effective” performance loss for every week the schedule slips.
• Minimize, if not eliminate, interconnect matching requirements
  – Fewer paths to match means fewer things can go wrong
• Make clock distribution more tolerant of design mismatch
  – No need to tune each path to the last picosecond
• Unit and block clock loading data changes until the last minute
• Final clock loadings can be up to 2.5x the original expectations
• Most practical solutions involve a combination of techniques (i.e., global H-tree, local grid …)
Summary
• A successful clock distribution network should:
  – Maintain low skew
  – Be tolerant of design mismatches and deviations from ideality
  – Reduce or eliminate the need for very careful matching of layouts
  – Reduce or eliminate final tuning effort right before tapeout
  – Minimize design effort
  – Be flexible: plan for things to go wrong, and put backup solutions in place.
References
• P. Gronowski, et al., “High-performance microprocessor design”, IEEE Journal of Solid-State Circuits, May 1998, Pg 676-686
• J. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques”, IEEE Journal of Solid-State Circuits, Nov. 1996, Pg 1723-1732
• I. Young, et al., “A 0.35um CMOS 3-880MHz PLL N/2 Clock Multiplier and Distribution Network with Low Jitter for Microprocessors”, ISSCC Digest of Technical Papers, 1997
• H. Kojima, “Half-swing clocking scheme for 75% power saving in clocking circuitry”, IEEE Journal of Solid-State Circuits, April 1995, Pg 432-435
• T. Xanthopoulos, et al., “The Design and Analysis of the Clock Distribution Network for a 1.2GHz Alpha Microprocessor”, ISSCC Digest of Technical Papers, 2001
• J. Wood, et al., “Multi-gigahertz low-power low-skew rotary clock scheme”, ISSCC Digest of Technical Papers, 2001, Pg 400-401
Backup
Acronyms
• PLL – Phase Locked Loop
• DLL – Delay Locked Loop
• PD – Phase Detector
• VCO – Voltage Controlled Oscillator
• GCDN – Global Clock Distribution Network
• LCB – Local Clock Buffer
• GCB – Global Clock Buffer
• MIM – Metal-Insulator-Metal
• PD SOI – Partially Depleted SOI
• BT transistor – Body-Tied transistor
Typical 2nd order PLL Block Diagram
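[Figure: PLL block diagram with Ref and Fdbk inputs to the PFD, UP and DN signals to the charge pump, loop filter (R, C1, C2, referenced to AVCC) producing VCNTL, bias generator producing NBIAS, VCO, clock network, and 1/N feedback divider]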
DLL – Delay Locked Loop
PLL Loop Bandwidth vs. Jitter Trade-off
• A low loop bandwidth is desirable to suppress reference clock jitter
• A high loop bandwidth is desirable to track power supply noise
• In general, PLL output jitter is mainly due to power supply noise rather than reference clock jitter (crystal oscillator based)
• Minimize peaking in closed-loop response
• Can use filtering to help reduce jitter due to external power supply noise
  – Dedicated power supply rail
  – LC filter
  – On-die supply regulation
PLL Analysis: Locked State
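[Figure: linear model of the locked loop. The phase error θe(S) = θi(S) − θfdbk(S) passes through the phase/frequency detector and charge pump (gain Ip / (2⋅π)), the filter F(S) and the VCO (gain Ko) to give Fo(S); the block 2⋅π / S converts this to the output phase θo(S), and the divider (1/M) returns θfdbk(S)]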
PLL Loop Equations
Analyzing the Transfer Function: θo(S) / θi(S)
Loop Natural Frequency (Bandwidth):
$$F_n = \sqrt{\left(\frac{K_o \cdot I_p}{2 \cdot \pi}\right) \cdot \frac{1}{M \cdot C_1}}$$
Loop Damping (First Order Filter):
$$\varsigma = \pi \cdot R \cdot C_1 \cdot F_n$$
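The two loop equations can be evaluated numerically. This sketch takes the natural frequency as Fn = sqrt((Ko·Ip / 2π) · 1 / (M·C1)), where the square root is assumed on dimensional grounds, and the damping as ς = π·R·C1·Fn. The units chosen for Ko and all component values are hypothetical.

```python
import math

def loop_natural_frequency(k_o, i_p, m, c1):
    """Fn = sqrt((Ko * Ip / (2*pi)) / (M * C1)), square root assumed."""
    return math.sqrt((k_o * i_p / (2.0 * math.pi)) / (m * c1))

def loop_damping(r, c1, f_n):
    """Damping factor = pi * R * C1 * Fn."""
    return math.pi * r * c1 * f_n

# Hypothetical loop values (not from the lecture).
K_O = 1e9      # VCO gain, assumed to be in Hz per volt
I_P = 20e-6    # charge pump current, A
M = 8          # feedback divider ratio
C1 = 100e-12   # loop filter capacitor, F
R = 1.5e3      # loop filter resistor, ohm

f_n = loop_natural_frequency(K_O, I_P, M, C1)
zeta = loop_damping(R, C1, f_n)
print(f"Fn = {f_n / 1e6:.2f} MHz, damping = {zeta:.2f}")

# Raising the charge pump current 4x doubles Fn, and with it the damping.
print(loop_natural_frequency(K_O, 4 * I_P, M, C1) / f_n)
```

Both equations share C1 and Fn, so R is the one component that adjusts damping without moving the natural frequency.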
PLL Loop Analysis
• Higher VCO gain (Ko) → more difficult to stabilize the loop.
• Loop bandwidth is chosen based on application and stability limitations.
• Desirable to minimize the peaking effect in the closed-loop frequency response.
• Under-damping → more frequency overshoot, faster loop response.
• Over-damping → less frequency overshoot, better stability, slower loop response.
PLL Parameters
• PLL Lock Time
  – This is the time it takes for the PLL to achieve a steady-state lock condition from power-up.
  – Function of loop parameters → VCO gain and loop bandwidth.
  – It is determined by a one-shot measurement of the PLL output clock when the PLL is powered up.
  – This time is important if the reference clock is dynamically changed during operation, e.g., changing to a slow-speed clock to reduce power.
• PLL Tracking Range
  – Determines the min and max VCO frequency of operation for a stable PLL loop.
  – Function of VCO gain characteristics.
  – Determined by sweeping the PLL input frequency and measuring the PLL output frequency.
PLL Parameters
• PLL Frequency Overshoot
  – It is a measure of PLL stability.
  – It is determined by the filter and the VCO gain parameters.
  – It is a one-shot measurement of the PLL output clock when the PLL is powered up.
    • Need to limit the overshoot to prevent erroneous circuit operation.
• PLL Output Jitter
  – Jitter performance is a strong function of PLL loop parameters along with many other sources of noise.
  – Reference clock jitter and power supply noise, along with the loop bandwidth, also impact PLL output jitter.
  – Must consider both high- and low-frequency content of jitter:
    • Low-frequency jitter → jitter is measured over many clock cycles.
    • High-frequency jitter → edge-to-edge variations.
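The difference between the two measurements can be sketched on synthetic edge timestamps. The example assumes a clock with slow sinusoidal phase wander plus a small amount of white noise; the amplitudes, the wander period and the function names are hypothetical. Edge-to-edge (period) jitter barely sees the slow wander, while a measurement spanning many cycles does.

```python
import math
import random
import statistics

def period_jitter_rms(edges):
    """Spread of individual clock periods (edge-to-edge variation)."""
    periods = [b - a for a, b in zip(edges, edges[1:])]
    return statistics.pstdev(periods)

def n_cycle_jitter_rms(edges, n):
    """Spread of the time spanned by n consecutive cycles."""
    spans = [edges[k + n] - edges[k] for k in range(len(edges) - n)]
    return statistics.pstdev(spans)

rng = random.Random(1)                 # fixed seed for repeatability
T_PS, WANDER_PS, WANDER_CYCLES = 1000.0, 50.0, 1000

# Ideal edge + slow phase wander + 1 ps rms white noise (all values assumed).
edges = [k * T_PS
         + WANDER_PS * math.sin(2.0 * math.pi * k / WANDER_CYCLES)
         + rng.gauss(0.0, 1.0)
         for k in range(5000)]

short_term = period_jitter_rms(edges)
long_term = n_cycle_jitter_rms(edges, 500)
print(f"period jitter    ~ {short_term:.2f} ps rms")
print(f"500-cycle jitter ~ {long_term:.2f} ps rms")
```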
VCO - Voltage Controlled Oscillator
• There are a number of requirements placed on voltage-controlled oscillators used in clock generation circuits:
  – Phase stability
  – Broad tuning (tracking) range
  – Linearity of frequency vs. control voltage
  – Large gain factor Ko
• These requirements are usually in direct conflict with each other.
  – To obtain good wideband features, phase stability will be reduced.
Differential VCO
[Figure: ring of differential delay stages controlled by VCNTRL and NBIAS, followed by output amplifiers that restore CMOS signal swing]
Output Period = N * tdelay
N = Number of Delay Stages
tdelay = Delay of 2 cascaded inverting stages
Differential Delay Stage
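[Figure: differential delay stage schematic. Voltage-controlled resistor loads (Rvcr) connect to AVCC and are set by VCNTL; inputs IN1, IN1#, IN2 and IN2#; outputs OUT1, OUT1#, OUT2 and OUT2#; tail currents i set by NBIAS, returned to VSS]
Cload = Cgate + Cdiff + Cint
Advantages:
Differential signals are more immune to noise
Insensitive to switching trip-point inaccuracy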
VCO Frequency of Oscillation
$$i = C \cdot \frac{\Delta V}{\Delta T}$$
$$t_{delay} = \frac{C \cdot \Delta V}{i}$$
$$\text{Frequency} = \frac{i}{C \cdot \Delta V \cdot N}$$
where
N = number of delay stages
ΔV = Peak-Peak Voltage Swing (AVCC-Vcntrl)
C = total capacitance on output node
i = input current to Ring Osc. (Tail Current)
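A quick numerical check of these relations, using hypothetical component values (50 fF load, 0.5 V swing, 100 µA tail current, 5 stages) that are not from the lecture:

```python
def stage_delay_s(c_farads, delta_v, i_amps):
    """t_delay = C * dV / i"""
    return c_farads * delta_v / i_amps

def vco_frequency_hz(i_amps, c_farads, delta_v, n_stages):
    """Frequency = i / (C * dV * N)"""
    return i_amps / (c_farads * delta_v * n_stages)

C = 50e-15     # total capacitance on the output node, F (assumed)
DELTA_V = 0.5  # peak-to-peak swing, V (assumed)
I_TAIL = 100e-6  # tail current, A (assumed)
N = 5          # number of delay stages (assumed)

t_delay = stage_delay_s(C, DELTA_V, I_TAIL)
freq = vco_frequency_hz(I_TAIL, C, DELTA_V, N)
print(f"t_delay = {t_delay * 1e12:.0f} ps, frequency = {freq / 1e6:.0f} MHz")

# Doubling the tail current doubles the frequency for a fixed swing.
print(vco_frequency_hz(2 * I_TAIL, C, DELTA_V, N) / freq)
```

The period 1/Frequency equals N · t_delay, which matches the output period relation given for the differential VCO.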
VCO Transfer Function
[Figure: VCO frequency versus input voltage (volts) for best-case, typical and worst-case conditions, with gains Ko1 and Ko2 marked and the frequencies fmin and fmax shown over the VCO input voltage range]
PLL Block Diagram
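[Figure: the reference clock Fin passes through the reference divider (1/N) into the PLL core (frequency and phase locked); the PLL core output Fvco returns through the feedback divider (1/M) and also drives the post divider (1/P) to produce Fout]
$$F_{vco} = \left(\frac{M}{N}\right) \cdot F_{in}$$
$$F_{out} = \left(\frac{M}{N \cdot P}\right) \cdot F_{in}$$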
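The divider relations Fvco = (M/N)·Fin and Fout = (M/(N·P))·Fin can be checked with a few lines of arithmetic. The divider settings and the 100 MHz reference below are hypothetical; exact fractions are used so that non-integer ratios do not pick up rounding error.

```python
from fractions import Fraction

def pll_frequencies(f_in_mhz, m, n, p):
    """Return (Fvco, Fout) in MHz for feedback divider M,
    reference divider N and post divider P."""
    f_vco = Fraction(m, n) * f_in_mhz
    f_out = f_vco / p
    return f_vco, f_out

# Hypothetical settings: 100 MHz reference, M = 24, N = 1, P = 2.
f_vco, f_out = pll_frequencies(100, m=24, n=1, p=2)
print(f"Fvco = {float(f_vco)} MHz, Fout = {float(f_out)} MHz")

# A non-integer ratio: M = 25, N = 3, P = 1.
print(pll_frequencies(100, m=25, n=3, p=1))
```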
PLL Reading List
• Books:
  – “Phaselock Techniques,” F.M. Gardner, 1979.
  – “Phase-Locked Loops: Design, Simulation, and Applications,” R. Best, 1999.
  – “Phase-Locked Loop Circuit Design,” Dan Wolaver, 1991.
• Articles:
  – “Charge-Pump Phase-Lock Loops,” F.M. Gardner, IEEE Trans. on Comm., Vol. COM-28, No. 11, November 1980.
  – “A PLL Clock Generator with 5 to 110 MHz of Lock Range for Microprocessors,” I.A. Young, IEEE Journal of Solid-State Circuits, Vol. 27, No. 11, November 1992.
  – “Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques,” John G. Maneatis, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, November 1996.