L23: Clock Issues in Deep
Submicron Design (2)
1999. 10
Jun Dong Cho
Sungkyunkwan Univ. Dept. ECE
Mail : [email protected]
Homepage : vada.skku.ac.kr
1
Buffer Pre-Placement
2
Iso-Radius Buffer Insertion
[Figure: panels (a) and (b)]
3
Width Control
4
Width Tapering
5
Buffer/Load Distribution
6
H-Tree




H-Tree is a special case of the CRP in which all the clock
terminals are arranged symmetrically, as is the case in
gate arrays.
The H-Tree algorithm connects the terminals pairwise in a
particular order, then connects the midpoints of the
resulting vertical segments. The connected midpoints
are called the tapping points.
The H-Tree gives every terminal the same path length
from the clock source, hence the skew between terminals is zero.
X-Tree: If the routing is not restricted to being rectilinear, the H shape can be
replaced by an X shape. This is undesirable, however, because the close proximity
of the wires may cause crosstalk.
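The equal-path-length property of the H-Tree can be checked with a small sketch. This is only an illustration, not the construction used in the slides: it builds the tree top-down from the center, and the function name `h_tree` and its parameters (`half`, `levels`) are assumptions made for the example.

```python
def h_tree(cx, cy, half, levels):
    """Build an H-tree centred at (cx, cy) (illustrative sketch).

    half   : half-length of the top-level horizontal bar (assumed parameter)
    levels : number of H levels; the tree has 4**levels terminals
    Returns (segments, leaves), where each leaf is (x, y, path_length).
    """
    segments, leaves = [], []

    def build(x, y, h, level, dist):
        if level == 0:
            leaves.append((x, y, dist))
            return
        for sx in (-1, 1):
            ax = x + sx * h
            # Horizontal arm from the tapping point to one side of the H.
            segments.append(((x, y), (ax, y)))
            for sy in (-1, 1):
                ay = y + sy * h
                # Vertical arm up or down to the centre of a smaller H.
                segments.append(((ax, y), (ax, ay)))
                build(ax, ay, h / 2, level - 1, dist + 2 * h)

    build(cx, cy, half, levels, 0.0)
    return segments, leaves


segments, leaves = h_tree(0.0, 0.0, 4.0, 2)
lengths = {d for _, _, d in leaves}
print(len(leaves), lengths)
```

Every terminal ends up at the same path length from the center, so under a delay model that depends only on path length (and with identical loads) the skew is zero.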
7
Hierarchical Matching Tree :
MMM & GMA


MMM (Method of Means and Medians): The MMM algorithm recursively
partitions a circuit into two equal parts, and then connects the center of
mass of the whole circuit to the centers of mass of the two sub-circuits.
GMA (Geometric Matching Algorithm): Unlike the MMM algorithm, which is
top-down, GMA works bottom-up.
[Figure: MMM and GMA examples, with cuts labeled Cut 1, Cut 2, Cut 3 and the color order Black -> Blue -> Red -> Green.]
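The top-down MMM recursion described above can be sketched as follows. This is a minimal illustration under stated assumptions: terminals are plain (x, y) points, the partition is a median split, and the cut direction alternates between x and y at each level. The alternation and the names `mmm` and `center_of_mass` are choices made for the example, not details given in the slides.

```python
def center_of_mass(points):
    """Mean position of a list of (x, y) terminals."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mmm(points, cut_x=True):
    """Return the tree edges produced by a median-split recursion.

    Each edge is (parent_center, child_center). The cut direction
    alternates between x and y (an assumption of this sketch).
    """
    if len(points) <= 1:
        return []
    axis = 0 if cut_x else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    root = center_of_mass(points)
    edges = []
    for half in (ordered[:mid], ordered[mid:]):
        # Connect the centre of the whole set to the centre of each half.
        edges.append((root, center_of_mass(half)))
        edges.extend(mmm(half, not cut_x))
    return edges


terminals = [(0, 0), (0, 2), (2, 0), (2, 2)]
for edge in mmm(terminals):
    print(edge)
```

A bottom-up method such as GMA would instead start from the individual terminals and merge matched pairs upward.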
8
Zero Skew Algorithm

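The Zero Skew Algorithm is recursive and bottom-up in nature.

This algorithm
  assumes that the pairing of points has already been done;
  concerns itself with finding the tapping point very accurately, based on the
  capacitive loading of the clock terminals as well as the delay in the subtrees.

The tapping point is placed so that the delays to the two subtrees are equal:

$$ r_1\left(\frac{c_1}{2} + C_1\right) + t_1 = r_2\left(\frac{c_2}{2} + C_2\right) + t_2 $$

Here r_i and c_i are the resistance and capacitance of the wire from the tapping
point to subtree T_i, C_i is the load capacitance of T_i, and t_i is the delay inside T_i.

[Figure: subtrees T1 and T2 joined at the tapping point. Each wire is modeled with
c1/2 (or c2/2) at both ends; the tapping point splits the connecting wire into
fractions x and 1-x.]

If zero skew cannot be achieved with the method above, one path must be
elongated until the two delays match. This is called “snaking”.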
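The tapping-point balance can be worked through numerically. The sketch below assumes a uniform wire with resistance `r_unit` and capacitance `c_unit` per unit length, so that r1 = r_unit·x·l and c1 = c_unit·x·l (and similarly for the other side with 1-x). Those parameter names and the function `zero_skew_merge` are assumptions made for the example. Solving the balance equation for x gives a closed form, and when x falls outside [0, 1] the sketch snakes the wire on the faster side.

```python
import math


def zero_skew_merge(t1, cap1, t2, cap2, length, r_unit, c_unit):
    """Balance two subtrees joined by a wire of the given length.

    t1, t2         : delays inside subtrees T1 and T2
    cap1, cap2     : load capacitances C1 and C2
    r_unit, c_unit : wire resistance / capacitance per unit length (assumed)
    Returns (len1, len2, delay, total_cap); len1 and len2 are the wire
    lengths from the tapping point to T1 and T2.
    """
    a, b, l = r_unit, c_unit, length
    # Closed-form solution of r1*(c1/2 + C1) + t1 = r2*(c2/2 + C2) + t2.
    x = (t2 - t1 + a * l * (cap2 + b * l / 2)) / (a * l * (b * l + cap1 + cap2))
    if 0.0 <= x <= 1.0:
        len1, len2 = x * l, (1.0 - x) * l
    elif x < 0.0:
        # T1 is too slow: tap at the root of T1 and snake the wire to T2.
        len1 = 0.0
        len2 = (math.sqrt((a * cap2) ** 2 + 2 * a * b * (t1 - t2)) - a * cap2) / (a * b)
    else:
        # T2 is too slow: tap at the root of T2 and snake the wire to T1.
        len2 = 0.0
        len1 = (math.sqrt((a * cap1) ** 2 + 2 * a * b * (t2 - t1)) - a * cap1) / (a * b)
    delay = t1 + a * len1 * (b * len1 / 2 + cap1)
    total_cap = cap1 + cap2 + b * (len1 + len2)
    return len1, len2, delay, total_cap


print(zero_skew_merge(0.0, 1.0, 0.0, 1.0, 2.0, 1.0, 1.0))
print(zero_skew_merge(10.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0))
```

In the second call the subtree delays differ too much for any tapping point on the original wire, so the wire to T2 comes out longer than the original length, which is the snaking case.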
9
A Worst Case Tree
10
RHMT
11
Interconnect Topology







Resistance ratio = driver resistance / unit wire resistance.
When the resistance ratio is small, interconnect topology
optimization is important.
Important metrics: total wire length, radius (longest source-sink
path length), and diameter (for multi-source nets).
Optimal tree construction algorithms:
  BRBC (bounded-radius, bounded-cost) algorithm
  A-tree algorithm: start with a forest of n single-node A-trees and repeatedly
  combine two A-trees into a new one.
12
Recent Approaches
in Clock Tree Synthesis


Research in Clock Tree Synthesis Algorithm

Wire-sizing & Parallel Algorithm for zero skew

Reducing Clock Power using Multiple Voltage

Clock Tree Scheduling with Storage Retiming
Research in System Level Design Feature

GALS Clock Scheme

Considering System Level Clock Tree
13
Wire-sizing & Parallel Algorithm
for zero skew (1)



An iterative approach is used: one wire segment is selected and an alternative
wire size is tried. To keep the skew of the tree at zero, the sub-tree rooted at
the current wire must be re-merged with its sibling.
Assumption: the sibling wire uses the same wire size.
This propagation continues until all the wire segments on the path from the
current wire to the root wire have been re-merged. In other words, when the size
of a wire is locally optimized, the effect of the wire-size change is propagated
to the root of the clock tree by zero-skew merging.
[Figure: propagation path from the resized wire W to the root.]
The lengths of all the wires along the propagation path, and of their siblings,
may change, but their wire sizes remain unchanged.
14
Wire-sizing & Parallel Algorithm
for zero skew (2)



Sub-tree partition: Assume there are two processors. The sub-trees are not
assigned at depth 1, since that would give one processor 16 nodes and the
other 2 nodes. At depth 2, however, there are 4 sub-trees, which can be
distributed more evenly.
[Figure: example tree annotated with sub-tree sizes (16, 2, 6, 8) and processor assignments (p0, p1).]
The tree is partitioned into a top part and a bottom part.
  Only the nodes in the bottom part are distributed among the processors.
  The nodes in the top part are shared among the processors.
Iteration method:
  First, each processor does the wire-sizing for the top part (except the root).
  The processors then do the wire-sizing for all the wires in the bottom part of
  the tree in a distributed manner and synchronize the results.
15
Reducing Clock Power using
Multiple Voltage
P  f CLVddVs  f CLVdd


2
HL Converter : converts the
incoming clock signal to the chip
from high voltage swing to a
lower voltage swing.
LL Converter : regenerate the
signal and maintain a sharp slew
rate as the signal passes through
the network.


LH Converter : convert the higher
voltage swing used by logic network
at the sink FF.
Instead of using multiple voltage,
Only use reduced-swing clock
scheme.
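As a quick arithmetic illustration of the power expression on this slide, the sketch below compares a full-swing clock with a reduced-swing one. The function name `clock_power` and all numeric values (100 MHz, 1 nF, 3.3 V, 1.65 V) are hypothetical and chosen only to make the ratio easy to see.

```python
def clock_power(freq_hz, load_cap_f, vdd, swing=None):
    """Dynamic clock power P = f * C_L * Vdd * Vs.

    If no swing is given, the clock is full swing (Vs = Vdd),
    which gives the familiar f * C_L * Vdd**2.
    """
    v_s = vdd if swing is None else swing
    return freq_hz * load_cap_f * vdd * v_s


# Hypothetical values, for illustration only.
full = clock_power(100e6, 1e-9, 3.3)
reduced = clock_power(100e6, 1e-9, 3.3, swing=1.65)
print(full, reduced, reduced / full)
```

With this expression the saving is linear in the swing: halving the swing halves the clock power while Vdd stays fixed.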
16
Clock Tree Scheduling with
Storage Retiming




Retiming improves the speed of a digital circuit by relocating its storage
elements while preserving the functionality of the original design.
Clock scheduling achieves the same effect as retiming by introducing
skew between the clock signals that control the timing of the storage
elements within a circuit.
When the clock skew is zero, the minimum clock period is the longest
delay of all the combinational paths in the circuit. So the goal is to
balance the longest delays of all the data paths by relocating the registers.
When nonzero clock skew is introduced, the circuit can operate at a clock
period as small as the largest difference between the delays of the slowest
path and the fastest path between any pair of registers.



[Figure: Left: original circuit. Middle: fastest retimed circuit with zero skew. Right: fastest retimed circuit with nonzero skew.]
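The two clock-period statements above can be compared on a small example. This sketch takes the slide's statements at face value and ignores setup and hold times; a real skew scheduler must also satisfy the timing constraints around feedback loops, so the second value is best read as a lower bound. The register names and delays are hypothetical.

```python
def min_period_zero_skew(paths):
    """Longest combinational delay over all register-to-register paths."""
    return max(d_max for _, d_max in paths.values())


def period_bound_with_skew(paths):
    """Largest (slowest - fastest) delay difference over all register pairs."""
    return max(d_max - d_min for d_min, d_max in paths.values())


# (source register, sink register) -> (fastest delay, slowest delay); made-up values.
paths = {
    ("R1", "R2"): (2.0, 9.0),
    ("R2", "R3"): (1.0, 4.0),
}
print(min_period_zero_skew(paths), period_bound_with_skew(paths))
```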
17
GALS Clock Scheme(1)






Power consumption in the clock tree now accounts for about
50% of total power consumption.
From a system design point of view, the power
consumption of the clock must be reduced.
The clock power consumption of large high-performance
VLSIs can be reduced by adopting the GALS (Globally
Asynchronous, Locally Synchronous) design style.
A GALS architecture is composed of large
synchronous blocks (SBs) which communicate
with each other on an asynchronous basis.
Eliminating the global clock eliminates a
major source of power consumption and a
design bottleneck.
The GALS approach is skew tolerant at the global
level because it does not depend on a global
clock reference for communication. However,
gated clocks will introduce clock skew when
clock frequencies go beyond the GHz range.
18
GALS Clock Scheme(2)



In a GALS architecture, local clocks are required for the SBs.
Using a global reference and PLLs:
  The signal swing can be a fraction of Vdd.
  The signal is distributed at a much lower frequency than the highest
  clock frequency.
  No effort is made to carefully design the geometry of the signal
  distribution to minimize skew.
  Restriction: designing an analog PLL in a noisy digital environment is
  difficult, and a PLL is sensitive to process variations.
Local clock generators based on ring oscillators:
  The basic ring oscillator consists of an odd number of inverters in a
  circular chain.
  The frequency of the ring oscillator is determined by the
  propagation time through the chain of inverters.
  To change the oscillation frequency, a delay line with controllable
  capacitance is used.
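The dependence of the ring-oscillator frequency on the inverter chain can be made concrete with the standard first-order relation f = 1 / (2·N·t_p), where N is the number of inverters and t_p the delay per stage. The sketch below assumes identical stages. The function name and the requirement of at least three stages are choices made for the example.

```python
def ring_oscillator_frequency(num_inverters, stage_delay_s):
    """First-order oscillation frequency of an N-stage ring oscillator.

    One full period needs the signal to travel around the ring twice
    (one rising and one falling pass), hence the factor of 2.
    """
    if num_inverters < 3 or num_inverters % 2 == 0:
        raise ValueError("use an odd number of inverters, at least 3")
    return 1.0 / (2 * num_inverters * stage_delay_s)


# Hypothetical example: 5 stages with 100 ps of delay each.
print(ring_oscillator_frequency(5, 100e-12))
```

Adding load capacitance to a stage raises its delay and therefore lowers the frequency, which is the tuning mechanism the slide refers to.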
19
Conclusion - Low Power Issues (1)


Power optimization allows logic optimization to simultaneously optimize for
timing, area and power. So all the inputs to optimization are the same with
the addition of two new power constraints: max dynamic power and max
leakage power. A power-optimizing logic optimization system takes as
input a gate-level netlist or database, technology library, optional
constraints for timing and area, and parasitic information (initially in the
form of estimated wireloads, but if backannotation has been done that
information will be used). All that's needed in addition for power
optimization is to set a power constraint and supply switching activity - the
same switching activity used with power analysis.
What you get out of power optimization is a gate-level netlist, optimized to
meet all of your constraints. A natural question to ask is: "If optimization at
the RTL and Behavioral levels can have a great impact on final power
dissipation, why offer a gate-level power optimization capability first?" Over
a decade of experience in synthesis and optimization has shown that results at
the RTL level depend on the quality of optimization at the gate level. The first
commercially successful synthesis products were gate-level timing
optimizers and these paved the way for RTL and Behavioral synthesis
systems. In a similar way gate-level power optimization will pave the way
for RTL and Behavioral synthesis for low power.
20
Conclusion - Low Power Issues (2)

Earlier we made the point that analysis precedes optimization. Here we
make another general point: We might say that just as analysis precedes
optimization, optimization precedes synthesis. Or to put it another way:
Successful synthesis at higher levels requires successful optimization at
lower levels.
21
The Key Terms
in Clock Tree Synthesis













Clock buffer: circuit element to isolate and amplify incoming clock signal.
Clock tree: design technique to achieve balanced delays and loads in the clock
buffers.
Gated clock: clock line that can control clock transmission to the operating circuits.
Ground bounce: the change in ground (vss) reference levels due to current in the
ground line.
Ground loop: the noise caused in the ground line(s) due to unbalanced IR drops in
the ground line.
Insertion delay: the time from the clock pad to the individual flip-flops.
IR drop: the voltage drop caused by the current I through the resistor R.
Jitter: the change in period to period timing in a clock signal.
Latency: the time for a clock to become available in the circuit.
Multiphase clock: a clocking system with more than one phase; the phases may be
overlapping or non-overlapping. Examples: biphase (a clock and its complement) and
quadrature (clocks separated by a phase angle of 90 degrees).
PLL: Phase-Locked Loop, a variable frequency generator locked to a source signal.
Skew: the maximum difference in clock arrival time between any two flip-flops.
Slew rate: characterized here by the rise time or fall time, i.e. the time for a
signal to go from one level to the other level.
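The definitions of insertion delay and skew above can be tied together with a small calculation. The flip-flop names and arrival times are hypothetical, and the helper names are assumptions made for the example.

```python
def insertion_delays(arrival_times, pad_time=0.0):
    """Time from the clock pad to each flip-flop."""
    return {ff: t - pad_time for ff, t in arrival_times.items()}


def clock_skew(arrival_times):
    """Maximum difference in clock arrival time between any two flip-flops."""
    times = list(arrival_times.values())
    return max(times) - min(times)


# Hypothetical clock arrival times in nanoseconds.
arrivals = {"ff1": 1.2, "ff2": 1.5, "ff3": 1.1}
print(insertion_delays(arrivals), clock_skew(arrivals))
```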
22