Reducing Power and Latency in 2-D MESH NoCs Using GPLS
(Globally Pseudochronous Locally Synchronous) CLOCKING
Harshal Ved
200501020
Ref: Paper by Erland Nilsson and Johnny Öberg
Introduction

In high-performance VLSI designs, the clock net causes two
problems:


It is a major source of power consumption.
Keeping clock skew within tolerable limits is a design bottleneck.
Solution:
Partition the design into large synchronous blocks that
communicate globally asynchronously using handshake signals [1].
Previous Work

Three ways to reduce power consumption:

1. Reducing voltage
2. Reducing physical capacitance
3. Reducing activity

The third category is seen as having great potential for
this purpose.
2-D mesh NoC

Example of a 2-D mesh NoC showing Switches (S) and Resources (R) with
their interconnections.


Approximately 70% of the power is burned in clock
distribution and latches [2].
Solutions to the above problem:

GALS (Globally Asynchronous Locally Synchronous)
clocking.
Mesochronous clocking.
Drawbacks of globally synchronous operation [3]



Large peak current at the clock edge, leading to ground
bounce and voltage drops, which in turn induce jitter
in both clock and data.
It is very difficult to match the delay in the different branches
of the global clock tree.
Globally synchronous systems are not scalable.
GALS (Globally Asynchronous Locally Synchronous)


A GALS architecture is composed of large synchronous blocks
that communicate with each other asynchronously
but operate internally on a synchronous basis.
Disadvantages of GALS:


The asynchronous communication between the clock regions must
be controlled with handshake signals. This leads to a reduced
maximum frequency and an increased area overhead.
Limited availability of design tools [3].
Lack of a global clock level [4].
Mesochronous Clocking [3]

Mesochronously clocked systems employ a single clock frequency across the
entire system, but with arbitrary phases.

Advantages:


Power dissipation in the clock distribution network is significantly
reduced.
Mesochronous systems are scalable.
Disadvantage:

Nothing can be said about the phase alignment between
clocks in different parts of the system.
Thus metastability may occur when passing data between clock
phase domains.
GPLS (Globally Pseudochronous Locally
Synchronous)


Pseudochronous is short for pseudo-synchronous, which is a
mesochronous clock with a constant phase difference between local clock
regions.
The clock is distributed between the switches.
Limitation:
In both cases, i.e. GPLS and mesochronous clocking, the
clock distribution layout benefits only if there is a regular topology with
constant distance between each node/region.
Various clocking Methods
GPLS Vs Mesochronous

For a mesochronous NoC, data is forwarded without
any concern for phase.

If the clock arrives (almost) simultaneously with the data, the
latches may become metastable.
In the pseudochronous case, both the frequency and the
phase are constant. If we select the phase constant
carefully, we can guarantee that data always arrives slightly
before the clock for some paths.
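The timing argument above can be sketched numerically. The following is a minimal model, not taken from the paper: the function name `hop_timing`, the `setup` margin, and all numeric values are assumptions chosen for illustration. It computes how long data waits at the receiving switch before the next clock edge, and flags the case where the data arrives too close to that edge.

```python
def hop_timing(sender_phase, receiver_phase, wire_delay, period, setup):
    """Timing of one hop between two clocked switches (hypothetical model).

    All arguments are times in the same unit. Phases are clock-edge
    offsets within one period. Returns (latency, slack, safe):
      latency: time from the sender's clock edge to the receiver's capturing edge
      slack:   time the data waits at the receiver before that edge
      safe:    False if the data arrives closer to the edge than `setup`
    """
    arrival = sender_phase + wire_delay
    # Distance from the data arrival to the next receiver clock edge.
    slack = (receiver_phase - arrival) % period
    latency = wire_delay + slack
    return latency, slack, slack >= setup


PERIOD = 4.0   # assumed clock period
WIRE = 0.5     # assumed wire delay between neighbouring switches
SETUP = 0.25   # assumed setup margin of the receiving latch

# Identical phases (synchronous case): a hop costs a full period.
print(hop_timing(0.0, 0.0, WIRE, PERIOD, SETUP))     # (4.0, 3.5, True)
# Constant quarter-period phase step: data arrives slightly before the clock.
print(hop_timing(0.0, 1.0, WIRE, PERIOD, SETUP))     # (1.0, 0.5, True)
# Arbitrary phase: the edge may land right after the data arrives.
print(hop_timing(0.0, 0.5625, WIRE, PERIOD, SETUP))  # (0.5625, 0.0625, False)
```

In this model a carefully chosen constant phase gives both a short hop latency and a guaranteed margin, whereas an arbitrary phase can violate the margin.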

Pseudochronous NoC Clocking



Every switch has four outgoing and four incoming connections to the
surrounding switches and is able to switch packets in all four
directions in one cycle.
The network uses hot-potato routing, and a complete 128-bit packet is
sent in parallel in one clock cycle. With hot-potato routing, no
packet is queued in a switch when the output toward its
destination is unavailable; the packet is sent out on another output instead.
By selecting the phases of the switch nodes, communication along
certain paths achieves lower latency than it would if every
switch had an identical phase. We call such a path a data motorway (DM).
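The hot-potato behaviour described above can be sketched for a single switch. This is an illustrative model only: the port names, the function `route_hot_potato`, and the first-come priority rule are assumptions, and the resource port of the switch is ignored.

```python
DIRECTIONS = ("N", "E", "S", "W")  # assumed names for the four outputs


def route_hot_potato(packets):
    """Assign every packet to a distinct output port in one cycle.

    packets: list of (packet_id, preferred_direction), at most four entries.
    Returns a dict mapping packet_id to the output actually used.
    Priority is first-come, which is an assumption of this sketch.
    """
    if len(packets) > len(DIRECTIONS):
        raise ValueError("a switch accepts at most four packets per cycle")
    free = list(DIRECTIONS)
    assignment = {}
    deflected = []
    # First pass: grant the preferred output wherever it is still free.
    for packet_id, preferred in packets:
        if preferred in free:
            free.remove(preferred)
            assignment[packet_id] = preferred
        else:
            deflected.append(packet_id)
    # Second pass: nothing is queued, so losers take any remaining output.
    for packet_id in deflected:
        assignment[packet_id] = free.pop(0)
    return assignment


print(route_hot_potato([("a", "E"), ("b", "E"), ("c", "N")]))
# {'a': 'E', 'c': 'N', 'b': 'S'}
```

Packet `b` loses the contest for the east output and is deflected south instead of waiting in a buffer, so the switch needs no packet queues.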



The phase difference w.r.t. the reference clock, i.e. the clock
source, is increased every time the clock is forwarded to the
next node.
The clock period is divided into M phases, which gives the
minimum phase difference.
For example, M = 4 means that four different phases are
used across the chip; M = 1 is equivalent to the synchronous
case.
Example of clock distribution for a 4x4 mesh with four constant phases (M = 4)

Phase difference between two neighbouring switches is given by: $m \cdot (T_\Delta / T_{period})$
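As a rough illustration of phase assignment, the sketch below builds a phase map for a mesh. The forwarding pattern is an assumption, not the layout from the paper's figure: the clock source is placed at the top-left switch and forwarded hop by hop to the right and downward, so the phase index of a switch is its hop count from the source modulo M. The helper names `phase_map` and `phase_offset` are hypothetical.

```python
def phase_map(rows, cols, m):
    """Phase index (0..m-1) of each switch in a rows x cols mesh.

    Assumes the clock enters at switch (0, 0) and gains one phase step
    per hop, so the index is the Manhattan distance modulo m.
    """
    return [[(r + c) % m for c in range(cols)] for r in range(rows)]


def phase_offset(index, m, period):
    """Time offset of a phase index, assuming a step of period / m."""
    return index * period / m


for row in phase_map(4, 4, 4):
    print(row)
# [0, 1, 2, 3]
# [1, 2, 3, 0]
# [2, 3, 0, 1]
# [3, 0, 1, 2]
```

With M = 1 every entry is 0, which matches the synchronous case described above.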
Power Analysis

There are two issues when it comes to reducing the
power consumption:

1. Reducing the average power consumption.
2. Reducing the peak power consumption.
Average Power Analysis



Reducing the average power means that the overall power
consumption is reduced, i.e., less energy is spent per clock cycle.
The power consumption of a block of random logic with n gates can be
estimated as
$P_{avg} = \frac{K_{sw} \, n \, C_{ld} \, V_p^2}{t_{clk}}$
Since the average power depends on the frequency and the
amount of logic, the only effect our clock phasing scheme can have on
a design is in how the clock is distributed, i.e., whether it reduces the amount
of switched capacitance on the clock wires.
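The estimate above can be evaluated directly. In this sketch the reading of the symbols is an assumption (a switching activity factor, a per-gate load capacitance, a supply voltage, and the clock period), and the numeric values are made up purely for illustration.

```python
def average_power(k_sw, n_gates, c_load, v_supply, t_clk):
    """Average power estimate: k_sw * n * C_ld * V^2 / t_clk.

    k_sw:     switching activity factor (assumed meaning of Ksw)
    n_gates:  number of gates n
    c_load:   load capacitance per gate in farads (assumed meaning of Cld)
    v_supply: supply voltage in volts (assumed meaning of Vp)
    t_clk:    clock period in seconds
    """
    return k_sw * n_gates * c_load * v_supply ** 2 / t_clk


# Hypothetical block: 1000 gates, 1 fF per gate, 1 V supply, 1 GHz clock.
p = average_power(k_sw=0.5, n_gates=1000, c_load=1e-15, v_supply=1.0, t_clk=1e-9)
print(f"{p * 1e3:.3f} mW")  # 0.500 mW
```

Clock phase does not appear in the expression, which is consistent with the point above that phasing can only affect average power through the switched capacitance of the clock wiring.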
Peak Power Analysis



Reducing the peak power means that the amount of bypass
capacitance needed on-chip to even out the switching
current can be reduced.
Local bypass capacitances are placed close to the gates to
counter power supply noise.
Extra capacitances deliver extra current. Thus, reducing peak
power reduces the power supply noise and thus the clock
jitter in the circuit [5].
Peak Power Analysis
Peak power triangles: a) single peak, b) two peaks with a phase
difference, c) two peaks with a phase difference of 180 degrees (M = 2),
d) four peaks with a phase difference of 90 degrees (M = 4).
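The effect of splitting one peak into M phase-shifted peaks can be reproduced with a small numerical model. The assumptions here are not from the paper: each clock region draws a triangular pulse whose base is half the clock period, the total load is split equally over the M phases, and the function names are hypothetical.

```python
def triangle(t, start, width, height, period):
    """Periodic triangular pulse that starts at `start` and lasts `width`."""
    x = (t - start) % period
    if x >= width:
        return 0.0
    half = width / 2
    return height * (x / half if x < half else (width - x) / half)


def peak_and_average(m, period=1.0, width=0.5, total_height=1.0, samples=1000):
    """Peak and average of m equal pulses spaced period / m apart."""
    height = total_height / m  # each phase carries 1/m of the load
    values = []
    for i in range(samples):
        t = i * period / samples
        values.append(sum(triangle(t, k * period / m, width, height, period)
                          for k in range(m)))
    return max(values), sum(values) / samples


for m in (1, 2, 4):
    peak, avg = peak_and_average(m)
    print(f"M={m}: peak={peak:.3f} average={avg:.3f}")
# M=1: peak=1.000 average=0.250
# M=2: peak=0.500 average=0.250
# M=4: peak=0.250 average=0.250
```

Under these assumptions the average stays the same while the peak drops, and with four phases the overlapping triangles sum to a flat profile, so the peak equals the average.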
Conclusions:



With GPLS clock distribution, the peak power is halved, which brings it
close to the average power.
The power supply noise and clock jitter are reduced.
Another benefit of forwarding the clock over data lines
is that it can save one metal layer that is traditionally
used for global clock distribution.
References





[1] http://citeseer.ist.psu.edu.html
[2] C. Anderson. Physical design of a fourth-generation POWER GHz
microprocessor. In Solid-State Circuits Conference, 2001.
[3] T. Bjerregaard. A Scalable, Timing-Safe, Network-on-Chip
Architecture with an Integrated Clock Distribution Method.
[4] T. Bjerregaard and J. Sparsø. A scheduling discipline for latency and
bandwidth guarantees in asynchronous network-on-chip.
[5] J. Öberg. Networks on Chip, chapter Clocking Strategies
for Networks-on-Chip. Kluwer Academic Publishers, 2003.
Work done so far




Kun Huang, Jun Wang and Ge Zhang. An Innovative Power-Efficient
Architecture for Input Buffer of Network on Chip.
A Scalable, Timing-Safe, Network-on-Chip Architecture with an
Integrated Clock Distribution Method.
F. Mu and C. Svensson. Self-tested self-synchronization circuit for
mesochronous clocking. IEEE Transactions on Circuits and Systems II:
Analog and Digital Signal Processing, 48:129–140, 2001.
B. Mesgarzadeh, C. Svensson, and A. Alvandpour. A new mesochronous
clocking scheme for synchronization in SoC. In Proceedings of the
2004 International Symposium on Circuits and Systems (ISCAS ’04),
pages 605–608. IEEE, 2004.
THANK YOU