Download Lecture 17 Dynamic Sequential Circuit Timing

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
CMPEN 411
VLSI Digital Circuits
Spring 2012
Lecture 17: Dynamic Sequential Circuits
And Timing Issues
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003
J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp12 CMPEN 411 L17 S.1
This Lecture

Reading
 Dynamic sequential circuits
- Reading assignment – Rabaey, et al, 7.3, 7.7

Timing issues, Intro to datapath design
- Reading assignment – Rabaey, et al, 10.1-10.3.3; 11.1-11.2

Next lecture

Intro to datapath design
- Reading assignment – Rabaey, et al, 11.1-11.2

Adder design
- Reading assignment – Rabaey, et al, 11.3
Sp12 CMPEN 411 L17 S.2
Last Lecture: Static MS ET Implementation
Slave
Master
I2
T2
I3
I5
T4
I4
T3
QM
D
I1
T1
clk
clk
!clk
Sp12 CMPEN 411 L17 S.3
I6
Q
Dynamic ET Flipflop
master
slave
!clk
D
clk
T
I1
1
QM
T
master transparent
slave hold
Q
2
C
clk
I2
1
C
!clk
2
tsu = tpd_tx
thold = zero
tc-q = 2 tpd_inv + tpd_tx
clk
!clk
Sp12 CMPEN 411 L17 S.4
master hold
slave transparent
Pseudostatic Dynamic Latch

Robustness considerations limit the use of dynamic FF’s




coupling between signal nets and internal storage nodes can
inject significant noise and destroy the FF state
leakage currents cause state to leak away with time
internal dynamic nodes don’t track fluctuations in VDD that
reduces noise margins
A simple fix is to make the circuit pseudostatic
clk
QM
Q
!clk

Add above logic added to all dynamic latches
Sp12 CMPEN 411 L17 S.5
Dynamic ET FF Race Conditions
!clk
D
clk
T
I1
1
QM
T
clk
!clk
Sp12 CMPEN 411 L17 S.6
Q
2
C
clk
I2
1
C
!clk
2
0-0 overlap race condition
toverlap0-0 < tT1 + tI1 + tT2
1-1 overlap race condition
toverlap1-1 < thold
Fix 1: Dynamic Two-Phase ET FF
Keep clock non-overlap large enough, but with 4 clock singals to route
clk1
clk2
T
D
I1
1
QM
T
Q
2
C
!clk1
I2
1
C
!clk2
2
master transparent
slave hold
clk1
tnon_overlap
clk2
master hold
slave transparent
Sp12 CMPEN 411 L17 S.7
Fix 2: C2MOS (Clocked CMOS) ET Flipflop

A clock-skew insensitive FF
Master
Slave
M2
clk
M4
D
!clk
M3
M1
clk
!clk
Sp12 CMPEN 411 L17 S.8
M6
QM
C1
!clk
M8
Q
clk
M7
M5
C2
C2MOS (Clocked CMOS) ET Flipflop

A clock-skew insensitive FF
Master
Slave
M2
clk
Mon
4
off
D
!clk
Mon
3
off
M1
master transparent
slave hold
QM
C1
!clk
clk
Moff
8
on
Q
Moff
7
on
C2
M5
clk
!clk
Sp12 CMPEN 411 L17 S.9
M6
master hold
slave transparent
C2MOS FF 0-0 Overlap Case

Clock-skew insensitive as long as the rise and fall times
of the clock edges are sufficiently small
M2
0
M4
D
M6
0
QM
M8
Q
C1
C2
M1
M5
clk
clk
!clk
!clk
Sp12 CMPEN 411 L17 S.10
C2MOS FF 1-1 Overlap Case
M2
M6
QM
D
1
M3
Q
C1
1
M1
M5
clk
clk
!clk
!clk
Sp12 CMPEN 411 L17 S.11
M7
C2
Fix 3: True Single Phase Clocked (TSPC) Latches
Negative Latch
In
clk
Positive Latch
clk
Q
hold when clk = 1
transparent when clk = 0
Sp12 CMPEN 411 L17 S.12
In
Q
clk
clk
transparent when clk = 1
hold when clk = 0
Embedding Logic in TSPC Latch
A
PUN
B
Q
In
clk
clk
PDN
Q
clk
A
B
Sp12 CMPEN 411 L17 S.13
clk
TSPC ET FF
Master
D
clk
on
off
clk
master transparent
slave hold
clk
Sp12 CMPEN 411 L17 S.14
Slave
on
off QM
on
clk off
on
clk
off
Q
master hold
slave transparent
Choosing a Clocking Strategy

Choosing the right clocking scheme affects the
functionality, speed, and power of a circuit

Two-phase designs




+ robust and conceptually simple
- need to generate and route two clock signals
- have to design to accommodate possible skew between the
two clock signals
Single phase designs




+
+
+
-
only need to generate and route one clock signal
supported by most automated design methodologies
don’t have to worry about skew between the two clocks
have to have guaranteed slopes on the clock edges
Sp12 CMPEN 411 L17 S.15
Review: Sequential Definitions

Use two, level sensitive latches of opposite type to build
one master-slave flipflop that changes state on a clock
edge (when the slave is transparent)

Static storage

static uses a bistable element with feedback to store its state and
thus preserves state as long as the power is on
- Loading new data into the element: 1) cutting the feedback path (mux
based); 2) overpowering the feedback path (SRAM based)

Dynamic storage


dynamic stores state on parasitic capacitors so the state held for
only a period of time (milliseconds); requires periodic refresh
dynamic is usually simpler (fewer transistors), higher speed, lower
power but due to noise immunity issues always modify the circuit
(by adding a feedback loop on the output) so that it is pseudostatic
Sp12 CMPEN 411 L17 S.16
Timing Classifications

Synchronous systems


All memory elements in the system are simultaneously updated
using a globally distributed periodic synchronization signal (i.e.,
a global clock signal)
Functionality is ensure by strict constraints on the clock signal
generation and distribution to minimize
- Clock skew (spatial variations in clock edges)
- Clock jitter (temporal variations in clock edges)

Asynchronous systems



Self-timed (controlled) systems
No need for a globally distributed clock, but have asynchronous
circuit overheads (handshaking logic, etc.)
Hybrid systems


Synchronization between different clock domains
Interfacing between asynchronous and synchronous domains
Sp12 CMPEN 411 L17 S.17
Review: Synchronous Timing Basics
R1
In
clk

D Q
tclk1
tc-q, tsu,
thold, tcdreg
R2
Combinational
logic
D Q
tclk2
tplogic, tcdlogic
Under ideal conditions (i.e., when tclk1 = tclk2)
T  tc-q + tplogic + tsu
thold ≤ tcdlogic + tcdreg

Under real conditions, the clock signal can have both
spatial (clock skew) and temporal (clock jitter) variations


skew is constant from cycle to cycle (by definition); skew can be
positive (clock and data flowing in the same direction) or negative
(clock and data flowing in opposite directions)
jitter causes T to change on a cycle-by-cycle basis
Sp12 CMPEN 411 L17 S.18
Sources of Clock Skew and Jitter in Clock Network
4 power supply
3 interconnect
6 capacitive load
clock
1
generation
PLL
7 capacitive
coupling
2 clock drivers
5 temperature

Skew



manufacturing device
variations in clock drivers
interconnect variations
environmental variations
(power supply and
temperature)
Sp12 CMPEN 411 L17 S.19

Jitter



clock generation
capacitive loading and
coupling
environmental variations
(power supply and
temperature)
Positive Clock Skew

Clock and
data flow in
the same
direction
R1
In
R2
Combinational
logic
D Q
D Q
tclk1
clk
tclk2
T
1
>0
2
delay
T+
3
4
 + thold
T:
T +   tc-q + tplogic + tsu so T  tc-q + tplogic + tsu - 
thold :
thold +  ≤ tcdlogic + tcdreg so thold ≤ tcdlogic + tcdreg - 

 > 0: Improves performance, but makes thold harder to
meet. If thold is not met (race conditions), the circuit
malfunctions independent of the clock period!
Sp12 CMPEN 411 L17 S.20
Negative Clock Skew

Clock and
data flow in
opposite
directions
R1
In
R2
D Q
Combinational
logic
tclk1
D Q
tclk2
delay
T
T+
1
2
<0
3
4
T:
T +   tc-q + tplogic + tsu so T  tc-q + tplogic + tsu - 
thold :
thold +  ≤ tcdlogic + tcdreg so thold ≤ tcdlogic + tcdreg - 

 < 0: Degrades performance, but thold is easier to meet
(eliminating race conditions)
Sp12 CMPEN 411 L17 S.21
clk
Clock Jitter

Jitter causes T to
vary on a cycle-bycycle basis
R1
Combinational
logic
In
tclk
clk
T
-tjitter
T:

+tjitter
T - 2tjitter  tc-q + tplogic + tsu so T  tc-q + tplogic + tsu + 2tjitter
Jitter directly reduces the performance of a sequential
circuit
Sp12 CMPEN 411 L17 S.22
Combined Impact of Skew and Jitter
Constraints
on the
minimum
clock period
( > 0)

R1
In
R2
Combinational
logic
D Q
D Q
tclk1
tclk2
T
1
T+
>0
6
12
-tjitter
T  tc-q + tplogic + tsu -  + 2tjitter

thold ≤ tcdlogic + tcdreg –  – 2tjitter
 > 0 with jitter: Degrades performance, and makes thold
even harder to meet. (The acceptable skew is reduced
by jitter.)
Sp12 CMPEN 411 L17 S.23
Clock Distribution Networks

Clock skew and jitter can ultimately limit the performance
of a digital system, so designing a clock network that
minimizes both is important



In many high-speed processors, a majority of the dynamic power
is dissipated in the clock network.
To reduce dynamic power, the clock network must support clock
gating (shutting down (disabling the clock) units)
Clock distribution techniques

Balanced paths (H-tree network, matched RC trees)
- In the ideal case, can eliminate skew
- Could take multiple cycles for the clock signal to propagate to the
leaves of the tree

Clock grids
- Typically used in the final stage of the clock distribution network
- Minimizes absolute delay (not relative delay)
Sp12 CMPEN 411 L17 S.24
H-Tree Clock Network

If the paths are perfectly balanced, clock skew is zero
Clock
Can insert clock gating at
multiple levels in clock tree
Can shut off entire subtree
if all gating conditions are
satisfied
Idle
condition
Clock
Sp12 CMPEN 411 L17 S.25
Gated
clock
Clock Grid Network

Distributed buffering reduces absolute delay and makes
clock gating easier, but is sensitive to variations in the
buffer delay
 The secondary buffers
isolate the local clock
nets from the upstream
local logic
load and amplify the
area
clock signals degraded
by the RC network
Clock

main clock
buffer


secondary clock buffers
Sp12 CMPEN 411 L17 S.26
decreases absolute skew
gives steeper clocks
Only have to bound the
skew within the local
logic area
DEC Alpha 21164 (EV5) Example

300 MHz clock (9.3 million transistors on a 16.5x18.1
mm die in 0.5 micron CMOS technology)


single phase clock
3.75 nF total clock load

Extensive use of dynamic logic

20 W (out of 50) in clock distribution network

Two level clock distribution



Single 6 inverter stage main clock buffer at the center of the
chip
Secondary clock buffers drive the left and right sides of the
clock grid in m3 and m4
Total equivalent driver size of 58 cm !!
Sp12 CMPEN 411 L17 S.27
Secondary Clock Buffers
Sp12 CMPEN 411 L17 S.28
Clock Skew in Alpha Processor


Absolute skew smaller than 90 ps
The critical
instruction and
execution units all
see the clock within
65 ps
Sp12 CMPEN 411 L17 S.29
ASIC example
Sp12 CMPEN 411 L17 S.30
Microprocessor example
Sp12 CMPEN 411 L17 S.31
Dealing with Clock Skew and Jitter

To minimize skew, balance clock paths using H-tree or
matched-tree clock distribution structures.

If possible, route data and clock in opposite directions;
eliminates races at the cost of performance.

The use of gated clocks to help with dynamic power
consumption make jitter worse.

Shield clock wires (route power lines – VDD or GND – next to
clock lines) to minimize/eliminate coupling with neighboring
signal nets.

Use dummy fills to reduce skew by reducing variations in
interconnect capacitances due to interlayer dielectric
thickness variations.

Beware of temperature and supply rail variations and their
effects on skew and jitter. Power supply noise fundamentally
limits the performance of clock networks.
Sp12 CMPEN 411 L17 S.32
Clock Skew Scheduling
12
16
Minimum clock period
with zero skew
16
16
A
Sp12 CMPEN 411 L17 S.33
12
C
Clock Skew Scheduling
i
j
k
12
16
1
T = 15
pulse at i,k
tight
pulse at j
max
Sp12 CMPEN 411 L17 S.34
16
C
12
Clock Scaling
Sp12 CMPEN 411 L17 S.35
Next Lecture and Reminders

Next lecture

Intro to datapath design
- Reading assignment – Rabaey, et al, 11.1-11.2

Adder design
- Reading assignment – Rabaey, et al, 11.3
Sp12 CMPEN 411 L17 S.36