Dynamic Logic Families
C.K. Ken Yang
UCLA
[email protected]
Courtesy of MAH,JR
EE 215B
1
Overview
• Reading
– Rabaey 6.3 (Dynamic), 7.5.2 (NORA)
• Overview
– These notes cover dynamic logic families in greater detail, and domino logic in particular. They discuss at length the noise issues in dynamic circuits and how they are resolved, and they introduce a few variants of domino logic.
Domino Logic Family Outline
• Dynamic/domino logic
– Domino logic
– Timing of domino logic
– Noise issues and keepers
• Dual-rail domino logic (Dynamic DCVS) and other domino styles
Review: Pre-charged Logic (1)
• We saw earlier that the main disadvantage of pseudo-nMOS logic is the static current it consumes. One way to eliminate it is to build a dual pMOS stack that cuts the static current path (CMOS). Another way is pre-charging.
• What do we mean by pre-charging?
– Before each evaluation phase, pre-charge the output high
– Evaluating the Boolean expression either discharges the output or leaves it high
  • A single low-to-high transition on an input is allowed during evaluation, but NOT a high-to-low transition
[Figure: pseudo-nMOS gate (static current path), CMOS gate (dual pMOS network), and pre-charged gate with inputs A and B; precharge and evaluate clock waveforms, non-overlapping (good, but not always possible)]
Review: Precharged Logic (2)
• Implement the logic function with an nMOS pull-down stack, as in pseudo-nMOS
• Can use a single clock signal for both pre-charge and evaluate
• These gates cannot be cascaded, even if complementary clocks are used for alternating stages
– Constrained by the low-to-high transition requirement at the input during evaluation
– Need to put an inverting stage between them → Domino Logic
[Figure: pre-charged gates clocked by clk]
Review: Domino Logic
[Figure: dynamic stage clocked by clk with pre-charged node X, followed by a static gate whose output rises monotonically. The static gate can be any static CMOS gate (NAND, NOR, etc.)]
• During pre-charge:
– Output of the dynamic stage (X) is “pre-charged” high while clk is low
– A domino gate output that drives the input of another gate is always low during pre-charge
• During evaluate:
– X is conditionally discharged
– Output of the static buffer rises monotonically
– The inverting gate can be any inverting static CMOS gate
– The buffer output cannot go from high to low during evaluation
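The pre-charge and evaluate behavior described above can be sketched as a small behavioral model. This is only a sketch under stated assumptions, not a circuit simulation: the function names `domino_stage` and `evaluate_chain` are hypothetical, every stage is a single-input buffer, and node X is treated as an ideal stored value with no leakage or noise.

```python
def domino_stage(clk, pulldown_on, x_prev):
    """One domino stage: dynamic node X followed by a static inverter.

    clk = 0: pre-charge, X is pulled high.
    clk = 1: evaluate, X discharges if the nMOS pull-down network conducts,
             otherwise X floats and keeps its previous value.
    Returns (X, out) with out = NOT X.
    """
    if clk == 0:
        x = 1
    elif pulldown_on:
        x = 0
    else:
        x = x_prev
    return x, 1 - x


def evaluate_chain(primary_input, n_stages):
    """Pre-charge a chain of buffer stages, then evaluate it on one clock edge."""
    # Pre-charge: every X is high, so every stage output is low.
    xs = [domino_stage(0, False, 1)[0] for _ in range(n_stages)]
    outs = []
    signal = primary_input
    for i in range(n_stages):
        # Each stage sees the (monotonically rising) output of the previous one.
        xs[i], out = domino_stage(1, bool(signal), xs[i])
        outs.append(out)
        signal = out
    return outs


if __name__ == "__main__":
    print(evaluate_chain(1, 3))
    print(evaluate_chain(0, 3))
```

Calling `domino_stage(1, True, 1)` and then `domino_stage(1, False, 0)` shows why inputs may not fall during evaluate: once X has discharged, removing the input does not restore it.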
Review: Domino Chains
[Figure: chain of three domino gates with nMOS pull-down networks, sharing one clock]
• Cascaded gates can be switched from PRECHARGE to EVAL on the same clock edge
– Logic decisions propagate through the cascade (or chain) like a row of falling dominos
• Length of domino chains is limited by EVAL time
– Logic must propagate to the output before the clock falls
• Inputs to a domino stage must be held stable during EVAL
• Domino gates are ratioless
• All domino gates are NONINVERTING (no XOR function)
Review: Delay in Domino Circuits
[Figure: domino circuit clocked by clk, with transistor widths annotated (16, 8, 4)]
• Eliminating fat, slow pMOS transistors gives less input capacitance for the same drive strength (lower logical effort)
– Reduces diffusion capacitances
• Domino gate has a lower switching threshold, so it starts switching sooner
– No contention between pull-up and pull-down
Review: Logical Effort of Dynamic Gates
[Figure: dynamic gates with transistor widths annotated (2 and 3); LE = 2/3 and LE = 1]
• What about the foot transistor?
– Does it need to be sized the same?
– A NAND structure might not need a footing transistor.
Review: Precharged NAND Decoder
• Generally built with NAND gates
– If you don’t use clocked transistors
– Can get lower logical effort
• If we use NAND gates with skewed inverters afterward
– Assume inputs are pulses
– Average logical effort is sqrt(2/3 × 5/6) ≈ 0.75
[Figure: decoder circuit clocked by CLK, with transistor widths W/2, 4W, 2W, 2W, W]
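The average logical effort quoted above is the geometric mean of the two stage efforts. A minimal sketch of that arithmetic follows; the helper name `average_logical_effort` is hypothetical, and the stage values 2/3 and 5/6 are taken directly from the slide.

```python
import math


def average_logical_effort(efforts):
    """Geometric mean of the per-stage logical efforts."""
    product = math.prod(efforts)
    return product ** (1.0 / len(efforts))


if __name__ == "__main__":
    # Stage efforts from the slide: 2/3 and 5/6.
    avg = average_logical_effort([2 / 3, 5 / 6])
    print(round(avg, 3))  # 0.745, which the slide rounds to 0.75
```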
Monotonic Edge Optimization
• We care most about evaluation speed, so skew the static gate to favor the input falling edge (output rising edge)
– Use high-skew CMOS gates (pMOS >> nMOS)
– Caveats: degraded noise margins, slower pre-charge time
• Structuring logic into dynamic and static gates is an art form
– Static gate favors NAND (since series pMOS are slow)
– Dynamic stage allows more series devices
[Figure: dynamic stage followed by static stage, clocked by clk, with transistor widths 16, 16, 8, 8]
Clocked Evaluation Transistor
The clocked evaluation transistor is not strictly necessary.
• Can remove if all the inputs are provably low during pre-charge
– Other domino gate outputs satisfy this condition
• Also okay if high inputs are in series with provably low input
• Delay pre-charge edge to reduce power burned at start of pre-charge
[Figure: domino gates clocked by clk, clkd, and clkdd, with inputs marked L and H; waveforms of clk, clkd, and clkdd]
Pre-charge Properties
• Many domino gates can evaluate in one half-cycle, so it should be easy to pre-charge a single domino gate in the other half-cycle. But…
– The domino gate must pre-charge enough to flip the high skew gate, then the high skew gate must fall below Vt by a sufficient noise margin before evaluation starts again
– To speed up domino evaluation, we want a small pre-charge transistor (small diffusion parasitic capacitances)
  • Makes pre-charge slow
  • High skew gate falls very slowly
– Delaying the clock to avoid pre-charge contention in un-clocked pull-down stacks reduces pre-charge time for the clkdd domino gate
– Cycles are getting shorter
– Advanced domino methodologies are stretching the length of the evaluation phase at the expense of pre-charge time
• Bottom line: pre-charge time is becoming an important issue. Size for roughly equal pre-charge and evaluate times
Domino Logic Family Outline
• Dynamic/domino logic
– Domino logic
– Timing of domino logic
– Noise issues and keepers
• Dual-rail domino logic (Dynamic DCVS) and other domino styles
Clocking for Domino Circuits (1)
• Make sure that the half-cycle spent in pre-charge is not wasted.
– Use clk for one domino chain, and clk_b for the second domino chain.
– Data transfers from one phase (chain) to the next.
– Need a latch between the phases, since the data is gone during pre-charge.
  • If pre-charge comes early, we may lose the data.
[Figure: two-phase domino pipeline. Chains of alternating domino and static gates, clocked by Clk and Clk_b, with latches between the phases; clk and clk_b waveforms]
Legend:
Domino: One inverting dynamic gate
Static: One inverting static gate
Latch: Inverting tristate latch
Source: D. Harris
Clocking for Domino Circuits (2)
• Domino doesn’t look so attractive in the context of a traditional pipeline.
– Pay clock skew twice in each cycle.
– Balancing short phases is difficult since there is no time borrowing.
– Latches become a significant fraction of the cycle time.
[Figure: two-phase domino pipeline. Chains of alternating domino gates (one inverting dynamic gate each) and static gates (one inverting static gate each), clocked by Clk and Clk_b, with inverting tristate latches between the phases; clk and clk_b waveforms]
Source: D. Harris
Domino-clocking Evaluation
• Let T = cycle time = 16 FO4 delays; tskew = 2; tsetup = 1
• Filling the cycle exactly is difficult (no time borrowing) -> timbalance = 1
• Tphase-logic = T/2 - tskew - tsetup - timbalance
• Baseline Design:
– Tphase-logic = ______________________
– 50% of the phase is wasted in overhead! Slower than static!
• Optimized Design:
– Define clock domains and use tskew-local = 1
– Work hard to balance logic between phases: timbalance = 0 (optimistic)
– Tphase-logic = _____________________
– Still, 25% of the phase is overhead!
Source: D. Harris
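The phase budget above is simple enough to evaluate directly. The sketch below plugs the slide's numbers into Tphase-logic = T/2 - tskew - tsetup - timbalance; the function names `phase_logic_time` and `overhead_fraction` are hypothetical, and all times are in FO4 delays.

```python
def phase_logic_time(t_cycle, t_skew, t_setup, t_imbalance):
    """Time left for logic in one half-cycle phase."""
    return t_cycle / 2 - t_skew - t_setup - t_imbalance


def overhead_fraction(t_cycle, t_logic):
    """Fraction of the half-cycle phase that is not spent on logic."""
    half_cycle = t_cycle / 2
    return (half_cycle - t_logic) / half_cycle


if __name__ == "__main__":
    baseline = phase_logic_time(16, t_skew=2, t_setup=1, t_imbalance=1)
    optimized = phase_logic_time(16, t_skew=1, t_setup=1, t_imbalance=0)
    print(baseline, overhead_fraction(16, baseline))    # 4.0 0.5
    print(optimized, overhead_fraction(16, optimized))  # 6.0 0.25
```

The resulting overheads match the 50% and 25% figures stated on the slide.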
Early Enhancements
• Good designers have recognized this problem for years.
• The largest problem is the hard edges set by the latches.
• A variety of latches soften this edge:
[Figure: SR latch, dual-monotonic latch, and TSPC latch, each fed from domino]
Source: D. Harris
Skew-tolerant Domino Clocking
• How much clock skew could we tolerate given N clock phases?
– Divide logic into N phases of T/N duration each.
– Overlapping clocks eliminate the need for latches
– Extra overlap accommodates clock skew and time borrowing
– As with other domino techniques, budget skew on the transition from static to domino
[Figure: chains of alternating static and domino gates clocked by overlapping clock phases 1 and 2]
Skew Tolerance
• T = te + tp
• tp = tprech + tskew; te = T/N + tskew + thold
• Hence tskew-max = [T(N-1)/N - tprech - thold] / 2
[Figure: timing diagram of clock phases 1 and 2 showing tp and te, with domino gates 1a, 1b, and 2a in a chain of alternating static and domino gates. The phases must overlap by thold; the effective precharge window is marked]
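The step from the two constraints to the skew bound is a one-line substitution. Written out with the slide's own symbols (tskew, thold, tprech as subscripted quantities):

```latex
\begin{align*}
T &= t_e + t_p \\
  &= \left(\frac{T}{N} + t_{skew} + t_{hold}\right) + \left(t_{prech} + t_{skew}\right) \\
2\,t_{skew} &= T - \frac{T}{N} - t_{prech} - t_{hold} \\
t_{skew\text{-}max} &= \frac{1}{2}\left[\frac{T(N-1)}{N} - t_{prech} - t_{hold}\right]
\end{align*}
```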
Time Borrowing
• If we overlap the phases some more, we can provide a region where we can allow “time-borrowing” between the phases.
– Both phases are high for a longer period of time.
– Helps with logic granularity.
tborrow = toverlap - thold - tskew
Numerical Example
• Assume that Tcycle = 16
• Let tprech = 4, long enough to:
– Precharge domino gate
– Make subsequent skewed static gate fall below Vt
• thold is slightly negative for reasonable cell libraries
– Next phase can evaluate before precharge ripples through static gate
– Conservatively bound thold at 0
– Sweet spots: N=2 (fewest clocks), N=4 (good tolerance, 50% duty cycle)

N    tskew    tp
2    2        6
3    3.33     7.33
4    4        8
6    4.66     8.66
8    5        9
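The table can be reproduced from the skew-tolerance formula tskew-max = [T(N-1)/N - tprech - thold] / 2 with tp = tprech + tskew. The sketch below is a hedged illustration: the function names `max_skew` and `precharge_phase` are hypothetical, and thold is bounded at 0 as the slide suggests.

```python
def max_skew(t_cycle, n_phases, t_prech, t_hold=0.0):
    """Maximum tolerable clock skew for N overlapping phases."""
    return (t_cycle * (n_phases - 1) / n_phases - t_prech - t_hold) / 2


def precharge_phase(t_prech, t_skew):
    """Length of the precharge phase tp = tprech + tskew."""
    return t_prech + t_skew


if __name__ == "__main__":
    print("N  tskew  tp")
    for n in (2, 3, 4, 6, 8):
        t_skew = max_skew(16, n, 4)
        print(n, round(t_skew, 2), round(precharge_phase(4, t_skew), 2))
```

For N=6 this prints 4.67 and 8.67, where the slide shows the truncated values 4.66 and 8.66. For N=4, tp comes out as 8, half of the 16-delay cycle, which is the 50% duty cycle noted above.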
Aside: 4-Phase Skew-Tolerant Domino
• Don’t need to worry about data flowing through phases 1-2-3-4 within 1 cycle.
– No min-delay constraint.
• Lots of overlap for skew tolerance and time borrowing.
Some Design Issues
• State is no longer stored in the latch at the end of a phase
– Instead, it is held by the first domino gate in the phase
– Use a “full keeper” to allow stop-clock operation
[Figure: domino gate clocked by phase 2, fed from the phase 1 block, with a weak full keeper]
• All systems with overlapping clocks require min-delay checks
– Domino paths are presumably critical anyway, so few min-delay errors
– 4-phase has effectively no min-delay risk
  • Overlap of all four phases is at most very small
  • A minimum of 8 gates are in the cycle anyway
Pulse Stretching and Shrinking
• Stretch pulses by 2 inverter delays using an even number of inverters.
– Input transitions HIGH
– Output stays HIGH (inverted) after the 2 inverter delay.
• Create a pulse that is only 3 inverter delays wide.
– Input transitions HIGH
– Both inputs are HIGH (output LOW) for 3 inverter delays
[Figure: timing waveforms for the pulse stretcher and the pulse generator; each tick = tinv]
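Both circuits can be illustrated with a discrete-time sketch in which each inverter adds one tick of delay. The gate choices here are assumptions, since the schematic is not reproduced in the text: the stretcher ORs the input with a copy delayed by two inverters (the slide's version has an inverted output), and the pulse generator NANDs the input with a copy passed through three inverters. The function names are hypothetical, and the combining gate is treated as having zero delay.

```python
def delay_chain(wave, n_inverters, initial=0):
    """Pass a 0/1 waveform through n ideal inverters, one tick of delay each.

    Samples before the start of the waveform are assumed to equal `initial`.
    """
    out = []
    for t in range(len(wave)):
        src = wave[t - n_inverters] if t >= n_inverters else initial
        # An odd number of inverters also inverts the signal.
        out.append(src if n_inverters % 2 == 0 else 1 - src)
    return out


def stretch(wave, n_inverters=2):
    """Widen a HIGH pulse by ORing it with a delayed, non-inverted copy."""
    delayed = delay_chain(wave, n_inverters)
    return [a | b for a, b in zip(wave, delayed)]


def pulse_on_rising_edge(wave, n_inverters=3):
    """NAND the input with its delayed, inverted copy.

    The output is LOW only while both NAND inputs are HIGH.
    """
    delayed_inv = delay_chain(wave, n_inverters)
    return [1 - (a & b) for a, b in zip(wave, delayed_inv)]


if __name__ == "__main__":
    pulse = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
    step = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
    print(stretch(pulse))              # HIGH for 5 ticks instead of 3
    print(pulse_on_rising_edge(step))  # LOW for exactly 3 ticks
```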
Multiphase Clock Generation
• Generating precisely shaped clocks is not easy.
• Fortunately, it doesn’t need to be terribly precise.
• 2-phase clocking
– Phases 1 and 2 are nonoverlapping.
– In this design, the length of the clock non-overlap does not scale with frequency.
• Use pulse stretchers to guarantee overlap.
– Control overlap with inverters.
• 4-phase clocking often needs well-controlled delay lines.
[Figure: multiphase clock generators driven by ckin, built from a clock complement, ¼ tper delays, and pulse wideners, producing phases 1 to 4]
Example: “2”-Phase Time Borrowing
• Time borrowing in the Itanium (Rusu00)
– Use 4 clock phases
– Clkd overlaps with both clkb and clk to allow borrowing between Phase 1 and Phase 2.
  • Instead of requiring exactly 180° overlapping clocks
N-phase Skew-Tolerant Domino
• The idea is to delay the clock along with the data flow.
• Can’t delay by too much: a delay of more than Tcycle/2 in case (a), or more than Tcycle in case (b), would cause improper timing.
– Last phase (6) needs to arrive before the next phase 1 arrives.
– Phases are not necessarily uniform.
Interfacing with Static Logic (1)
• When a domino output drives static logic, the pre-charge phase must be eliminated.
• Follow the pre-charged gate with a latch (Itanium 2)
– Evaluates low when clock transitions HIGH.
– When pre-charge data (X) evaluates, output transitions HIGH (or stays LOW).
– Stays stable during pre-charge because the latch is non-transparent when clock is LOW.
Interfacing with Static Logic (2)
• When static logic outputs drive the first domino stage:
• Capture the data with a flip-flop or latch so that the data does not transition during evaluate.
– Or in some other way ensure that only rising edges are allowed.
• UltraSPARC and Itanium 2 both use a latch that only allows the output to transition from low to high.
– The latch is pulsed.
  • Only conducting LOW for 3 inverter delays.
– An “A” input that arrives before the rising edge is latched.
– A rising edge on the “A” input that arrives during the pulse is also latched.
  • This essentially gives a small degree of time borrowing.
Domino Logic Family Outline
• Dynamic/domino logic
– Domino logic
– Timing of domino logic
– Noise issues and keepers
• Dual-rail domino logic (Dynamic DCVS) and other domino styles
Noise in Domino Design #1: Charge Leakage
[Figure: dynamic gate with clock CLK, input A, and output Out, showing subthreshold leakage and junction leakage paths; VOut waveform across precharge and evaluate]
Minimum clock rate on the order of kHz
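A rough order-of-magnitude estimate shows where a kHz-range minimum clock rate can come from: the dynamic node holds its value only as long as leakage has not removed too much charge, so the hold time is about C·ΔV / Ileak. The numbers below (10 fF node, 0.3 V tolerable droop, 10 pA total leakage) are purely illustrative assumptions and do not come from the slides; the function names are hypothetical.

```python
def hold_time(c_node, delta_v, i_leak):
    """Time for a constant leakage current to droop the node by delta_v."""
    return c_node * delta_v / i_leak


def min_clock_rate(c_node, delta_v, i_leak):
    """Approximate refresh rate needed so the node is restored in time."""
    return 1.0 / hold_time(c_node, delta_v, i_leak)


if __name__ == "__main__":
    # Illustrative values only: 10 fF, 0.3 V droop, 10 pA leakage.
    t = hold_time(10e-15, 0.3, 10e-12)
    f = min_clock_rate(10e-15, 0.3, 10e-12)
    print(f"hold time ~ {t * 1e3:.2f} ms, minimum rate ~ {f / 1e3:.1f} kHz")
```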
Noise in Domino Design #2: Coupling and Gnd Bounce
[Figure: coupling and ground bounce acting on a domino gate followed by a high skew gate; Vt marked]
• The output of a dynamic gate is a floating node
• Coupling on the dynamic node can cause the static gate to glitch
• Input glitches can discharge the dynamic node
– The portion of the glitch above Vt is what matters
• Ground bounce can cause a glitch or turn on the nMOS pull-down
Noise in Domino Design #3: Backgate Coupling
[Figure: dynamic NAND (output out1) and static NAND (output out2), with signals A, B, and in; plot of out1, out2, and in, from -1 to 3 on the vertical axis, versus time from 0 to 6 ns]
Domino Noise Margin: Keepers
[Figure: keeper circuits, with labels: weak keeper; keeper for tiny domino gates built from minimum and long devices]
• Dynamic output may be corrupted by subthreshold leakage, α-particles
• Use a weak keeper to make the dynamic node static
• Keeper doesn’t help much with charge sharing and output coupling because it is so small
– Also degrades evaluation speed
• Prefer separate inverter for keeper
– Allows complex static gates, minimizes noise coupled onto keeper
• “Dual-gate” keeper minimizes load on tiny gates
Delayed Keepers
• Weakened keepers are not as effective at restoring the degraded voltage.
– To avoid fighting, we can turn on a stronger keeper after a small delay. (Alvandpour02), (Allam01), (Jung01)
– In (b), x floats momentarily.
• The key is not to delay by too much.
– Restore before too much charge is gone.
– But do not start the keeper before all the inputs have arrived.
– Works best with the static logic interface (when all inputs are stable).
Issue in Domino Design #5: Charge Sharing
• Domino designs often fail due to charge sharing if internal nodes are not considered
– Occurs when an internal node was low; a capacitive divider is formed with the output
– Reduce charge sharing by reducing capacitance of internal nodes relative to capacitance of load
  • High fanout gates suffer least from charge sharing
– Pre-charge internal nodes where necessary with “secondary pre-charge devices” (generally, every other node suffices)
[Figure: domino gate clocked by clk with input in, output node out (capacitance Cout, goes to high skew gate) and internal node x (capacitance Cx); waveforms of clk, in, out, and x for the case Cx = Cout]
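The capacitive divider mentioned above follows from charge conservation when the output node and the internal node are connected. The sketch below is a minimal illustration with hypothetical names; voltages are normalized so that Vdd = 1, which is an assumption made here for convenience.

```python
def shared_voltage(v_out0, c_out, v_x0, c_x):
    """Final common voltage after nodes out and x share charge."""
    return (c_out * v_out0 + c_x * v_x0) / (c_out + c_x)


if __name__ == "__main__":
    vdd = 1.0
    # Internal node low and Cx = Cout: the output falls to half of Vdd.
    print(shared_voltage(vdd, 1.0, 0.0, 1.0))   # 0.5
    # Larger load relative to the internal node: smaller droop.
    print(shared_voltage(vdd, 4.0, 0.0, 1.0))   # 0.8
    # Internal node pre-charged by a secondary device: no droop.
    print(shared_voltage(vdd, 1.0, vdd, 1.0))   # 1.0
```

The second case reflects the note that high-fanout gates suffer least, and the third shows the effect of a secondary pre-charge device.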
Pre-charging Internal Nodes
• Normally, internal nodes are pre-charged with small pMOS devices
– Not crucial to get the node to 100% of Vdd, just reduce noise
• Gates actually run faster when some charge sharing occurs
– Less capacitance needs to be pulled all the way down
– Sometimes pre-charge an internal node to Vdd-Vt with an nMOS device
– Maybe even pre-discharge an internal node to speed it up
  • Worst case for speed is with node high, worst case for noise is with node low
  • If we can tolerate the noise with node low, we might improve the speed by guaranteeing the node is low
  • Use a small nMOS device (make sure it is off during evaluation)
  • Can only pre-discharge a node if no path to Vdd can possibly exist
  • Must be sure that noise is tolerable for all cases when doing this!
[Figure: example gate; labels A, B, 2, O]
Domino Pitfalls Review
• There are lots of ways that domino circuits can fail:
– Charge sharing and leakage
– Noise coupling onto the output (crosstalk).
– An α-particle hit, sub-threshold leakage, or substrate charge injection on the dynamic node.
– Power supply noise (especially ground bounce).
• Fortunately, these are all relatively easy to check with ERC (Electrical Rule Check) and DRC (Design Rule Check) tools.
– Microprocessor companies routinely build reliable domino datapaths these days.
Domino Logic Family Outline
• Dynamic/domino logic
– Domino logic
– Timing of domino logic
– Noise issues and keepers
• Dual-rail domino logic (Dynamic CVSL) and other domino styles
Non-monotonic Logic
• A domino gate + high skew gate pair can only implement non-inverting (“monotonic”) functions.
• Many important functions are non-monotonic, such as XOR
[Figure: gate clocked by clk with inputs a, a_b, b, b_b]
• One solution: push the non-monotonic function to the end of the logic cone
– Build the first part of the cone in domino gates
– Switch to static or transmission gate logic for the non-monotonic part
– Example: carry select adder often uses a static mux
Dual-Rail Domino
[Figure: dual-rail domino gate clocked by clk, with outputs out_H and out_L and pull-down networks F on inputs a and b; the two networks merge into a single pulldown network]
• We can overcome this problem by computing both true and complementary outputs with dual-rail domino.
• Also known as “Differential Cascode Voltage Switch” (DCVS)
• Compute out_H and out_L; may be able to share transistors
– out_H is asserted when the output evaluates high
– out_L is asserted when the output evaluates low
– Asserting both out_H and out_L is illegal
• Both out_H and out_L are unasserted during pre-charge
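The out_H / out_L encoding can be illustrated with a behavioral model of a dual-rail XOR, the non-monotonic function that single-rail domino cannot build. This is a sketch under stated assumptions: the function names and the `_h` / `_l` input naming are hypothetical, and the model is stateless, so it ignores charge storage on the dynamic nodes.

```python
def dual_rail_gate(clk, f_true, f_false):
    """Dual-rail domino gate.

    f_true / f_false say whether the true and complement pull-down
    networks conduct. Returns (out_H, out_L).
    """
    if clk == 0:
        # Pre-charge: both rails are unasserted.
        return 0, 0
    return int(bool(f_true)), int(bool(f_false))


def dual_rail_xor(clk, a_h, a_l, b_h, b_l):
    """XOR built only from non-inverted (monotonic) rail signals."""
    f_true = (a_h and b_l) or (a_l and b_h)
    f_false = (a_h and b_h) or (a_l and b_l)
    return dual_rail_gate(clk, f_true, f_false)


if __name__ == "__main__":
    print(dual_rail_xor(0, 1, 0, 0, 1))  # (0, 0): pre-charge
    print(dual_rail_xor(1, 1, 0, 0, 1))  # (1, 0): a=1, b=0, XOR is high
    print(dual_rail_xor(1, 1, 0, 1, 0))  # (0, 1): a=1, b=1, XOR is low
    print(dual_rail_xor(1, 0, 0, 0, 0))  # (0, 0): inputs not yet valid
```

With legal inputs (at most one rail of each pair asserted), out_H and out_L are never asserted together, and each rail can only rise during evaluate.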
Keepers for DCVS
[Figure: two DCVS gates, each with a pull-down tree, outputs labeled F, and keeper devices m1 and m2]
• Keepers are the same idea.
• Since the gate is differential, the keepers can be cross-coupled.
Multiple-Output Domino
• MODL (Hwang89)
– Opportunistic reuse of logic
– Particularly true of pre-charged carry-propagate chain
  • Can be thought of as one big gate.
Diode-Footed Domino
[Figure: diode-footed domino gate with supply VDD, clocks CLK and CLK_b, output Out with load CL, a diode foot, and a current mirror]
• The stacking reduces leakage
• Current mirror and feedback increase the speed
Operation: Pre-Charge Phase
[Figure: diode-footed domino gate in pre-charge, with CLK = 0 and CLK_b = VDD; annotated node transitions 0 -> VDD, VDD -> 0, and VDD -> 0; load CL]
Operation: Evaluate Phase
[Figure: diode-footed domino gate in evaluate, with CLK = VDD and CLK_b = 0, shown for input 0 and for input 1, with node Vx and load CL annotated]
• With the input at 0:
– Vx has a finite voltage due to leakage current.
– The stack of 2 reduces leakage.
• With the input at 1:
– Initial discharge is due to charge sharing
– The current mirror provides a faster discharge path.
– Feedback provides the remaining discharge
Simulations
• Noise immunity test: apply an input noise pulse, increasing it until the noise gain is unity.
[Figure: simulated waveforms, normal operation]
Noise Immunity of DFD
Summary
• Dynamic logic is based on optimizing for one edge of evaluation.
– To eliminate the other edge, a pre-charge phase is introduced.
• Timing is a critical element of the design
• Because one of the nodes is dynamic, noise is another critical design constraint.
– Large internal capacitance can lead to a bad delay-robustness tradeoff.
• Large fan-in can be challenging (especially ANDs).
– Monotonicity forces us to build dual rail, making ANDs unavoidable.
• Diode-footed domino is one attempt at pushing the tradeoff to a different point. (We’ll see many more.)