Logic Families and Williams' Pipeline

Clockless Computing
Montek Singh
Thu, Sep 6, 2007
• Review: Logic Gate Families
• A classic asynchronous pipeline by Williams
1
Review: Logic Gate Families
• Static CMOS logic (“standard”)
• Transmission gates, or “pass-transistor” logic
• Dynamic logic, or “domino” logic
2
Static CMOS logic: Summary
Advantages:
• output always strongly driven
– pull-up and pull-down networks are fully complementary; always exactly one of them is “on”
• good immunity from noise and leakage
• both inverting and non-inverting functions implementable
– each gate is inverting
– cascade two gates together to get non-inverting logic
Disadvantages:
• slow/big PMOS devices needed (in addition to NMOS)
– greater chip area
– higher power consumption
– slower switching speed
3
Complementary CMOS
• Complementary CMOS logic gates
– nMOS pull-down network
– pMOS pull-up network
– a.k.a. static CMOS

               | Pull-up OFF | Pull-up ON
Pull-down OFF  | Z (float)   | 1
Pull-down ON   | 0           | X (crowbar)

[Figure: inputs drive a pMOS pull-up network and an nMOS pull-down network that share one output]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
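The four pull-up/pull-down combinations above can be written as a small lookup. This is only a sketch: the function name `cmos_output` and the string encoding of the levels are illustrative choices, not something the slides define.

```python
def cmos_output(pull_up_on: bool, pull_down_on: bool) -> str:
    """Return the output level given which network is conducting."""
    if pull_up_on and pull_down_on:
        return "X"  # crowbar: both networks drive the output at once
    if pull_up_on:
        return "1"  # output connected to Vdd through the pMOS network
    if pull_down_on:
        return "0"  # output connected to Gnd through the nMOS network
    return "Z"      # neither network conducts: output floats


if __name__ == "__main__":
    for up in (False, True):
        for down in (False, True):
            print(f"pull-up ON={up}, pull-down ON={down}: {cmos_output(up, down)}")
```

A complementary (static) CMOS gate is designed so that only the "0" and "1" rows ever occur.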
4
Series and Parallel
• nMOS: 1 = ON
• pMOS: 0 = ON
• Series: both must be ON
• Parallel: either can be ON
[Figure: panels (a)–(d) show series and parallel pairs of nMOS and pMOS transistors between terminals a and b, with gate inputs g1 and g2, each marked ON or OFF for every input combination]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
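The four rules above are enough to tabulate when a two-transistor network conducts. The sketch below assumes an idealized switch model (a transistor is simply ON or OFF); the helper names are hypothetical.

```python
def nmos_on(gate: int) -> bool:
    """An nMOS transistor conducts when its gate is 1."""
    return gate == 1


def pmos_on(gate: int) -> bool:
    """A pMOS transistor conducts when its gate is 0."""
    return gate == 0


def series(first_on: bool, second_on: bool) -> bool:
    """A series pair conducts only if both transistors are ON."""
    return first_on and second_on


def parallel(first_on: bool, second_on: bool) -> bool:
    """A parallel pair conducts if either transistor is ON."""
    return first_on or second_on


def conduction_table(device_on, combine):
    """Conduction for (g1, g2) = 00, 01, 10, 11."""
    return [combine(device_on(g1), device_on(g2))
            for g1 in (0, 1) for g2 in (0, 1)]


if __name__ == "__main__":
    for name, dev in (("nMOS", nmos_on), ("pMOS", pmos_on)):
        for kind, comb in (("series", series), ("parallel", parallel)):
            states = ["ON" if s else "OFF" for s in conduction_table(dev, comb)]
            print(f"{kind:8} {name}: {states}")
```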
5
CMOS Gate Design
• Activity:
– Sketch a 4-input CMOS NOR gate
[Figure: 4-input NOR gate with inputs A, B, C, D and output Y]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
6
CMOS Gate Design
• Activity:
– Sketch a 4-input CMOS NAND gate
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
7
Conduction Complement
• Complementary CMOS gates always produce 0 or 1
• Ex: NAND gate
– Series nMOS: Y=0 when both inputs are 1
– Thus Y=1 when either input is 0
– Requires parallel pMOS
• Rule of Conduction Complements
– Pull-up network is complement of pull-down
– Parallel -> series, series -> parallel
[Figure: 2-input NAND gate with inputs A, B and output Y]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
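The rule of conduction complements can be checked mechanically. The sketch below represents a network as a nested tuple, an encoding chosen here purely for illustration, derives the pull-up network by swapping series and parallel, and confirms that exactly one network conducts for every input of a 2-input NAND.

```python
from itertools import product


def dual(net):
    """Swap series and parallel to obtain the complementary network."""
    if isinstance(net, str):          # a leaf is one transistor, named by its input
        return net
    kind, *children = net
    swapped = "parallel" if kind == "series" else "series"
    return (swapped, *[dual(child) for child in children])


def conducts(net, inputs, on_level):
    """True if the network conducts; on_level is 1 for nMOS, 0 for pMOS."""
    if isinstance(net, str):
        return inputs[net] == on_level
    kind, *children = net
    states = [conducts(child, inputs, on_level) for child in children]
    return all(states) if kind == "series" else any(states)


def gate_output(pull_down, inputs):
    """Output of a complementary gate built from the given nMOS pull-down network."""
    pull_up = dual(pull_down)
    down = conducts(pull_down, inputs, on_level=1)
    up = conducts(pull_up, inputs, on_level=0)
    assert up != down, "exactly one network must conduct"
    return 1 if up else 0


NAND2_PULL_DOWN = ("series", "A", "B")

if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, gate_output(NAND2_PULL_DOWN, {"A": a, "B": b}))
```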
8
Compound Gates
• Compound gates can do any inverting function
• Ex: $Y = \overline{A \cdot B + C \cdot D}$ (AND-AND-OR-INVERT, AOI22)
[Figure: panels (a)–(f) show transistor networks on inputs A, B, C, D and the resulting AOI22 gate with output Y]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
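As a quick check of the AOI22 example, the sketch below evaluates the function from the pull-down side (A in series with B, in parallel with C in series with D) and from the complementary pull-up side, and compares the two over all 16 input combinations. The function names are illustrative.

```python
from itertools import product


def aoi22(a: int, b: int, c: int, d: int) -> int:
    """Y = NOT(A·B + C·D): output is 0 whenever the nMOS pull-down conducts."""
    pull_down_conducts = (a and b) or (c and d)
    return int(not pull_down_conducts)


def aoi22_from_pull_up(a: int, b: int, c: int, d: int) -> int:
    """Same gate seen from the pMOS side: (A parallel B) in series with (C parallel D).

    pMOS devices conduct on 0, so each parallel pair conducts when either input is 0.
    """
    pull_up_conducts = (not a or not b) and (not c or not d)
    return int(pull_up_conducts)


if __name__ == "__main__":
    for bits in product((0, 1), repeat=4):
        assert aoi22(*bits) == aoi22_from_pull_up(*bits)
    print("pull-up and pull-down views agree on all 16 inputs")
```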
9
Transmission (“Pass”) Gates
Key Idea:
• transistors used in a different configuration
• when switched on: instead of connecting output to Vdd or Gnd, they connect output to the input
Advantage:
• very efficient for implementing switches and multiplexers
Disadvantage:
• signal degradation unless both NFET and PFET pass gates are used in a complementary configuration
10
Pass Transistors
• Transistors can be used as switches
[Figure: nMOS and pMOS transistor symbols with gate g, source s, and drain d]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
11
Pass Transistors
• Transistors can be used as switches
[Figure: nMOS and pMOS pass transistors (gate g, source s, drain d) shown switched off and switched on]

Device | ON when | Input 0    | Input 1
nMOS   | g = 1   | strong 0   | degraded 1
pMOS   | g = 0   | degraded 0 | strong 1

OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
12
Transmission Gates
• Single pass transistors produce degraded outputs
– pMOS good only for transmitting “1”
– nMOS good only for transmitting “0”
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
13
Transmission Gates
• Single pass transistors produce degraded outputs
• Complementary transmission gates pass both 0 and 1 well
[Figure: transmission gate symbols with terminals a and b and complementary controls g and gb]
– g = 0, gb = 1: a and b are disconnected
– g = 1, gb = 0: a and b are connected; input 0 → strong 0, input 1 → strong 1
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
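The behaviour of single pass transistors and of the complementary transmission gate can be summarized in a small model. This is a sketch under simplifying assumptions: signal quality is reduced to the labels "strong" and "degraded", `None` stands for a switched-off device, and the function names are made up for illustration.

```python
def nmos_pass(g: int, value: int):
    """nMOS pass transistor: ON when g = 1; passes 0 well, 1 poorly."""
    if g != 1:
        return None
    return "strong 0" if value == 0 else "degraded 1"


def pmos_pass(g: int, value: int):
    """pMOS pass transistor: ON when g = 0; passes 1 well, 0 poorly."""
    if g != 0:
        return None
    return "degraded 0" if value == 0 else "strong 1"


def transmission_gate(g: int, value: int):
    """nMOS (gate g) and pMOS (gate gb = NOT g) in parallel."""
    gb = 1 - g
    passed = [lvl for lvl in (nmos_pass(g, value), pmos_pass(gb, value))
              if lvl is not None]
    if not passed:
        return None  # g = 0, gb = 1: both devices off
    # Whichever device passes this value strongly sets the output level.
    strong = [lvl for lvl in passed if lvl.startswith("strong")]
    return strong[0] if strong else passed[0]


if __name__ == "__main__":
    for value in (0, 1):
        print(value, nmos_pass(1, value), pmos_pass(0, value),
              transmission_gate(1, value))
```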
14
Multiplexers
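• 2:1 multiplexer chooses between two inputs

S | D1 | D0 | Y
0 | X  | 0  | 0
0 | X  | 1  | 1
1 | 0  | X  | 0
1 | 1  | X  | 1

[Figure: 2:1 multiplexer symbol with data inputs D0 (selected by 0) and D1 (selected by 1), select S, and output Y]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College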
15
Transmission Gate Mux
• Nonrestoring mux uses two transmission gates
– Only 4 transistors
[Figure: two transmission gates, controlled by S and its complement, pass D0 or D1 to Y]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
16
Gate-Level Mux Design
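• $Y = S D_1 + \overline{S} D_0$ (too many transistors)
• How many transistors are needed? 20
[Figure: gate-level mux schematics with inputs D1, S, D0 and output Y; the per-gate transistor counts shown (4, 2, 4, 2, 4, 2, 2) sum to 20]
OPTIONAL MATERIAL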
Credit: David Harris, Harvey Mudd College
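The sketch below checks that the gate-level form Y = S·D1 + (NOT S)·D0 matches the behavioural 2:1 mux, and adds up the transistor counts. The per-gate breakdown (each AND or OR built as a 4-transistor NAND or NOR plus a 2-transistor inverter, and one more inverter for S) is an assumption chosen to match the 4, 2, 4, 2, 4, 2, 2 counts on the slide.

```python
from itertools import product


def mux2(s: int, d1: int, d0: int) -> int:
    """Behavioural 2:1 mux: Y = D1 when S = 1, otherwise D0."""
    return d1 if s else d0


def mux2_gates(s: int, d1: int, d0: int) -> int:
    """Gate-level form: Y = S·D1 + (NOT S)·D0."""
    return (s & d1) | ((1 - s) & d0)


# Assumed static CMOS costs: 2-input NAND/NOR = 4 transistors, inverter = 2.
GATE_LEVEL_COUNTS = {
    "AND of S and D1 (NAND + inverter)": 4 + 2,
    "AND of NOT S and D0 (NAND + inverter)": 4 + 2,
    "OR (NOR + inverter)": 4 + 2,
    "inverter generating NOT S": 2,
}
TRANSMISSION_GATE_MUX_COUNT = 4  # two transmission gates, as stated on the slide

if __name__ == "__main__":
    for s, d1, d0 in product((0, 1), repeat=3):
        assert mux2(s, d1, d0) == mux2_gates(s, d1, d0)
    print("gate-level mux:", sum(GATE_LEVEL_COUNTS.values()), "transistors")
    print("transmission-gate mux:", TRANSMISSION_GATE_MUX_COUNT, "transistors")
```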
17
Dynamic Logic, or “domino”
Key idea:
• only use NMOS devices to compute function
• use a single PMOS to reset
Advantages:
• significantly fewer transistors → smaller chip area
• higher speed, lower power
• less “loading” on wires (drive fewer transistors)
• for async: no storage elements needed
Disadvantages:
• need extra control input to precharge
• logic is typically non-inverting only
• more vulnerable to noise and leakage effects
18
Dynamic Logic, or “domino” (contd.)
Gate has 2 phases:
• precharge (=reset): output reset to ‘0’
• evaluate: output computed → either stays ‘0’, or switches to ‘1’
[Figure: dynamic gate. The control input PC drives the pull-up network (controls “precharge”); the data inputs drive the pull-down network (controls “evaluation”); the gate produces the data output]
PC = 0 (asserted) → precharge
PC = 1 (de-asserted) → evaluate
Pull-up and pull-down must never both be simultaneously active:
• ensure that data inputs are reset while gate is precharging
• or, add a “footer” device
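The two phases can be mimicked with a small behavioural model. This is a sketch only: the class name `DominoGate` is hypothetical, timing and charge effects are ignored, and the model simply lets precharge win when PC = 0, which is what a footer device or reset data inputs would guarantee in a real gate.

```python
class DominoGate:
    """Behavioural model of a dynamic gate with an active-low precharge input PC."""

    def __init__(self, function):
        self.function = function  # non-inverting function computed by the pull-down network
        self.output = 0           # start in the reset state

    def step(self, pc: int, *data: int) -> int:
        if pc == 0:
            # Precharge (PC asserted): output is reset to 0.
            self.output = 0
        elif self.function(*data):
            # Evaluate: the pull-down network fires and the output switches to 1.
            self.output = 1
        # Otherwise the output keeps its value: once it has switched to 1 during
        # evaluation it stays 1 until the next precharge.
        return self.output


if __name__ == "__main__":
    gate = DominoGate(lambda a, b: a and b)
    print(gate.step(0, 0, 0))  # precharge
    print(gate.step(1, 1, 1))  # evaluate, inputs fire the gate
    print(gate.step(1, 0, 0))  # inputs withdrawn, output is held
    print(gate.step(0, 0, 0))  # precharge again
```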
19
Outline: Several Pipeline Styles
• Classic static logic pipeline: Sutherland
• Recent static logic pipeline: MOUSETRAP
• Classic dynamic logic pipeline: Williams/Horowitz’s PS0
20
A Classic Asynchronous Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
• Structure
• Operation
• Performance
21
A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
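• successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
[Figure: three pipeline stages (Stage 1, Stage 2, Stage 3) between Data in and Data out; each stage contains a Processing Block and a Completion Detector, with data flowing forward and ack signals between stages]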
Implemented using “dynamic logic”
22
PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector:
[Figure: PS0 stage. The Processing Block is a dynamic gate (precharge control PC, “keeper”, pull-down network) between data inputs and data outputs; the Completion Detector generates ack]
23
Dual-Rail Completion Detector
• Combines dual-rail signals
• Indicates when all bits are valid (or reset)
• OR together 2 rails per bit
• Merge results using “C-element”
C-element:
– if all inputs = 1, output → 1
– if all inputs = 0, output → 0
– else, maintain output value
[Figure: the two rails of each bit (bit0, bit1, …, bitn) feed an OR gate; the OR outputs feed a C-element (C) that produces Done]
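The completion detector described above can be modelled directly: one OR per dual-rail bit, merged by a C-element that holds its value unless all inputs agree. This is a behavioural sketch with hypothetical class names; gate delays are not modelled.

```python
class CElement:
    """Output goes to 1 when all inputs are 1, to 0 when all are 0, else holds."""

    def __init__(self, initial: int = 0):
        self.output = initial

    def update(self, inputs) -> int:
        inputs = list(inputs)
        if all(v == 1 for v in inputs):
            self.output = 1
        elif all(v == 0 for v in inputs):
            self.output = 0
        return self.output


class CompletionDetector:
    """Dual-rail completion detector: OR the two rails of each bit, merge with a C-element."""

    def __init__(self):
        self.c_element = CElement()

    def update(self, bits) -> int:
        """bits is a sequence of (rail0, rail1) pairs; (0, 0) means the bit is reset."""
        bit_valid = [rail0 | rail1 for rail0, rail1 in bits]
        return self.c_element.update(bit_valid)


if __name__ == "__main__":
    cd = CompletionDetector()
    print(cd.update([(0, 0), (0, 0)]))  # all bits reset: Done = 0
    print(cd.update([(1, 0), (0, 0)]))  # only one bit valid: Done holds at 0
    print(cd.update([(1, 0), (0, 1)]))  # all bits valid: Done = 1
    print(cd.update([(0, 0), (0, 1)]))  # partly reset: Done holds at 1
    print(cd.update([(0, 0), (0, 0)]))  # all bits reset: Done = 0
```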
24
PS0 Protocol
• PRECHARGE N: when N+1 completes evaluation
– delete data: after next stage has copied it
• EVALUATE N: when N+1 completes precharging
– accept new data: after next stage is emptied
[Figure: stages N, N+1, and N+2 with six numbered events (1–6) of three kinds: “evaluates”, “indicates ‘done’”, and “precharges”]
Complete cycle: 6 events
– Evaluate → Precharge: 3 events
– Precharge → Evaluate: another 3 events
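The two protocol rules reduce to a single decision per stage, driven by the Done signal of the stage to its right. The sketch below encodes just that decision. It assumes Done = 1 means "evaluation complete" and Done = 0 means "precharge complete", and that the environment after the last stage is always ready; both are illustrative assumptions, as are the function names.

```python
def phase_for_stage(next_stage_done: int) -> str:
    """Phase that stage N is told to enter, given Done from stage N+1."""
    if next_stage_done == 1:
        return "precharge"  # N+1 has copied the data, so N may delete it
    return "evaluate"       # N+1 is empty, so N may accept new data


def pipeline_phases(done, environment_done: int = 0):
    """Phase requested of every stage; done[i] is the Done signal of stage i.

    The last stage is controlled by an assumed environment signal.
    """
    controls = list(done[1:]) + [environment_done]
    return [phase_for_stage(d) for d in controls]


if __name__ == "__main__":
    # Stages 0 and 1 have finished evaluating; stage 2 is precharged (empty).
    print(pipeline_phases([1, 1, 0]))
```

In this example stage 0 is told to precharge because stage 1 already holds its data, while stages 1 and 2 are free to evaluate.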
25
PS0 Performance
[Figure: the six numbered events (1–6) that make up one complete cycle]
Cycle Time = $3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
$T_{EVAL}$ = Evaluation Time
$T_{PRECH}$ = Precharge Time
$T_{DETECT}$ = Completion Detection Time
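The cycle-time expression is easy to evaluate for a given set of delays. In the sketch below the delay values are made-up numbers in arbitrary time units, used only to show the arithmetic; they are not measurements from any chip.

```python
def ps0_cycle_time(t_eval: float, t_prech: float, t_detect: float) -> float:
    """PS0 cycle time: three evaluations, one precharge, two completion detections."""
    return 3 * t_eval + t_prech + 2 * t_detect


if __name__ == "__main__":
    # Hypothetical delays, arbitrary units.
    t_eval, t_prech, t_detect = 2, 2, 1
    cycle = ps0_cycle_time(t_eval, t_prech, t_detect)
    print("cycle time:", cycle)
    print("throughput (items per time unit):", 1 / cycle)
```

With these example delays the six events add up to 3·2 + 2 + 2·1 = 10 time units per data item.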
26
Summary: PS0 Pipelining
Datapaths are latch-free:
• dynamic gates themselves provide implicit latches
+: chip area savings
+: extremely low latency
Data items kept separate by control
• stage deletes data: only after next stage has copied it
• stage accepts new data: only if next stage is empty
• distinct data items always separated by “spacers”
Control is extremely simple: each controller = single wire
• completion detector directly controls previous stage
+: chip area savings
+: low control overhead
27
Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”
• each stage gets its own clock
• successive clocks are slightly skewed
• essentially, clocked simulation of asynchronous handshaking!
– need multiple clock phases!
2. Use a single clock, but insert latches between stages
• latches are simple, level-sensitive
• consecutive stages receive complementary clock signals
[Figure: latch between stages, clocked by Ck and Ck’]
28
Drawbacks of PS0 Pipelining
1. Poor throughput:
• long cycle time: 6 events per cycle
• data “tokens” are forced far apart in time
2. Limited storage capacity:
• max only 50% of stages can hold distinct tokens
• data tokens must be separated by at least one spacer
My Research Goals have been: address both issues
• still maintain very low latency
29
Homework #4 (due Tue Sep 18)
1. Enumerate ALL of the timing assumptions inherent in Williams’ PS0 style
• Assume all gate and wire delays can be arbitrary
• For which scenarios can there be a malfunction?
2. Compare the cycle times of PS0 with an ideal clocked dynamic pipeline (slide #28)
30