Clockless Logic
Montek Singh
Thu, Mar 2, 2006
Review: Logic Gate Families
• Static CMOS logic
• Dynamic logic, or “domino” logic
• Transmission gates, or “pass-transistor” logic
Static CMOS logic
Advantages:
• output always strongly driven
• pull-up and pull-down networks are fully complementary; exactly one of them is “on” at all times
• good immunity from noise and leakage
• both inverting and non-inverting functions implementable
  • each gate is inverting
  • cascade two gates together to get non-inverting logic
Disadvantages:
• slow/big PMOS devices needed (in addition to NMOS)
• greater chip area
• higher power consumption
• slower switching speed
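The "exactly one network is on" property can be checked on a small example. The following is a minimal sketch of a 2-input static CMOS NAND gate, assuming transistors behave as ideal switches; the names `nand2_networks` and `nand2` are illustrative and do not come from the slides.

```python
from itertools import product

def nand2_networks(a, b):
    """Return (pull_up_on, pull_down_on) for a 2-input static CMOS NAND.

    Assumption: each transistor is modelled as an ideal Boolean switch.
    """
    pull_up_on = (not a) or (not b)   # two PMOS in parallel: conduct when any input is low
    pull_down_on = a and b            # two NMOS in series: conduct when both inputs are high
    return pull_up_on, pull_down_on

def nand2(a, b):
    pull_up_on, pull_down_on = nand2_networks(a, b)
    # Fully complementary networks: exactly one of them conducts for every input.
    assert pull_up_on != pull_down_on
    return 1 if pull_up_on else 0

# Enumerate all input combinations to build the gate's truth table.
truth_table = {(a, b): nand2(a, b) for a, b in product([False, True], repeat=2)}
```

The gate is inverting, as the slide notes: the output is 0 only when both inputs are high. Feeding the result through a second inverting gate would give the non-inverting AND function.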
Dynamic Logic, or “domino”
Key idea:
• only use NMOS’s to compute function
• use a single PMOS to reset
Advantages:
• significantly fewer transistors → smaller chip area
• higher speed, lower power
• less “loading” on wires (drive fewer transistors)
• for async: no storage elements needed
Disadvantages:
• need extra control input to precharge
• logic is typically non-inverting only
• more vulnerable to noise and leakage effects
Dynamic Logic, or “domino” (contd.)
Gate has 2 phases:
• precharge (= reset): output reset to ‘0’
• evaluate: output computed → either stays ‘0’, or switches to ‘1’
[Figure: dynamic gate. The control input PC drives the pull-up network, which controls “precharge”; the data inputs drive the pull-down network, which controls “evaluation”; the gate drives the data output.]
PC = 0 (asserted) → precharge
PC = 1 (de-asserted) → evaluate
Pull-up and pull-down must never both be active simultaneously:
• ensure that data inputs are reset while gate is precharging
• or, add a “footer” device
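The two-phase behaviour described above can be sketched as a small behavioural model. This is only an illustration under stated assumptions: the class name `DominoGate` and the `step` method are hypothetical, the gate is assumed to have a “footer” device (so data inputs are ignored while precharging), and the output is assumed to hold its value during evaluation once it has switched to 1.

```python
class DominoGate:
    """Behavioural sketch of a dynamic ("domino") gate; not a circuit-level model."""

    def __init__(self, pulldown):
        # pulldown: function of the data inputs, True when the NMOS network conducts
        self.pulldown = pulldown
        self.out = 0

    def step(self, pc, *inputs):
        if pc == 0:
            # PC asserted (low): precharge phase, output reset to 0.
            # Assumed footer device: data inputs have no effect here.
            self.out = 0
        elif self.pulldown(*inputs):
            # PC de-asserted (high): evaluate phase, output may switch 0 -> 1.
            self.out = 1
        # Otherwise the output keeps its value; it can only fall again on precharge.
        return self.out

# A 2-input AND as an example of a non-inverting function.
and_gate = DominoGate(lambda a, b: bool(a and b))
trace = [
    and_gate.step(0, 0, 0),  # precharge
    and_gate.step(1, 1, 0),  # evaluate, pull-down off: output stays 0
    and_gate.step(1, 1, 1),  # evaluate, pull-down on: output switches to 1
    and_gate.step(1, 0, 0),  # inputs removed: output holds 1
    and_gate.step(0, 0, 0),  # precharge again: output reset to 0
]
```

Within one evaluate phase the output is monotonic: it either stays 0 or rises to 1 once.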
Transmission Gates
Key Idea:
• transistors used in a different configuration
• when switched on: instead of connecting output to Vdd or Gnd, they connect output to the input
Advantage:
• very efficient for implementing switches and multiplexers
Disadvantage:
• signal degradation unless both NFET and PFET passgates are used in a complementary configuration
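The signal-degradation point can be illustrated with a first-order model. The sketch below assumes the usual textbook approximation that a lone NFET passes a high level only up to Vdd minus one threshold, and a lone PFET passes a low level only down to one threshold above ground. The voltage values and function names are hypothetical, and effects such as body effect are ignored.

```python
VDD_MV = 1800   # hypothetical supply voltage, in millivolts
VT_MV = 500     # hypothetical threshold-voltage magnitude, in millivolts

def nmos_pass(v_in_mv):
    """Level seen at the output of a lone NFET passgate that is switched on."""
    return min(v_in_mv, VDD_MV - VT_MV)   # strong 0, degraded 1

def pmos_pass(v_in_mv):
    """Level seen at the output of a lone PFET passgate that is switched on."""
    return max(v_in_mv, VT_MV)            # strong 1, degraded 0

def transmission_gate(v_in_mv):
    """NFET and PFET in parallel: one of the two always passes the level fully."""
    return v_in_mv

levels = {
    "nmos_high": nmos_pass(VDD_MV),
    "pmos_low": pmos_pass(0),
    "tgate_high": transmission_gate(VDD_MV),
    "tgate_low": transmission_gate(0),
}
```

With these assumed numbers, the lone NFET delivers 1300 mV for a high input and the lone PFET delivers 500 mV for a low input, while the complementary pair passes both levels unchanged.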
Outline: Several Pipeline Styles
• Classic static logic pipeline: Sutherland
• Recent static logic pipeline: MOUSETRAP
• Classic dynamic logic pipeline: Williams/Horowitz’s PS0
A Classic Asynchronous Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
• Structure
• Operation
• Performance
A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
• successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
[Figure: three pipeline stages (Stage 1, Stage 2, Stage 3) between Data in and Data out; each stage has a data Processing Block and a Completion Detector, with ack signals between stages.]
Implemented using “dynamic logic”
PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector:
[Figure: PS0 stage. Labels: ack, PC, data inputs, pull-down network, “keeper”, Processing Block, data outputs, Completion Detector.]
Dual-Rail Completion Detector
• Combines dual-rail signals
• Indicates when all bits are valid (or reset)
• OR together 2 rails per bit
• Merge results using “C-element”
C-element:
• if all inputs = 1, output → 1
• if all inputs = 0, output → 0
• else, maintain output value
[Figure: bit0, bit1, ..., bitn each feed an OR gate; the OR outputs feed a C-element (C), whose output is Done.]
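The detector described above (one OR per dual-rail bit, merged by a C-element) can be sketched behaviourally as follows. The class and function names are illustrative assumptions, and each bit is represented as a hypothetical `(true_rail, false_rail)` pair.

```python
class CElement:
    """State-holding C-element: follows its inputs only when they all agree."""

    def __init__(self):
        self.out = 0

    def update(self, inputs):
        if all(inputs):
            self.out = 1          # all inputs = 1 -> output 1
        elif not any(inputs):
            self.out = 0          # all inputs = 0 -> output 0
        return self.out           # otherwise: maintain previous output value

def completion_detect(c_element, bits):
    """bits: list of (true_rail, false_rail) pairs, one per dual-rail bit."""
    # OR the two rails of each bit: 1 means the bit is valid, 0 means it is reset.
    valid = [t | f for t, f in bits]
    return c_element.update(valid)

c = CElement()
done_trace = [
    completion_detect(c, [(0, 0), (0, 0)]),  # all bits reset   -> Done = 0
    completion_detect(c, [(1, 0), (0, 0)]),  # partially valid  -> holds 0
    completion_detect(c, [(1, 0), (0, 1)]),  # all bits valid   -> Done = 1
    completion_detect(c, [(0, 0), (0, 1)]),  # partially reset  -> holds 1
    completion_detect(c, [(0, 0), (0, 0)]),  # all bits reset   -> Done = 0
]
```

The hold behaviour is what lets Done signal both events cleanly: it rises only once every bit is valid and falls only once every bit has been reset.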
PS0 Protocol
• PRECHARGE N: when N+1 completes evaluation
  • delete data: after next stage has copied it
• EVALUATE N: when N+1 completes precharging
  • accept new data: after next stage is emptied
[Figure: event sequence across stages N, N+1 and N+2, with events numbered 1 to 6 and labelled “evaluates”, “precharges” and “indicates ‘done’”.]
Complete cycle: 6 events
• Evaluate → Precharge: 3 events
• Precharge → Evaluate: another 3 events
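The two protocol rules can be exercised in a small unit-delay simulation. This is a sketch under simplifying assumptions that are not from the slides: every evaluation, precharge and completion detection takes exactly one time step, data items are distinct non-`None` values, and the source and sink obey the same handshake rules as a stage. The function name `simulate_ps0` is hypothetical.

```python
def simulate_ps0(items, n_stages, max_steps=1000):
    """Return (items received by the sink, max distinct items in flight)."""
    out = [None] * n_stages   # stage outputs; None = precharged (empty)
    pending = list(items)
    src = None                # value currently offered by the source
    sink_ack = False          # "done" signal of the sink
    received = []
    max_in_flight = 0

    for _ in range(max_steps):
        idle = not pending and src is None and not sink_ack
        if idle and all(v is None for v in out):
            break
        # done[i] is the completion signal of stage i; done[n_stages] is the sink.
        done = [v is not None for v in out] + [sink_ack]
        new_out = list(out)
        for i in range(n_stages):
            data_in = src if i == 0 else out[i - 1]
            if done[i + 1]:
                new_out[i] = None        # PRECHARGE N: stage N+1 has copied the data
            elif out[i] is None and data_in is not None:
                new_out[i] = data_in     # EVALUATE N: stage N+1 is empty
        # Source: withdraw the item once stage 0 has copied it, then offer the next.
        if done[0]:
            new_src = None
        elif src is None and pending:
            new_src = pending.pop(0)
        else:
            new_src = src
        # Sink: record each new item once, and release the ack when the last stage empties.
        if out[-1] is None:
            new_ack = False
        else:
            if not sink_ack:
                received.append(out[-1])
            new_ack = True
        out, src, sink_ack = new_out, new_src, new_ack
        in_flight = len({v for v in out if v is not None})
        max_in_flight = max(max_in_flight, in_flight)
    return received, max_in_flight

received, in_flight = simulate_ps0([1, 2, 3, 4, 5], n_stages=4)
```

In this model the items arrive at the sink in their original order. Because a stage evaluates only while its successor is empty, two neighbouring stages never hold different items, so the in-flight count cannot exceed half the number of stages (rounded up).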
PS0 Performance
[Figure: the six events of one cycle, numbered 1 to 6.]
$\text{Cycle Time} = 3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
$T_{EVAL}$ = Evaluation Time
$T_{PRECH}$ = Precharge Time
$T_{DETECT}$ = Completion Detection Time
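As a quick worked example of the cycle-time expression (three evaluations, one precharge and two completion detections), the sketch below plugs in hypothetical delay values. The function name and the numbers are assumptions chosen only for illustration, in arbitrary time units.

```python
def ps0_cycle_time(t_eval, t_prech, t_detect):
    """Cycle time of a PS0 stage: 3*T_EVAL + T_PRECH + 2*T_DETECT."""
    return 3 * t_eval + t_prech + 2 * t_detect

# Hypothetical delays (arbitrary units), not measurements from the slides.
cycle = ps0_cycle_time(t_eval=2, t_prech=2, t_detect=1)   # 6 + 2 + 2 = 10
throughput = 1 / cycle                                    # data items per time unit
```

With these assumed values the cycle takes 10 time units, of which evaluation accounts for 6. The evaluation delay is counted three times, so it dominates the cycle.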
Summary: PS0 Pipelining
Datapaths are latch-free:
• dynamic gates themselves provide implicit latches
+: chip area savings
+: extremely low latency
Data items kept separate by control:
• stage deletes data: only after next stage has copied it
• stage accepts new data: only if next stage is empty
• distinct data items always separated by “spacers”
Control is extremely simple: each controller = single wire
• completion detector directly controls previous stage
+: chip area savings
+: low control overhead
Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”
• each stage gets its own clock
• successive clocks are slightly skewed
• essentially, clocked simulation of asynchronous handshaking!
  - need multiple clock phases!
2. Use a single clock, but insert latches between stages
• latches are simple, level-sensitive
• consecutive stages receive complementary clock signals
[Figure: latches clocked by Ck and Ck’.]
Comparison … (contd.)
Cycle Times?
Drawbacks of PS0 Pipelining
1. Poor throughput:
• long cycle time: 6 events per cycle
• data “tokens” are forced far apart in time
2. Limited storage capacity:
• max only 50% of stages can hold distinct tokens
• data tokens must be separated by at least one spacer
Our Research Goals: address both issues
• still maintain very low latency