Critical ALU Path Optimization and Implementation in a BiCMOS Process for Gigahertz Range Processors
Matthew W. Ernest
Electrical, Computer and Systems Engineering Dept.
Rensselaer Polytechnic Institute
mwe/PHD/1
Overview
• Motivation
• Parallel Prefixes and Carry Types
• HBT Digital Circuits
• Pseudo-carry Adder
• Future Directions
mwe/PHD/2
Motivation
“Speed has always been important otherwise one wouldn't need the computer.” -Seymour Cray
• Ubiquity
• Simplicity
• Complexity
mwe/PHD/3
Parallel Prefixes
Given: $x_0, x_1, x_2, \dots, x_k$
Find: $x_0,\; x_0 \circ x_1,\; x_0 \circ x_1 \circ x_2,\; \dots,\; x_0 \circ x_1 \circ x_2 \circ \dots \circ x_k$ for an associative operator $\circ$
• The set of problems covering sequences of operations in which each term is combined, in order, with the result of the previous operation
• Carry computation is an application of parallel prefix theory
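A minimal Python sketch of the prefix idea, assuming bit lists ordered least-significant bit first and a carry-in of zero. The function names (`serial_prefix`, `parallel_prefix`, `carry_op`, `carries`) are illustrative and do not come from the slides; the log-depth schedule shown is a Kogge-Stone style arrangement, which is only one of several possible prefix networks.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
GP = Tuple[int, int]  # (generate, propagate) pair for a bit or a block


def serial_prefix(xs: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Reference scan: each term is combined with the previous result."""
    out = [xs[0]]
    for x in xs[1:]:
        out.append(op(out[-1], x))
    return out


def parallel_prefix(xs: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Same results in about log2(n) rounds; op must be associative."""
    out = list(xs)
    d = 1
    while d < len(out):
        prev = out
        # Each position combines with the partial result d places below it.
        out = [prev[i] if i < d else op(prev[i - d], prev[i])
               for i in range(len(prev))]
        d *= 2
    return out


def carry_op(low: GP, high: GP) -> GP:
    """Merge a lower block with the block directly above it."""
    g_lo, p_lo = low
    g_hi, p_hi = high
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def carries(a_bits: Sequence[int], b_bits: Sequence[int]) -> List[int]:
    """Carry out of every bit position, assuming carry-in = 0."""
    gp = [(a & b, a | b) for a, b in zip(a_bits, b_bits)]
    return [g for g, _ in parallel_prefix(gp, carry_op)]
```

For example, `carries([1, 1, 0, 1], [0, 1, 1, 0])` computes the carries of 11 + 6 and gives the same list as running `serial_prefix` with `carry_op`.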
mwe/PHD/4
Carry types: Carry Select
• Compute possible results in parallel
• Select when actual carry-in available
• Requires internal carry for blocks, e.g. ripple
• Delay: $O(f(n/b) + b)$, min. $O(n^{1/2})$
• Area: $O(f(n/b)\,b + b)$, approx. $2n$
• Affected by block sizing
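The carry-select scheme can be sketched behaviorally in Python. This is an illustration only: it assumes ripple carry inside each block, equal block sizes (the last block may be shorter), and least-significant-bit-first lists. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_bits(x: int, n: int) -> List[int]:
    """Integer to n bits, least-significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: Sequence[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))


def ripple_add(a: Sequence[int], b: Sequence[int], cin: int) -> Tuple[List[int], int]:
    """Internal block adder: sum bits and carry-out."""
    out, c = [], cin
    for x, y in zip(a, b):
        out.append(x ^ y ^ c)
        c = (x & y) | (c & (x ^ y))
    return out, c


def carry_select_add(a: Sequence[int], b: Sequence[int],
                     block: int, cin: int = 0) -> Tuple[List[int], int]:
    total: List[int] = []
    c = cin
    for i in range(0, len(a), block):
        a_blk, b_blk = a[i:i + block], b[i:i + block]
        # Both candidate results are independent of the incoming carry,
        # so hardware can compute them in parallel.
        s0, c0 = ripple_add(a_blk, b_blk, 0)
        s1, c1 = ripple_add(a_blk, b_blk, 1)
        # Select once the actual carry-in of this block is known.
        total += s1 if c else s0
        c = c1 if c else c0
    return total, c
```

Usage: `carry_select_add(to_bits(11, 8), to_bits(6, 8), block=4)` returns the sum bits of 17 and a carry-out of 0.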
mwe/PHD/5
Carry Types: Carry look-ahead
• Carry-out can be “generated” at current position or carry-in “propagated”
• Delay: $O(1)$
• Area: $O(n^2)$
• High fan-in/fan-out
mwe/PHD/6
Carry Types: Block carry lookahead
• A block propagates a carry if all bits in the block propagate a carry
• A block generates a carry if a bit generates a carry and all succeeding bits propagate
• Delay: $O(\log n)$
• Area: $O(n \log n)$
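The two block rules can be written down directly and checked against a bit-by-bit carry. This is a sketch under stated assumptions: per-bit generate is A·B, per-bit propagate is A+B, and lists are least-significant bit first. The function names are illustrative.

```python
from typing import List, Tuple


def block_generate_propagate(g: List[int], p: List[int]) -> Tuple[int, int]:
    """Block (G, P) from per-bit g and p, least-significant bit first."""
    # The block propagates only if every bit propagates.
    block_p = int(all(p))
    # The block generates if some bit k generates and all bits above k propagate.
    block_g = int(any(g[k] and all(p[k + 1:]) for k in range(len(g))))
    return block_g, block_p


def ripple_carry_out(a: int, b: int, cin: int, width: int) -> int:
    """Reference carry-out computed one bit at a time."""
    c = cin
    for k in range(width):
        x, y = (a >> k) & 1, (b >> k) & 1
        c = (x & y) | (c & (x | y))
    return c
```

With these definitions the block carry-out is `G | (P & cin)`, which matches `ripple_carry_out` for every input of a given width.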
mwe/PHD/7
Block carry look-ahead trees
mwe/PHD/8
Carry vs. Pseudo-carry
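$C_{out} = G_n + P_n G_{n-1} + \dots + P_n P_{n-1} \cdots P_0 C_{in}$
If $G = A \cdot B$ and $P = A + B$, then $G = G \cdot P$, so
$C_{out} = P_n G_n + P_n G_{n-1} + \dots + P_n P_{n-1} \cdots P_0 C_{in}$
$C_{out} = P_n (G_n + G_{n-1} + \dots + P_{n-1} \cdots P_0 C_{in})$
$C_{out} = P_n H_n$
$H_n = G_n + G_{n-1} + \dots + P_{n-1} \cdots P_0 C_{in}$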
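The factorization can be checked exhaustively for small widths. The sketch below assumes least-significant-bit-first tuples and uses the observation that the pseudo-carry equals the top generate bit ORed with the carry out of the bits below it. The helper names are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence


def carry_out(g: Sequence[int], p: Sequence[int], cin: int) -> int:
    """Ordinary carry recurrence C_k = G_k + P_k * C_(k-1)."""
    c = cin
    for gk, pk in zip(g, p):
        c = gk | (pk & c)
    return c


def pseudo_carry(g: Sequence[int], p: Sequence[int], cin: int) -> int:
    """H_n = G_n + (carry out of the bits below n)."""
    return g[-1] | carry_out(g[:-1], p[:-1], cin)


def identity_holds(width: int, propagate: Callable[[int, int], int]) -> bool:
    """Check C_out == P_n * H_n for every input of the given width."""
    for a, b in product(product((0, 1), repeat=width), repeat=2):
        g = [x & y for x, y in zip(a, b)]
        p = [propagate(x, y) for x, y in zip(a, b)]
        for cin in (0, 1):
            if carry_out(g, p, cin) != (p[-1] & pseudo_carry(g, p, cin)):
                return False
    return True
```

`identity_holds(3, lambda x, y: x | y)` is true, while the same check with an XOR propagate fails, because the step G = G·P relies on P = A + B.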
mwe/PHD/9
Carry vs. Pseudo-carry
• Redundant terms create factorization opportunities
• Factorization moves terms from critical paths to non-critical paths
• Multiple paths can be parallelized
• Products with fewer terms lead to implementations with smaller, faster gates
mwe/PHD/10
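Deriving Block Pseudo-carry from Block Carry Look-ahead Terms
Block Generate:
$G^{i \cdot j}_0 = G^i_j + P^i_j G^i_{j-i} + \dots + P^i_j P^i_{j-i} P^i_{j-2i} \cdots G^i_0$
If $G = A \cdot B$ and $P = A + B$, then $G = G \cdot P$, so
$G^{i \cdot j}_0 = P^i_j G^i_j + P^i_j G^i_{j-i} + \dots + P^i_j P^i_{j-i} P^i_{j-2i} \cdots G^i_0$
$G^{i \cdot j}_0 = P^i_j (G^i_j + G^i_{j-i} + \dots + P^i_{j-i} P^i_{j-2i} \cdots G^i_0)$
$H^{i \cdot j}_0 = G^i_j + G^i_{j-i} + \dots + P^i_{j-i} P^i_{j-2i} \cdots G^i_0$
• Pseudo-carries can be generated in blocks like carries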
mwe/PHD/11
Generalized Pseudocarry Equations
$H^2_s = G^1_{s+1} + G^1_s$
$H^{i+j}_s = H^j_{s+i} + I^j_{s+i-1} H^i_s$
$H^{i+j+k}_s = H^k_{s+i+j} + I^k_{s+i+j-1} H^j_{s+i} + I^k_{s+i+j-1} I^j_{s+i-1} H^i_s$
$I^{p+q}_t = I^q_{t+p} I^p_t$
$I^{p+q+r}_t = I^r_{t+q+p} I^q_{t+p} I^p_t$
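A brute-force check of the two-way combination rule, as a sketch. It reads the block pseudo-carry H as covering n bits whose lowest bit is s (superscript = block size, subscript = start position), and it assumes I is the AND of n consecutive propagate bits starting at bit t. The slides do not define I explicitly, so that reading is an assumption, as are the function names.

```python
from typing import Sequence


def block_generate(g: Sequence[int], p: Sequence[int], n: int, s: int) -> int:
    """Ordinary block generate over bits s .. s+n-1 (0 for an empty block)."""
    c = 0
    for k in range(s, s + n):
        c = g[k] | (p[k] & c)
    return c


def pseudo_carry_block(g: Sequence[int], p: Sequence[int], n: int, s: int) -> int:
    """H for n bits starting at s: top generate OR generate of the bits below."""
    top = s + n - 1
    return g[top] | block_generate(g, p, n - 1, s)


def propagate_run(p: Sequence[int], n: int, t: int) -> int:
    """Assumed meaning of I: AND of p[t] .. p[t+n-1]."""
    return int(all(p[t:t + n]))


def two_way_holds(g: Sequence[int], p: Sequence[int], i: int, j: int, s: int) -> bool:
    """Check H(i+j, s) == H(j, s+i) + I(j, s+i-1) * H(i, s)."""
    left = pseudo_carry_block(g, p, i + j, s)
    right = pseudo_carry_block(g, p, j, s + i) | (
        propagate_run(p, j, s + i - 1) & pseudo_carry_block(g, p, i, s))
    return left == right
```

Under these assumptions the rule holds for every 6-bit operand pair tried in the accompanying checks, and the ordinary block generate is recovered as the top propagate bit ANDed with the block pseudo-carry.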
mwe/PHD/12
Generating Sums Using Pseudocarry
$S_n = A_n \oplus B_n \oplus C_{n-1}$
If $T_n = A_n \oplus B_n$ and $C_m = P_m H_m$, then
$S_n = T_n \oplus P_{n-1} H_{n-1}$
• Sum with pseudo-carry no more complex than sum with carry
• Other look-ahead features still apply, e.g. Han-Carlson “every other carry”
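A behavioral sketch of sum generation from pseudo-carries. For brevity the pseudo-carries are rippled here rather than computed by a look-ahead tree, so this only illustrates the sum equation, not the speed of the circuit. The function name is hypothetical, and the carry into bit 0 is assumed to be zero.

```python
def add_with_pseudocarry(a: int, b: int, width: int) -> int:
    """Add two unsigned integers using T, P, and pseudo-carry H per bit."""
    result = 0
    carry_in = 0  # P_(k-1) * H_(k-1), the carry into bit k
    for k in range(width):
        x, y = (a >> k) & 1, (b >> k) & 1
        g, p, t = x & y, x | y, x ^ y
        # Sum bit: S_k = T_k xor (P_(k-1) * H_(k-1))
        result |= (t ^ carry_in) << k
        # Pseudo-carry: H_k = G_k + C_(k-1)
        h = g | carry_in
        # Real carry recovered from the pseudo-carry: C_k = P_k * H_k
        carry_in = p & h
    return result | (carry_in << width)
```

Usage: `add_with_pseudocarry(11, 6, 4)` returns 17, with the fifth bit holding the carry-out.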
mwe/PHD/13
Adder comparison
Bits   Ripple   CSel A   CSel B   CSel C   CLA   PCLA
32     32       12       12       9        6     5
64     64       20       16       12       7     6
mwe/PHD/14
HBT Digital Circuits
• Exponential I/V relationship leads to high gain and fast switching
• Vertical arrangement allows critical dimensions to be smaller with tighter tolerances
• Traditionally high DC power consumption: compare increasing leakage and switching currents for FETs
mwe/PHD/15
Current Steering Logic
• Constant current source equals combined emitter currents
• Ratio of current through each transistor is exponential function of base voltage
• Difference in currents at collector converted to difference in voltage on pull-up resistors
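The current split can be illustrated with the ideal differential-pair relation. This is a textbook simplification, not the device model used in the work: it assumes matched transistors, negligible base current, and a thermal voltage of about 25.85 mV (near 300 K). The function names and any numeric values passed in are hypothetical.

```python
import math
from typing import Tuple

THERMAL_VOLTAGE = 0.02585  # volts, kT/q near 300 K (assumed)


def steer_currents(i_tail: float, v_diff: float,
                   v_t: float = THERMAL_VOLTAGE) -> Tuple[float, float]:
    """Split a constant tail current between two emitter-coupled transistors.

    v_diff is the base voltage of transistor 1 minus that of transistor 2.
    The ratio i1 / i2 equals exp(v_diff / v_t).
    """
    i1 = i_tail / (1.0 + math.exp(-v_diff / v_t))
    return i1, i_tail - i1


def differential_output(i_tail: float, v_diff: float, r_pullup: float) -> float:
    """Voltage difference between the two collectors across equal pull-ups."""
    i1, i2 = steer_currents(i_tail, v_diff)
    # The collector carrying more current sits at the lower voltage.
    return (i2 - i1) * r_pullup
```

With a 100 mV input difference, almost all of the tail current flows through one side, so the output swing approaches the tail current times the pull-up resistance.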
mwe/PHD/16
Single-ended vs. Double-ended
Single-ended:
• Limited to simple functions
• Large fan-in
Double-ended:
• Any function of inputs
• Fan-in limited by supply voltage
mwe/PHD/17
Look-ahead gate w/ fully differential logic
[Figure: gate schematic; signal labels $H_n$, $H_{n-1}$, $H_{n-2}$, $I_n$, $I_{n-1}$]
mwe/PHD/18
Mixed input look-ahead gates
[Figure: gate schematic; signal labels $H_n$, $H_{n-1}$, $I_n$, $V_r$]
• $I_n (H_n + H_{n-1}) + \overline{I_n} H_n$
• $= H_n + I_n H_{n-1}$
• Two series-gated levels for three inputs
mwe/PHD/19
Mixed input look-ahead gates
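[Figure: gate schematic; signal labels $H_n$, $H_{n-1}$, $H_{n-2}$, $I_n$, $I_{n-1}$]
• $I_n I_{n-1} (H_n + H_{n-1} + H_{n-2}) + I_n \overline{I_{n-1}} (H_n + H_{n-1}) + \overline{I_n I_{n-1}} H_n$
• $= H_n + I_n H_{n-1} + I_n I_{n-1} H_{n-2}$
• Three series-gated levels for five inputs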
mwe/PHD/20
Pseudocarry Blocks
[Figure: tree of pseudocarry blocks labeled $H^2_s$, $H^6_s$, $H^{14}_s$, $H^{18}_s$, and $H^{32}_s$]
mwe/PHD/21
Pseudocarry Tree Oscillator
[Figure: block diagram; labels A, B, Cin, Cout, and Select (0/1), with bit indices 1, 31, and 32]
mwe/PHD/22
Carry Tree High-speed Output
2 x 165 ps
mwe/PHD/23
Breakdown of measured delay
Total measured delay = 165 ps
[Pie chart: slices Resistor model, Temperature, Wire C, and Devices; values 11%, 6%, 12%, and 71%]
mwe/PHD/24
Loaded vs. unloaded toggling
• At design time, fT peak at 1.2 mA/µm² but limit at 2 mA/µm²
• For some devices, max. frequency when driving load can occur above fT peak current
• Models supported this, and there was no reason at the time to doubt them
• However, models are never qualified above fT peak current!
mwe/PHD/25
Loaded vs. unloaded toggling
[Plot: buffer delay (0 to 80 ps) versus tail current (0 to 2.5 mA)]
mwe/PHD/26
Resistor Model Effects
           Simulated (9805A)   Fabricated (99B)
Pull-up    444                 528
Tail       1000                1091
mwe/PHD/27
Model parameter variation
[Chart: parameter value (0 to 500) of RB, RE, and RC (ohms) across design kits 9708A, 9802, 9805, 1999B, and v2.3; DARPA02 design and DARPA02 fabrication marked]
mwe/PHD/28
Cadence internal parasitic methods
• Approximates all capacitance as polynomial function of distance between conductors
• Cannot extract RC and capacitance between conductors at the same time: killer for differential wiring!
• Convenient, but window of usability small and shrinking
mwe/PHD/29
QuickCap capacitance extraction
• Field solving with floating random walk method
• Accuracy almost wholly a function of run time: 4x run time gives ½ error
• Random walks independent, near perfect parallelization
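The "4x run time, half the error" rule matches the usual behavior of a Monte Carlo average, whose statistical error falls as one over the square root of the number of independent samples. The sketch below shows that scaling with a generic random average. It does not call QuickCap or reproduce its solver, and the function names are hypothetical.

```python
import math
import random
import statistics


def error_scale(runtime_factor: float) -> float:
    """Relative error after multiplying run time (sample count) by a factor."""
    return 1.0 / math.sqrt(runtime_factor)


def spread_of_mean(samples_per_run: int, runs: int, seed: int = 0) -> float:
    """Empirical spread of a Monte Carlo average over repeated runs."""
    rng = random.Random(seed)
    means = []
    for _ in range(runs):
        total = sum(rng.random() for _ in range(samples_per_run))
        means.append(total / samples_per_run)
    return statistics.pstdev(means)
```

`error_scale(4)` gives 0.5, and comparing `spread_of_mean(100, 400)` with `spread_of_mean(400, 400)` shows a ratio close to 2. Because the samples are independent, they can also be divided among processors with almost no coordination.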
mwe/PHD/30
Comparing parasitic extraction
[Plot: delay (ps, 0 to 50) versus length (µm, 0 to 1200) for Qcap RC, RCNET, PCAP, and Calc RC]
mwe/PHD/31
Cadence/QuickCap Design Flow
• Extract physical data from layout
• Compute RC with QuickCap
• Extract netlist from schematic
• Combine to simulate with Spectre
mwe/PHD/32
Partial manual extraction with QuickCap
• Identify main wires of oscillation paths: approx. a dozen pairs
• QuickCap extraction for each wire-to-ground cap. and cap. between pair
• Add RC-ladder for each pair by hand to schematic and simulate
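To illustrate what an RC ladder represents, the sketch below splits a wire's total resistance and ground capacitance into equal segments and estimates the delay with the Elmore approximation. This is a stand-in for intuition only: the work itself simulated the ladders in the schematic, the coupling capacitance between the two wires of a pair is omitted here, and the function names and example values are hypothetical.

```python
from typing import List, Tuple


def rc_ladder(total_r: float, total_c: float, segments: int) -> List[Tuple[float, float]]:
    """Equal (series R, shunt C) sections approximating a distributed wire."""
    r = total_r / segments
    c = total_c / segments
    return [(r, c)] * segments


def elmore_delay(ladder: List[Tuple[float, float]]) -> float:
    """Sum over nodes of (resistance upstream of the node) * (node capacitance)."""
    delay = 0.0
    r_upstream = 0.0
    for r, c in ladder:
        r_upstream += r
        delay += r_upstream * c
    return delay
```

For an N-section ladder the estimate is R·C·(N+1)/(2N), which starts at R·C for a single lumped section and approaches R·C/2 as the number of sections grows.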
mwe/PHD/33
Simulation with Parasitic Extraction
Feedback path   w/o parasitics (ps)   QuickCap parasitic cap. (ps)   COEFGEN parasitic cap. (ps)   Raphael parasitic cap. (ps)   QuickCap parasitic RC (ps)
Cin             100                   121                            128                           131                           135
A1              103                   123                            130                           129                           137
A31             108                   127                            129                           132                           141
mwe/PHD/34
Pseudo-carry Tree configured as Ring Oscillator
[Figure: block diagram; labels A, B, Cin, Cout, Sel0, and Sel1, constant inputs 00...00 and 11...11, and bit indices 1, 30, and 32]
mwe/PHD/35
SMI00 Test Structure Layout
mwe/PHD/36
SMI00 Test Structure
mwe/PHD/37
Carry Tree High-speed Outputs
16 x 146 ps
mwe/PHD/38
Comparisons of published adders
Reference   Type    Size
ZIMM96      Carry   32
STEL96      Adder   64(32)
WANG97      Adder   32
CHAN98      Adder   64(32)
SILB98      Fixed   64
AIPP99      Adder   64
SAGE01      Adder   32[16x2]
MATH01      Adder   64
STAS01      Adder   64
LEE02       Adder   64
VANA02      ALU     32
Gate Del. (row alignment lost): 5, 12.5(12?), 3, 27(19.5), 8
Time (row alignment lost): 2.7 ns, 550 ps, 660 ps, <500 ps, 482 ps, 440 ps, 900 ps, <200 ps
mwe/PHD/39
Cascode Output Stage
• Eliminates Miller capacitance between input and output
• Reduces Cjc and Cjs on outputs
• Shortens rise time, but increases delay
mwe/PHD/40
Dotted Emitter/Collector
mwe/PHD/41
“Wide/Short” gate with dotted emitter/collector
mwe/PHD/42
“Wide/Short” gate with dotted emitter/collector
• Shorter trees lead to lower supply voltages
• Wider trees reduce ratio of emitter-followers to terms computed, lowering total current
• More inputs per look-ahead gate means fewer look-ahead levels
• Elimination of single-ended inputs on critical H signals allows faster switching with reduced swing
mwe/PHD/43
Even wider look-ahead gate
Width limited by
• Accumulated Cjc and Cjs of dotted-and node
• Saturation vs. breakdown
• Fan-out loading from inputs and interconnect
mwe/PHD/44
Conclusions
• 32-bit adder with a depth of 5 gates fabricated; circuits with depths of 4 and 3 gates designed.
• Gate to compute 3-way look-ahead fabricated; up to 8-way look-ahead designed.
• Carry delay for 32-bit addition measured at 146 ps.
• QuickCap technology file for 5HP brings simulated results within 11% of measured.
mwe/PHD/45