Equalized On-chip Interconnect: Modeling, Analysis, and Design
Byungsub Kim
Integrated Systems Group
Massachusetts Institute of Technology
Advisor: Vladimir Stojanović
Interconnects dominate performance and power
IBM Blue Gene/L: interconnects appear at every level of the system
- Racks and cables (and more hidden)
- Node cards and backplane interconnect
- Compute cards and PCB wiring pattern
- PowerPC 440 SoC and on-chip wires
What are these interconnects? RC-dominant wires
Examples: many-core processor networks (Tilera 64-core), packages with silicon carriers (Patal DAC06), PCB wires
- Short distance (1-2 cm, 2-4 cm, <10 cm), high wire density, large channel loss
- Power and metal-area budget
Increasing demand for global interconnects
- Mesh: cheap local interconnect but low performance
- Higher-radix networks: expensive global interconnect but high performance
Interconnects in NoC design hierarchy
Figure: circuit and wire parameters [9] (wire width and space versus throughput density, Gbps/um) set the link trade-off curve of the interconnect [9] (energy/bit in pJ/bit versus data rate density in Gbps/um for equalized links with 30, 50, and 90 mV eyes and for repeated links); these interconnect metrics set the NoC trade-offs (power versus offered bandwidth of Terascale NoC topologies with repeated and equalized links, Kim ICCAD07).
Wire, circuit parameters → interconnect metrics → NoC trade-offs
Thesis contributions
- Fast CAD tool for link design-space exploration (Kim ICCAD07): joint wire + circuit optimization
- Charge-injection FFE + pre-distortion (Kim ISSCC09)
- Trans-impedance receive amplifier
- Infrastructure for in-situ signal and energy characterization of on-chip links (Kim JSSC10)
Modeling and analysis
Figure: design hierarchy. Many-core processor level: NoC topologies, VC buffer size, channel width… → power, latency, offered bandwidth. Interconnect level: topologies, wire width and pitch, transistor size → data rate density (Gb/s/um), energy/bit (pJ/b), latency (ps). Panels repeat the wire width/space versus throughput density plot, the equalized-link schematic (FFE taps w0, w1, w2; DFE tap -y1; sampling delay Td), and the energy/bit versus data rate density trade-off curves (equalized with 30, 50, 90 mV eyes; repeated).
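Review: Equalization
Figure: (a) no equalization; (b) feed-forward equalization (FFE): delayed (D) copies of the transmit data are weighted by taps w0, w1, w2 and summed; (c) FFE + decision-feedback equalization (DFE): the receiver additionally subtracts y1 times the previous decision (-y1 feedback tap).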
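The FFE and DFE structures can be illustrated with a minimal Python sketch. It assumes ±1 symbols, a hypothetical sampled pulse response h = [0.5, 0.35, 0.1] (one sample per UI, no pre-cursor), and hypothetical tap values w = [1.0, -0.5] and y1 = 0.1; it is not the thesis implementation. The FFE pre-filters the symbols, the channel is a convolution, and the 1-tap DFE subtracts y1 times its previous decision before slicing.

```python
def fir(x, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k], assuming zero history."""
    return [
        sum(w * x[n - k] for k, w in enumerate(taps) if n - k >= 0)
        for n in range(len(x))
    ]


def dfe_slice(rx, y1):
    """1-tap DFE: subtract y1 * previous decision, then slice at zero."""
    decisions = []
    prev = 0.0  # no decision exists before the first sample
    for sample in rx:
        d = 1.0 if sample - y1 * prev >= 0 else -1.0
        decisions.append(d)
        prev = d
    return decisions


pulse = [0.5, 0.35, 0.1]   # hypothetical channel pulse response (cursor first)
ffe_taps = [1.0, -0.5]     # hypothetical transmit FFE taps w0, w1
y1 = 0.1                   # hypothetical DFE tap

bits = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0]
received = fir(fir(bits, ffe_taps), pulse)   # FFE at the transmitter, then the channel
print(dfe_slice(received, y1) == bits)       # True
```

With these values the combined FFE-plus-channel response is [0.5, 0.1, -0.075, -0.05]; the DFE cancels the 0.1 first post-cursor, leaving at most 0.125 of residual ISI against a 0.5 cursor, so every decision is correct.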
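Equalization versus repeater
Figure: a repeated link restores the full Vdd swing at each repeater along the distance; an equalized link drives the whole distance with an FFE transmitter (w0, w1, w2) and a DFE receiver (-y1), compensating channel attenuation with a reduced voltage swing.
Equalization:
- Faster data rate
- Lower latency
- Even lower power consumption through voltage-swing reduction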
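The power argument can be made concrete with the usual first-order charging estimate, assuming the low-swing driver draws its charge from the same Vdd supply: each rising transition draws Q = C·Vs from Vdd, so the energy is C·Vs·Vdd instead of C·Vdd² for a full-swing wire. The wire capacitance and the 100 mV swing are hypothetical, and the sketch ignores repeater device capacitance and equalizer overhead.

```python
def supply_energy_per_transition(c_wire, v_swing, v_dd):
    """Energy (J) drawn from a Vdd supply to charge c_wire (F) through a swing v_swing (V)."""
    charge = c_wire * v_swing   # Q = C * Vswing
    return charge * v_dd        # E = Q * Vdd


C_WIRE = 2e-12   # hypothetical: 10 mm of wire at 0.2 fF/um
VDD = 1.0
full_swing = supply_energy_per_transition(C_WIRE, VDD, VDD)   # full-swing wire
low_swing = supply_energy_per_transition(C_WIRE, 0.1, VDD)    # hypothetical 100 mV swing
print(full_swing / low_swing)   # ratio of about 10
```

In this simple model the saving is just Vdd/Vs, which is why reducing the swing matters once equalization removes the need to restore full swing along the wire.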
Joint wire + circuit optimization
- Link = driver (impedance) + wire + receiver (impedance)
- Optimize for power and performance
Figure: link schematic annotated with the optimization variables: voltages Vs and Vp; FFE coefficients W1 … WnFFE; driver sizes d1 … dk … dnFFE; wire sizes (width, spacing); sample delay Td; DFE coefficient y1; transmit and receive bits.
Channel model: channel transfer function
Figure: the same link schematic with the wire replaced by its RLGC transmission-line model.
- Set up the telegrapher's equations with Thevenin models of the driver and receiver
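At each frequency the telegrapher's equations reduce to a propagation constant γ = √((R + jωL)(G + jωC)) and a characteristic impedance Zc = √((R + jωL)/(G + jωC)). A small Python sketch of these two quantities, with hypothetical per-unit-length values (40 Ohm/mm, 0.2 fF/um, no inductance or leakage) chosen to sit in the RC-dominant regime this talk targets:

```python
import cmath
import math


def line_params(r, l, g, c, f):
    """Return (gamma, zc) from per-unit-length R (ohm/m), L (H/m), G (S/m), C (F/m) at f (Hz)."""
    w = 2 * math.pi * f
    z = complex(r, w * l)  # series impedance per unit length
    y = complex(g, w * c)  # shunt admittance per unit length
    return cmath.sqrt(z * y), cmath.sqrt(z / y)


def propagation_loss_db(gamma, length):
    """Loss of the e^{-gamma*l} term alone, in dB (terminations not included)."""
    return 20 * math.log10(math.e) * gamma.real * length


# Hypothetical on-chip wire: 40 ohm/mm, 0.2 fF/um, no inductance or leakage.
R, L, G, C = 4e4, 0.0, 0.0, 2e-10
gamma, zc = line_params(R, L, G, C, 2e9)
print(propagation_loss_db(gamma, 10e-3), abs(zc))
```

With L = G = 0 the propagation constant becomes √(jωRC), whose real part is √(ωRC/2): the loss in dB grows with √f and linearly with length, the signature of a distributed-RC wire.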
Channel model: comparison to SPICE simulation
- Closed-form solutions for T(f) and Tc(f)
$$T(f) \approx \frac{2\,e^{-l\sqrt{j\omega R_0 C_0}}}{\left(1 + \frac{Z_c(\omega)}{Z_L}\right)\left(1 + \frac{R_s}{Z_c(\omega)}\right)}$$
- Exponential form → small FFE and DFE tap count
- Denominator: impact of the termination impedances
- Good match with SPICE simulation
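The exponential form can be checked numerically against an exact two-port (ABCD-matrix) solution of the same RC line. This sketch assumes L = G = 0, a resistive source Rs, and a load ZL, all with hypothetical values. The closed form drops the multiple-reflection term, so its relative error is bounded by |e^(-2γl)| and it should agree closely once the line is electrically long.

```python
import cmath
import math


def rc_line(r0, c0, f):
    """gamma and Zc of an RC line (L = G = 0) with R0 (ohm/m) and C0 (F/m) at f (Hz)."""
    jw = 1j * 2 * math.pi * f
    return cmath.sqrt(jw * r0 * c0), cmath.sqrt(r0 / (jw * c0))


def t_exact(r0, c0, length, rs, zl, f):
    """V_load / V_source from the line's ABCD matrix with source resistance rs and load zl."""
    gamma, zc = rc_line(r0, c0, f)
    a = d = cmath.cosh(gamma * length)
    b = zc * cmath.sinh(gamma * length)
    c = cmath.sinh(gamma * length) / zc
    return zl / (a * zl + b + rs * (c * zl + d))


def t_closed_form(r0, c0, length, rs, zl, f):
    """Exponential form: 2 e^{-gamma l} / ((1 + Zc/ZL)(1 + Rs/Zc))."""
    gamma, zc = rc_line(r0, c0, f)
    return 2 * cmath.exp(-gamma * length) / ((1 + zc / zl) * (1 + rs / zc))


args = (4e4, 2e-10, 10e-3, 100.0, 1e4)   # hypothetical R0, C0, l, Rs, ZL
for f in (0.5e9, 2e9, 5e9):
    exact, approx = t_exact(*args, f), t_closed_form(*args, f)
    print(f, abs(approx - exact) / abs(exact))
```

The error shrinks quickly with frequency because e^(-2γl) decays as the line gets electrically longer, which is the regime where the high-loss channel needs equalization.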
CAD flow
Connection to network architecture optimizer
- Technology information: transistor SPICE model; wire metal conductance, dielectric constant, etc.
- Circuit model: linearized RC switch extraction for each circuit type (LCM | inverter) and its WLCM, Vs, Vp, stored in a switch-model database of normalized R (Ohm-um) and C (fF/um) for LCM and inverter
- Wire model: a 2D field solver run over wire width Wth and spacing Sp, stored in a database of 2D RLGC matrices whose entries are self terms (ro, lo, go, co) and coupling terms (rc, lc, gc, cc)
- Channel model: for a target wire length l, the transfer functions T(f) and Tc(f)
- Link performance model: for a link architecture (FFE, DFE tap numbers) and a target data rate density, the equalization coefficients w, y1 and the data rate density, latency, eye opening, and sampling delay (Td)
- Link power model: energy-per-bit (Eb)
- Design selection based on these metrics, with the target eye constraint w_c_eye ≥ eye_target
- Output to the network architecture optimizer: energy-per-bit, data rate density, latency
Figure: energy/bit (pJ/bit) versus data rate density (Gbps/um) for equalized links (30, 50, 90 mV eyes) and repeated links.
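The design-selection step (keep designs whose worst-case eye meets the target, then trace the energy versus data-rate-density trade-off curve) can be sketched as a constrained Pareto filter. The `Design` fields and the sample values below are hypothetical; the slides only name the constraint w_c_eye ≥ eye_target and the two plotted metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    energy_pj_per_bit: float
    rate_density_gbps_per_um: float
    worst_case_eye_mv: float


def select_designs(designs, eye_target_mv):
    """Keep designs meeting the eye constraint, then return the Pareto frontier:
    no other feasible design is both denser and lower-energy."""
    feasible = [d for d in designs if d.worst_case_eye_mv >= eye_target_mv]
    # Densest first; among equal densities, cheapest first.
    feasible.sort(key=lambda d: (-d.rate_density_gbps_per_um, d.energy_pj_per_bit))
    frontier, best_energy = [], float("inf")
    for d in feasible:
        if d.energy_pj_per_bit < best_energy:  # cheaper than every denser design
            frontier.append(d)
            best_energy = d.energy_pj_per_bit
    return frontier


candidates = [  # hypothetical design points
    Design("A", 0.20, 1.0, 60.0),
    Design("B", 0.30, 2.0, 55.0),
    Design("C", 0.35, 1.5, 70.0),   # dominated by B
    Design("D", 0.10, 3.0, 20.0),   # fails the eye target
]
print([d.name for d in select_designs(candidates, eye_target_mv=50.0)])   # ['B', 'A']
```

Sorting by density once and sweeping with a running energy minimum keeps the filter O(n log n), which matters when the design space holds hundreds of thousands of points.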
Tool verification
- Tool finds optimal parameters
- Verified against an autogenerated SPICE netlist
Figure: energy/bit (pJ/bit) versus data rate density (Gb/s/um), annotated ~50mV; solid lines: this tool, dashed lines: SPICE; components Ebtot, EbactiveDrv, EbPre, Ebw, EbscDrv.
Run time comparison: x30,000
Method                        Run time (h)   Normalized
LMSE, optimal Td              0.74           x1
Brute force, equation         180            x244
SPICE                         21,460         x29,000
- Design space trade-off curves are computed over 423,000 designs within 44.4 min
- Run time is improved by ~x30,000
Figure: circuit design (wire, circuit, and equalization parameters: width, spacing, Wdrv, Vs, Vp, FFE taps w1 … wnFFE, DFE tap h1) feeds design exploration over a 423K-point design space, which returns energy, latency, and data rate density for network design.
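The LMSE step can be illustrated with a small, noiseless least-squares sketch. For each candidate cursor position (a discrete stand-in for the sampling delay Td), it solves for the FFE taps that bring the combined channel-plus-FFE pulse response closest to an ideal single cursor, then keeps the delay with the smallest squared error. The pulse response is hypothetical, transmit power constraints are ignored, and the thesis tool's actual formulation may differ.

```python
def conv_matrix(h, n_taps):
    """Rows index output samples, columns index FFE taps: (H w)[i] = sum_j h[i-j] w[j]."""
    rows = len(h) + n_taps - 1
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n_taps)]
            for i in range(rows)]


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def lmse_ffe(h, n_taps, td):
    """Least-squares FFE taps pushing the combined response toward a unit cursor at index td."""
    hm = conv_matrix(h, n_taps)
    target = [1.0 if i == td else 0.0 for i in range(len(hm))]
    normal = [[sum(row[r] * row[c] for row in hm) for c in range(n_taps)]
              for r in range(n_taps)]                       # H^T H
    rhs = [sum(row[r] * t for row, t in zip(hm, target)) for r in range(n_taps)]  # H^T e
    w = solve(normal, rhs)
    combined = [sum(hij * wj for hij, wj in zip(row, w)) for row in hm]
    mse = sum((p - t) ** 2 for p, t in zip(combined, target))
    return w, mse


def best_delay(h, n_taps):
    """Sweep every cursor position and return (td, taps, mse) with the lowest error."""
    best = None
    for td in range(len(h) + n_taps - 1):
        w, mse = lmse_ffe(h, n_taps, td)
        if best is None or mse < best[2]:
            best = (td, w, mse)
    return best


td, taps, mse = best_delay([1.0, 0.5], 3)   # hypothetical first-order pulse response
print(td, [round(w, 4) for w in taps], round(mse, 4))
```

Solving a small linear system per delay, rather than sweeping coefficients by brute force, is the kind of shortcut that makes a design-space sweep of this size tractable.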
Link design space exploration: 90nm
Figure: energy/bit (pJ/bit) versus data rate density (Gbps/um) for equalized links (30, 50, 90 mV eyes) and repeated links (Kim ICCAD07).
- Energy vs. data rate density trade-off
- 2-10x energy improvement
Design space exploration: 32nm
Kim D&T08
Figure: latency (ns) versus data rate density (Gbps/um) for repeated (Rpt.) and equalized (Eq.) links, with energy contours labeled in fJ/bit: (a) 5mm wire, (b) 10mm wire.
- 2x lower latency with 10x less power
Connection to network simulation
Figure: the interconnect trade-off curves (energy/bit, data rate density, latency as functions of topologies, wire width and pitch, and transistor size [9]) feed the NoC simulator, which explores Terascale NoC topologies, VC buffer size, channel width… and reports power, latency, and offered bandwidth (Joshi HotI09).
- Provides proper metrics for the NoC simulators
Connection to network simulation
Figure: power (W) versus offered bandwidth (kb/cyc) for a Clos NoC with repeated interconnect and a Clos NoC with equalized interconnect.
Power versus offered bandwidth trade-offs of Clos NoC topologies with repeated and equalized interconnects.
Circuit techniques for improved equalized interconnects
Figure: link overview. Transmitter: pre-distortion 3-tap FFE Tx with static pre-distorted coefficients, driving a 1cm wire (0.6um wide, 0.4um spaced); receiver: TIA (2.7k) followed by a 1-tap DFE Rx (slicer & DFE); test pattern '...00100...'.
- 3-tap nonlinear charge-injecting feed-forward equalizer (FFE) improves driver power efficiency
- Trans-impedance amplifier (TIA) before the 1-tap DFE improves the bandwidth-power-amplitude trade-off
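Review: 2-tap FFE driver
$Y_{FFE} = w_0 d_0 - w_1 d_{-1}$, with $d_i = 2D_i - 1 \in \{-1, 1\}$
- Data-dependent summation/subtraction: the output takes one of four levels, $w_0+w_1$, $w_0-w_1$, $-w_0+w_1$, $-w_0-w_1$
Figure: for $D_i$ = 0 0 1 0 0 0 ($d_i$ = -1 -1 1 -1 -1 -1), the output sits at $-w_0+w_1$, rises to $w_0+w_1$ on the 1, falls to $-w_0-w_1$ on the next bit, and returns to $-w_0+w_1$.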
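The level sequence of a 2-tap FFE can be generated directly from $Y_{FFE} = w_0 d_0 - w_1 d_{-1}$. A minimal sketch with hypothetical weights w0 = 0.7 and w1 = 0.3, assuming the bit before the sequence equals the first bit (a steady-state start):

```python
def ffe_2tap(bits, w0, w1):
    """Y_FFE[i] = w0*d_i - w1*d_{i-1}, with d = 2*D - 1.

    Assumes the bit preceding the sequence equals the first bit (steady state)."""
    d = [2 * b - 1 for b in bits]
    prev = d[0]
    out = []
    for cur in d:
        out.append(w0 * cur - w1 * prev)
        prev = cur
    return out


print(ffe_2tap([0, 0, 1, 0, 0, 0], w0=0.7, w1=0.3))
```

The result is approximately [-0.4, -0.4, 1.0, -1.0, -0.4, -0.4], i.e. -w0+w1, -w0+w1, w0+w1, -w0-w1, -w0+w1, -w0+w1: transitions get the full w0+w1 drive while repeated bits are de-emphasized to w0-w1 in magnitude.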
Conventional: voltage-dividing (VD) driver
Figure: legs of resistance R/(0.5+αv) and R/(0.5-αv) between Vdd and Gnd, switched by D0 and D-1, drive the channel impedance Zc(ω); the output steps through w0+w1, w0-w1, -w0+w1, -w0-w1 for Di = 0 0 1 0 0 0 (di = -1 -1 1 -1 -1 -1).
- Current wasted in subtraction
Conventional: voltage-dividing (VD) driver
Figure: the same VD driver with its energy/bit (pJ/bit) breakdown versus data rate density (Gb/s/um): Ebtot, EbactiveDrv, Ebpre, Ebw, EbscDrv.
- Current wasted in subtraction
Conventional: current-mode logic (CML) driver
Figure: resistively loaded (R) CML driver with currents (0.5+αv)Vdd/R and (0.5-αv)Vdd/R switched by D0 and D-1 into Zc(ω); output levels w0+w1, w0-w1, -w0+w1, -w0-w1 for Di = 0 0 1 0 0 0 (di = -1 -1 1 -1 -1 -1).
- Current wasted in subtraction
Conventional: current-switching (CS) driver
Figure: currents (0.5+αc)Icw and (0.5-αc)Icw steered by D0 and D-1 into Zc(ω); output levels w0+w1, w0-w1, -w0+w1, -w0-w1 for Di = 0 0 1 0 0 0 (di = -1 -1 1 -1 -1 -1).
- Current wasted in subtraction
Proposed: Charge-injection (CI) driver
Figure: currents I0 and I1 are injected from Vdd or Gnd, selected by D0 and D-1, directly into Zc(ω); the output levels are built from ±I0 and ±I1 for Di = 0 0 1 0 0 0 (di = -1 -1 1 -1 -1 -1).
- No current subtraction: no current waste
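Why subtraction wastes current can be seen in a deliberately idealized 2-tap model, a simplification introduced here and not the thesis's circuit-level power model. A subtraction-based driver such as the CS FFE keeps the full tail current of both taps flowing every bit, |w0| + |w1|, and cancels part of it at the output. An ideal charge-injection driver draws only the net output current |w0·d0 - w1·d-1|. Tap values are hypothetical.

```python
import itertools


def supply_current_subtracting(w0, w1):
    """Idealized CS-style FFE: both taps' tail currents always flow, whatever the data."""
    return abs(w0) + abs(w1)


def supply_current_injecting(w0, w1):
    """Idealized CI-style FFE: only the net output current is drawn from the supply,
    averaged over the four equally likely (d0, d_-1) combinations of random data."""
    levels = [abs(w0 * d0 - w1 * dm1) for d0, dm1 in itertools.product((-1, 1), repeat=2)]
    return sum(levels) / len(levels)


for w1 in (0.1, 0.3, 0.45):   # hypothetical post-cursor weights, with w0 + w1 = 1 full scale
    w0 = 1.0 - w1
    ratio = supply_current_subtracting(w0, w1) / supply_current_injecting(w0, w1)
    print(f"w1={w1}: subtracting / injecting = {ratio:.2f}")
```

For w0 ≥ w1 ≥ 0 this toy model gives a ratio of (w0 + w1)/w0, which approaches 2x as the post-cursor tap approaches the main tap. The measured comparison in the talk also includes circuit details this model ignores.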
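Power comparison
Figure: average supply current (mA) of the FFE driver types, with relative factors 8x, 3x, 2x, 4x, 1x for a typical off-chip setting and 2x, 2x, 1x for the on-chip setting (annotated "R of VD").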
Eye sensitivity
- The percentage of eye reduction divided by the percentage of coefficient perturbation:
$$S_{x_i} = \frac{\Delta \mathrm{Eye} / \mathrm{Eye}}{\Delta x_i / x_i}\Bigg|_{\Delta x_n = 0,\ n \neq i}$$
Figure: VD FFE driver (taps w0, w1; load RL to Vdd/2; channel Zc(ω)); perturbing w1 to w1+∆w1 shrinks the eye from Eye to Eye-∆Eye.
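The sensitivity definition can be evaluated numerically by perturbing one coefficient at a time. The sketch assumes a linear, noiseless link with a hypothetical pulse response and FFE taps, and uses the worst-case (peak-distortion) eye, cursor minus the sum of absolute ISI; the thesis may measure the eye differently.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def worst_case_eye(h, taps, cursor):
    """Peak-distortion eye: cursor sample minus the sum of |ISI| of the combined response."""
    p = convolve(h, taps)
    return p[cursor] - sum(abs(v) for k, v in enumerate(p) if k != cursor)


def eye_sensitivity(h, taps, cursor, i, rel_step=0.01):
    """|relative eye change| / relative change of taps[i], all other taps held fixed."""
    base = worst_case_eye(h, taps, cursor)
    perturbed = list(taps)
    perturbed[i] *= 1 + rel_step
    return abs((worst_case_eye(h, perturbed, cursor) - base) / base) / rel_step


h = [1.0, 0.5]       # hypothetical channel pulse response
taps = [1.0, -0.5]   # hypothetical FFE taps
print([eye_sensitivity(h, taps, cursor=0, i=k) for k in range(len(taps))])
```

Here the base eye is 0.75; a 1% error in the post-cursor tap shrinks it by 1% (S ≈ 1), while a 1% change in the main tap moves it by about 0.67%.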
Eye sensitivity: CS FFE
Figure: CS FFE driver (load RL to Vdd/2, channel Zc(ω)); an error -∆w0 in w0 appears directly on the output levels w0+w1, w0-w1, -w0+w1, -w0-w1 and shrinks the eye.
- A large error due to subtraction of large currents
- Coefficient errors do not attenuate
Eye sensitivity: CI FFE (I0)
Figure: CI FFE driver (load RL to Vdd/2, channel Zc(ω)); an error -∆I0 in the small current I0 shifts the output levels by only -∆I0.
- Small error, because I0 is a small current
Eye sensitivity: CI FFE (I1)
Figure: CI FFE driver; an error -∆I1 in the large current I1 reaches the eye only as -∆I1 × h_att, where h_att is the channel attenuation.
- Large error, but attenuated by the channel
Resolution requirement comparison
Coefficient resolution requirement (bits)
Driver           Tap 0         Tap 1         Tap 2
CS, VD, CML      W0: 7-8b      W1: 8-9b      W2: 6-7b
CI FFE           I0: 3-4b      I1: 3-4b      I2: 4-5b
- >10x difference in eye sensitivity
- 10x lower hardware accuracy requirement
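A first-order link between sensitivity and resolution can be sketched by assuming a coefficient near full scale whose worst-case quantization error is one LSB, a relative error of 2^-b. The eye-loss fraction is then about S·2^-b, so b ≥ log2(S / budget). The sensitivities and the 5% eye-loss budget below are hypothetical; the point is that every 10x in sensitivity costs log2(10) ≈ 3.3 extra bits, the same order as the 3-4 bit gap in the slide's resolution table.

```python
import math


def required_bits(sensitivity, eye_loss_budget):
    """Smallest b such that sensitivity * 2**-b <= eye_loss_budget (first-order model)."""
    return math.ceil(math.log2(sensitivity / eye_loss_budget))


for s in (3.0, 30.0):   # hypothetical sensitivities, 10x apart
    print(s, required_bits(s, eye_loss_budget=0.05))
```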
Link design overview
3-tap CI FFE transmitter
Figure: transmitter schematic. Odd and even data (D0_Odd, D0_Even) are multiplexed (MUX, CLK) into a latch and decoding block, which combines D0, D1, D2 with the static pre-distorted FFE coefficients A<19:0> and generates 8 transition signals (P1+, P2+, N1+, N2+, P1-, P2-, N1-, N2-) for the strong and weak drivers (bias Ib); simulated output voltage (V) and current (mA) versus time (ns) show High, Middle, and Low levels with currents I1, I2, -I0, and -(I0+I1+I2).
- Strong driver: large currents I1, I2, I1+I2
- Weak driver: small current I0
3-tap CI FFE transmitter
- An array of binary-weighted transistors
- A skewed-enable NAND gate
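An ideal binary-weighted array sets its drive current with a digital code: leg k contains 2^k unit transistors and is enabled by bit k. A minimal sketch with a hypothetical 5-bit width and currents expressed in units of one unit device; passing measured per-leg currents instead of `ideal_legs` shows how leg errors add up.

```python
def ideal_legs(n_bits, i_unit):
    """Leg k holds 2**k unit devices, so it sources 2**k * i_unit."""
    return [i_unit * 2 ** k for k in range(n_bits)]


def array_current(code, legs):
    """Total current of the legs whose enable bit is set in `code` (bit k enables leg k)."""
    if not 0 <= code < 2 ** len(legs):
        raise ValueError("code does not fit the array")
    return sum(i for k, i in enumerate(legs) if (code >> k) & 1)


legs = ideal_legs(5, i_unit=1.0)       # hypothetical: 5 bits, currents in unit-device units
print(array_current(0b10110, legs))    # 22.0
```

Binary weighting needs only n enable lines for 2^n current levels, which keeps the decoding overhead small.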
3-tap CI FFE transmitter
- The decoding requires only a small overhead
Nonlinearity and pre-distortion
Figure: Tx data → decoding block with pre-distorted FFE coefficients → nonlinear CI FFE driver.
- Inverter-like strong drivers are power- and area-efficient but nonlinear
- Static pre-distortion compensates the nonlinearity
- Static pre-distortion is only possible in a CI FFE
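Static pre-distortion can be sketched as an offline inverse lookup: characterize the driver's code-to-output curve once, then for each desired (linear) FFE level store the code whose actual output comes closest. The quadratic compression curve and the target levels below are hypothetical stand-ins for a measured inverter-like driver.

```python
def transfer(code):
    """Hypothetical compressive driver curve (arbitrary output units)."""
    return code * (1 - code / 64)


def build_predistortion_table(curve, codes, targets):
    """Map each target output level to the code whose actual output is closest to it."""
    outputs = {c: curve(c) for c in codes}
    return {t: min(codes, key=lambda c: abs(outputs[c] - t)) for t in targets}


table = build_predistortion_table(transfer, range(32), [4.0, 8.0, 12.0])
print(table)   # {4.0: 4, 8.0: 9, 12.0: 16}
```

A linear mapping with the curve's small-signal gain of 1 would pick codes 4, 8, and 12 and land at 3.75, 7.0, and 9.75; the table instead stretches the upper codes to undo the compression. Because the table is computed once, no runtime adaptation is needed.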
CI FFE operation
Figure (three animation steps, input pattern 0001000): the latch and decoding block drives the strong driver (transition signals P1±, P2±, N1±, N2±; coefficients A<19:0>) and the weak driver so that the output moves through the Middle, High, and Low levels; the annotated currents are I0 (Middle), I2 (High), and I1 + I2 (Low), with simulated voltage (V) and current (mA) waveforms versus time (ns) and current levels I1, I2, -I0, -(I0+I1+I2).
Trans-impedance amplifier receiver
Figure: 3dB bandwidth BW (GHz) and V (mV), Vtia (mV), I (µA), Istatic (10µA) versus the receiver's input resistance R (0.5-5 kOhm). Legend: BW, 3dB bandwidth; V, received voltage at 2GHz; I, received current at 2GHz; Istatic, static current for resistive termination; Vtia, TIA-converted voltage with a gain of 2.7k Ohm.
- TIA achieves high bandwidth, small static power, and large amplitude simultaneously
- 50% static current and 150% voltage amplitude for the same bandwidth
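The benefit of a TIA front end follows from the textbook shunt-feedback analysis: with an inverting amplifier of gain A and feedback resistor Rf, the input resistance seen by the wire is Rf/(1 + A), while the transimpedance stays close to Rf. A minimal sketch of those two expressions; Rf = 2.7 kOhm comes from the slide's TIA gain, while A = 5 is hypothetical, and the ideal-amplifier model ignores the amplifier's own bandwidth and input capacitance.

```python
def shunt_feedback_tia(r_f, gain):
    """Ideal shunt-feedback TIA with inverting voltage gain -gain.

    KCL at the input: i_in = (v_in - v_out) / r_f with v_out = -gain * v_in, so
    v_in = i_in * r_f / (1 + gain) and v_out = -i_in * gain * r_f / (1 + gain).
    Returns (input resistance, transimpedance magnitude) in ohms."""
    r_in = r_f / (1 + gain)
    z_t = gain * r_f / (1 + gain)
    return r_in, z_t


r_in, z_t = shunt_feedback_tia(r_f=2.7e3, gain=5.0)   # gain is hypothetical
print(r_in, z_t)   # 450.0 2250.0
```

A plain 450 Ohm termination would load the wire the same way but convert current to voltage with only 450 Ohm; in this idealized model the TIA's output swing is larger by the factor A.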
Chip fabrication: 90nm CMOS
- Tx: 16x70um, Rx: 16x40um
- 10mm wire on the M8 layer, over M7 and under M9
In-situ measurement support
Channel measurement
- High-loss channel
- 25dB @ 690MHz, 40dB @ 2GHz, 46dB @ 3GHz
- 50% delay and 90% settling time of the step response: 1.4ns and 5.5ns, i.e., 8.6 and 33 UI at 6Gb/s
- Losses in most other work on equalized interconnects are at most 30dB
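The channel numbers convert as follows: a loss in dB maps to an amplitude ratio of 10^(-dB/20), and a time in seconds maps to UI by multiplying by the bit rate (one UI is 1/6 ns, about 167 ps, at 6Gb/s). A short sketch using the slide's measured values:

```python
def loss_db_to_amplitude(loss_db):
    """Amplitude ratio remaining after a loss given in dB (20*log10 convention)."""
    return 10 ** (-loss_db / 20)


def time_to_ui(seconds, bit_rate):
    """Express a time interval in unit intervals (bit periods) at the given bit rate."""
    return seconds * bit_rate


for f_hz, loss in ((690e6, 25), (2e9, 40), (3e9, 46)):
    print(f"{f_hz / 1e9:.2f} GHz: {loss} dB -> x{loss_db_to_amplitude(loss):.4f}")
print(time_to_ui(5.5e-9, 6e9))   # 90% settling time at 6 Gb/s, about 33 UI
```

At 40dB only 1% of the transmitted amplitude survives at 2GHz, which is why the transmitter and receiver equalization must work together on this channel.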
Eye measurement
(a) 3-tap FFE with DFE, 6Gb/s; (b) 3-tap FFE, no DFE, 4Gb/s; (c) 2-tap FFE, no DFE, 2Gb/s
- Vertical eye height is set to about 100mV for trade-off curve extraction
- First on-chip measurement
- Demonstrated good eye quality (>50%, max 80%)
Sensitivity measurement
(a) 6Gb/s, (b) 4Gb/s, (c) 2Gb/s
- Sensitivity < 3 (10x better): good match with the analysis
- Low-resolution, cheap (power and area) inverter-like hardware can equalize a high-loss channel (>40dB)
Power Performance Trade-offs
Figure: energy/bit (pJ/bit) breakdown (Etc, Rx, TIA, TxOther, TxStr) for four configurations: (1) 2Gbps, 1.1V, 2-tap FFE, no DFE; (2) 4Gbps, 1V, 3-tap FFE, no DFE; (3) 4Gbps, 1.1V, 3-tap FFE, no DFE; (4) 6Gbps, 1.2V, 3-tap FFE, DFE. Bar totals are labeled 0.635, 0.466, 0.454, and 0.371pJ/b.
As data rate increases:
- DC bias (TIA) energy decreases
- Channel compensation energy (TxStr) increases
Optimal power efficiency: balancing static (TIA) and active (channel, digital) energy
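The balancing argument can be sketched with a two-term model: a static bias power P_static is spread over more bits as the rate rises (P_static / rate), while the active channel-compensation energy grows with rate. The model form and every number below are hypothetical, chosen only to show an interior optimum; they are not the measured breakdown.

```python
def energy_per_bit_pj(rate_gbps, p_static_mw, e_fixed_pj, k_pj):
    """Hypothetical model: static term (mW / Gb/s = pJ/bit) plus an active term growing as rate**2."""
    return p_static_mw / rate_gbps + e_fixed_pj + k_pj * rate_gbps ** 2


rates = (2.0, 4.0, 6.0)
energies = {r: energy_per_bit_pj(r, p_static_mw=0.6, e_fixed_pj=0.1, k_pj=0.01) for r in rates}
best_rate = min(energies, key=energies.get)
print(energies, best_rate)
```

With these made-up parameters the totals are about 0.44, 0.41, and 0.56 pJ/bit, so the middle rate wins: below it the static term dominates, above it the active term does.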
Comparison to relevant works
Figure: energy per bit (pJ/bit) versus data rate density (Gb/s/um) for this work, [7], [8], [10]*, other Eq./RF links, and a 90nm optimized repeater.
- 2x and 3x performance improvement for ~30% and 120% additional energy
- The only measurement at >2-3Gb/s/um
[7] Mensink, ISSCC 2007
[8] Kim, ICCAD 2007
[10] Tam, VLSI 2009* (5mm)
Conclusion of thesis
Modeling and CAD tool
- The first analytical link model for an RC-dominant link
- Fast CAD tool improves runtime by x30,000
- Equalized interconnects provide 2x-10x better power efficiency than repeaters
- First demonstration of combined Eq. CAD + NoC CAD simulation
Circuit: CI FFE Tx and TIA Rx
- CI FFE is introduced for the first time
- CI FFE saves 2x power compared with VD and CS FFE, and 4x compared with CML
- CI FFE allows simple static pre-distortion, so cheap, coarse inverter-like drivers can be used; demonstrated for the first time
- CI FFE is 10x more robust to coefficient error than CS FFE
- TIA termination saves 50% static current and achieves 150% amplitude at the same bandwidth compared with a resistive termination
- Area- and energy-efficient link operates over 40dB loss!
Publication & talk list
Conference
- Byungsub Kim, Soumyajit Mandal, and Rahul Sarpeshkar, "Power-Adaptive Operational Amplifier with Positive-Feedback Self Biasing," IEEE ISCAS 2006.
- Byungsub Kim and Vladimir Stojanovic, "Equalized Interconnects for On-Chip Networks: Modeling and Optimization Framework," IEEE ICCAD 2007.
- Byungsub Kim and Vladimir Stojanovic, "A 4Gb/s/ch 356fJ/b 10mm Equalized On-chip Interconnect with Nonlinear Charge-Injecting Transmitter Filter and Transimpedance Receiver in 90nm CMOS Technology," IEEE ISSCC, Feb. 2009.
- Sanquan Song, Byungsub Kim, and Vladimir Stojanovic, "A Fractionally Spaced Linear Received Equalizer with Voltage-to-Time Conversion," IEEE Symposium on VLSI Circuits Dig. Tech. Papers, June 2009.
- Ajay Joshi, Byungsub Kim, and Vladimir Stojanovic, "Designing Energy-efficient Low-diameter On-chip Networks with Equalized Interconnects," in Proc. of the 17th IEEE Symposium on High-Performance Interconnects, Aug. 2009.
- Yong Liu, Byungsub Kim, Timothy O. Dickson, John F. Bulzacchelli, and Daniel Friedman, "A 10Gb/s Compact Low-Power Serial I/O with DFE-IIR Equalization in 65nm CMOS," IEEE ISSCC, Feb. 2009.
Journal
- Byungsub Kim and Vladimir Stojanovic, "Characterization of Equalized and Repeated Interconnects for NoC Applications," IEEE Design and Test of Computers, Oct. 2008.
- Byungsub Kim, Yong Liu, Timothy O. Dickson, John F. Bulzacchelli, and Daniel J. Friedman, "A 10-Gb/s Compact Low-Power Serial I/O With DFE-IIR Equalization in 65nm CMOS," IEEE Journal of Solid-State Circuits, Dec. 2009.
- Byungsub Kim and Vladimir Stojanovic, "An Energy-efficient Equalized Transceiver for RC-dominant Channels," JSSC 2010 (submitted).
Workshop
- Byungsub Kim and Vladimir Stojanovic, "Energy Efficient Wireline Communication Over RC-dominant Channels," CMOS Emerging Technologies, Vancouver, Sep. 2009.
Acknowledgement
Advisor: Vladimir Stojanović
Committee: Anantha Chandrakasan, Joel Dawson
Funding: Intel, IBM, NEC
Fabrication: TAPO, IBM Kansas City Plant
Industry: Ian Young, Alexandra Kern, Sam Palermo, John Bulzacchelli, Dan Friedman, Yong Liu, Timothy Dickson, Troy Beukema
Technical help: Fred Chen, Ranko Sredojević, Sanquan Song, Ajay Joshi, Jose Bohorquez, Matt Park, Min Park, Sungwon Chung
Entertaining help:
ISG-coffee mates: Ranko Sredojević, Ajay Joshi, Michael Georgas, Maxine Lee
Old CSG-CAG buddies: Alfred Ng (Coolman), Jaewook Lee, Vinson Lee (Real Crazyman), Jason Kim (Coolguy)