Interconnect-Centric VLSI Design Automation
Chung-Kuan Cheng
CSE Department
UC San Diego
La Jolla, CA 92093-0404
1
Interconnect Dominance
[Figure: delay (ps, 0 to 200) vs. process technology node (nm, 180 down to 50); source: ITRS Roadmap 2004. Curves: 1 mm global interconnect with scattering; FO4 inverter delay (estimated as 0.36*Ldraw); 1 mm distortionless transmission line (speed of light).]
Goals: Speed, Power, Cost
Constraints: Area, Current, Skew
Challenges: PVT Variations, Signal Integrity
2
Outline
• Interconnect Technologies
• Buffers, Pitches, Circuit Styles
• Geometric Planning
• Wire Orientations, Chip Shapes
• Interconnect Networks
• Topologies, Wire Styles
• Power and Ground Distributions
• Clock Networks
• Functional Modules
• Adders, Shifters
• Packaging
• Conclusion
3
Interconnect Technologies
• RC Wires
• Wire Pitch, Width, Separation
• Buffer Size, Buffer Interval
• Transmission Line
• RLCG
4
Interconnect Technologies
SRC Roadmap 2005

Year (on-chip)               2005      2010      2015
r_n c_n (ps), T40            0.870     0.400     0.180
r_w c_w (ps/mm^2), T80
  metal 1                    440       1792      5951
  global                     110       523       1601
interval (mm)
  metal 1                    0.136     0.0457    0.0168
  global                     0.272     0.0846    0.0324
delay (ps/mm)
  metal 1                    120       164       200
  global                     60        89        104

Interval:
$\mathrm{Int} = \sqrt{\dfrac{2(1+f)\, r_n c_g}{r_w c_w}}$
Delay per unit length:
$\mathrm{Delay}(l_{tr}) / l_{tr} = \left(2 + \sqrt{2(1+f)}\right) \sqrt{r_n r_w c_g c_w}$
5
Interconnect Tech.: Transmission Line
• Speed-of-light on-chip communication
• < 1/5 Delay of Traditional Wires
• Low Power Consumption
• < 1/5 Power Consumption
• Robust against process variations
• Short Latency
• Insensitive to Feature Size
6
Interconnect Tech.: Transmission Line
Differential Transmission Line
[Figure: ladder model of a differential transmission line carrying current i(z,t), with series elements RΔl and LΔl and shunt capacitance CΔl in each section.]
• Series resistance causes voltage loss.
• Speed and attenuation are frequency dependent.
Surfliner
[Figure: the same ladder model with a shunt conductance GΔl added to each section.]
• Shunt conductance compensates voltage loss: R/G = L/C.
• Flat response from DC to the gigahertz range.
• Telegraph cable: O. Heaviside, 1887.
7
Theory (Telegrapher’s Equation)
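• Telegrapher’s equations:
$\dfrac{\partial V(z,t)}{\partial z} = -R\, I(z,t) - L\, \dfrac{\partial I(z,t)}{\partial t}$
$\dfrac{\partial I(z,t)}{\partial z} = -G\, V(z,t) - C\, \dfrac{\partial V(z,t)}{\partial t}$
• Propagation constant:
$\gamma = \sqrt{(R + j\omega L)(G + j\omega C)} = \alpha + j\beta$
• Wave propagation:
$V(z) = V_0\, e^{-\alpha z - j\beta z}$
• α sets the attenuation and β sets the phase velocity ($v = \omega/\beta$). In general both are frequency dependent.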
8
Theory (Distortionless Line)
• Set G = RC/L
• Frequency-independent speed and attenuation:
$\alpha = R / \sqrt{L/C}, \quad \beta = \omega \sqrt{LC}$
• Characteristic impedance (purely resistive):
$Z_0 = \sqrt{L/C}$
• Phase velocity (speed of light in the medium):
$v = 1/\sqrt{LC} = c$
• Attenuation:
$A(z) = e^{-(R/Z_0)\, z}$
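The distortionless condition can be checked numerically. The sketch below evaluates the propagation constant at a few frequencies, once with G = RC/L and once with G = 0. The per-unit-length values of R, L, and C are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import cmath
import math

def propagation_constant(R, L, G, C, freq_hz):
    """Return (alpha, beta) of gamma = sqrt((R + jwL)(G + jwC))."""
    w = 2.0 * math.pi * freq_hz
    gamma = cmath.sqrt((R + 1j * w * L) * (G + 1j * w * C))
    return gamma.real, gamma.imag

# Hypothetical per-unit-length parameters (assumptions, not slide data).
R = 1.0e4     # ohm/m
L = 4.0e-7    # H/m
C = 1.0e-10   # F/m
G_match = R * C / L          # distortionless condition G = RC/L
Z0 = math.sqrt(L / C)        # characteristic impedance of the matched line

freqs = [1e8, 1e9, 1e10]
matched = [propagation_constant(R, L, G_match, C, f) for f in freqs]
unmatched = [propagation_constant(R, L, 0.0, C, f) for f in freqs]

for label, results in (("G=RC/L", matched), ("G=0", unmatched)):
    for f, (alpha, beta) in zip(freqs, results):
        velocity = 2.0 * math.pi * f / beta
        print(f"{label:7s} f={f:.0e} Hz  alpha={alpha:8.2f} Np/m  v={velocity:.3e} m/s")
```

With the matched conductance, the attenuation stays at R/Z0 and the velocity stays at 1/sqrt(LC) for every frequency; with G = 0 both drift with frequency.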
9
Digital Signal Response
10
Interconnect Tech.: Transmission Line
• Add shunt conductance between differential
wires
• Resistors realized by serpentine unsilicided
poly, diffusion resistors, or high resistive metal
11
Geometrical Planning
• Wire Orientations
• Manhattan, Hexagonal, Octagonal, Euclidean
• Die Shapes
• Rectangle, Diamond, Hexagon, Octagon, Circle
12
Average Radius of Unit-Circle Area
Shape \ λ-geometry   Man.    Y-Arch   X-Arch   Euclid.
Square               1.329   1.122    1.070    1.017
Diamond              1.253   1.121    1.070    1.017
Hexagon              1.276   1.100    1.058    1.003
Octagon              1.272   1.104    1.054    1.001
Circle               1.273   1.103    1.055    1.000
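One reading that is consistent with the Circle row: each entry is the mean routing distance from the center under the given wire orientations, divided by the mean Euclidean distance. The sketch below assumes that interpretation (it is not stated on the slide) and models a λ-geometry as k evenly spaced routing orientations, with k = 2 for Manhattan, 3 for Y-Arch, and 4 for X-Arch. The function names are illustrative.

```python
import math

def circle_detour_ratio(k, samples=100000):
    """Mean k-orientation routing distance over mean Euclidean distance
    from the center of a circle, by midpoint-rule integration over angle."""
    phi = math.pi / k  # angle between adjacent routing directions
    total = 0.0
    for i in range(samples):
        theta = (i + 0.5) * phi / samples
        # Length needed to reach a unit vector at angle theta using only
        # segments along the two bounding directions 0 and phi.
        total += (math.sin(phi - theta) + math.sin(theta)) / math.sin(phi)
    return total / samples

def circle_detour_ratio_closed_form(k):
    """Closed form of the same average: 2 tan(phi/2) / phi."""
    phi = math.pi / k
    return 2.0 * math.tan(phi / 2.0) / phi

for name, k in [("Manhattan", 2), ("Y-Arch", 3), ("X-Arch", 4)]:
    print(f"{name:10s} {circle_detour_ratio(k):.3f}")
```

Because the detour factor depends only on the angle, the ratio is independent of the circle's radius. The three values round to 1.273, 1.103, and 1.055.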
13
Throughput : concurrent flow demand
Shape \ λ-geometry   Man.    Y-Arch   X-Arch*
M: Square            1.000   1.225    1.346
M: Diamond           1.195
Y: Hexagon                   1.315
X: Octagon*                           1.420
*ratio of 0-90 planes and 45-135 planes is not fixed
14
Flow congestion map for uniform 90-degree meshes
15
Congestion map of square chip using X-architecture
12 by 12
13 by 13
16
Y-architecture + Square Chip
12 by 12
13 by 13
17
Y Architecture + Hexagonal Chip
18
X-Architecture + Octagonal Chip
19
Manhattan Architecture + Diamond Chip
20
Routing Grids
X-Architecture
Y-Architecture
(http://www.xinitiative.org/img/062102forum.pdf)
21
Interconnect Networks
• Optimized Interconnect Architecture
• Data Bus, Control Signals
• Shared Interconnect
• Packet Switching
• Circuit Switching
• RTL Level Partition
22
Interconnect Networks
Physical implementation:
• On-chip transmission line for long-distance communication
• Well-spaced RC wire with buffer insertion for local connections
23
Interconnect Networks
• Obj: Power, Latency
• Constraints:
• Routing Area, Bandwidth
• Design Space:
• Topology
• Wire Styles, Switches
• Model:
• Traffic Demand
• Data Bus, Control Signals
24
Interconnect Networks: Design Flow
[Figure: design flow]
• Inputs: Power & Delay Lib, Topology Lib
• Multi-commodity network flow (MCF) formulation
• Power evaluation (MCF solver)
• Result: latency-aware low-power NoC topology with wire style optimization
25
Power and Latency Tradeoffs for Optimal 8x8 Topologies
[Figure: power consumption (W, 48 to 73) vs. average latency (ns, 2.3 to 3.3), one curve per area budget from area=3000um to area=11000um in steps of 1000um.]
26
Topology Selection (Latency, Power, BW)
[Figure: power consumption (W, 48 to 98) vs. average latency (ns, 2.3 to 4.3) for topo=optimal, topo=mesh, topo=torus, and topo=hypercube.]
27
Topology Selection (Latency, Power, BW)
(a) Optimal topology when area = 3000um
(b) Optimal topology when area = 7000um
(c) Optimal topology when area = 11000um
Optimal 8-node topologies vs area resources
28
Power Ground Networks
• Power network on chip and package
• Decoupling capacitor
• Conducting transistors connected to the P/G grids
29
Power Ground Analysis
• Constraints: Voltage Drops and Current Density
• IR Drop: Static Analysis
• Resistive Networks
• Static Current Sources
• dI/dt: Dynamic Analysis
• RLC Networks
• Power on and off
• Sleep mode on and off
• Gated clocks
• Various operation modes
30
Power/Ground Networks: Natural Frequency
• RLC Network Characteristics: Natural frequencies (Quality Factor)
• Operation Modes: Excite the resonance
• Decoupling Capacitance: Shifts the natural frequencies.
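As a rough illustration of how decoupling capacitance shifts the resonance, the sketch below treats the power network as a single lumped series RLC loop. That lumped model and all component values are assumptions made for this example; a real P/G network has many natural frequencies.

```python
import math

def natural_frequency_hz(L, C):
    """Undamped natural frequency of a lumped LC loop."""
    return 1.0 / (2.0 * math.pi * math.sqrt(L * C))

def quality_factor(R, L, C):
    """Quality factor of a series RLC loop."""
    return math.sqrt(L / C) / R

# Hypothetical lumped values (assumptions, not slide data).
R = 0.01           # ohm, loop resistance
L = 0.1e-9         # H, package plus grid inductance
C_die = 50e-9      # F, intrinsic on-die capacitance
C_decap = 150e-9   # F, added decoupling capacitance

f_before = natural_frequency_hz(L, C_die)
f_after = natural_frequency_hz(L, C_die + C_decap)
q_before = quality_factor(R, L, C_die)
q_after = quality_factor(R, L, C_die + C_decap)

print(f"without decap: f0 = {f_before / 1e6:.1f} MHz, Q = {q_before:.2f}")
print(f"with decap:    f0 = {f_after / 1e6:.1f} MHz, Q = {q_after:.2f}")
```

Quadrupling the total capacitance halves the natural frequency and, in this series model, also halves the quality factor, which moves the resonance away from an operation mode that would otherwise excite it.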
31
H. Chen, IBM
32
Clock Distributions
33
Clock: Linear Variations Model
• Process variation model
• Transistor length
• Wire width
• Linear variation model
$d = d_0 + k_x x + k_y y$
• Power variation model
• Supply voltage varies randomly (10%)
34
Clock: RC Model
Input: n-level meshes and H-trees
Constraint: routing area, parameter variations
Objective: skew
35
Simplified Circuit Model
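[Figure: two step sources, Vs1 = u(t) and Vs2 = u(t-T), each drive a node (1 and 2) with capacitance C through a source resistance Rs; a shunt resistance R connects nodes 1 and 2.]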
36
Skew Expression
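Assumptions:
1. $T \ll R_s C$, so both nodes cross 50% at about $t_0 = \ln 2 \cdot R_s C$
2. $R_s / R \ll R_s C / T$
Using the first-order Taylor expansion $e^x \approx 1 + x$, the skew is the voltage difference between the two nodes divided by their slew rates:
$\hat{T} \approx \dfrac{1}{2} \left( \dfrac{V_1 - V_2}{V_1'} + \dfrac{V_1 - V_2}{V_2'} \right)$
Skew function:
$\hat{T} = T \exp\left( -2 \ln 2 \cdot \dfrac{R_s}{R} \right)$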
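The skew function can be compared against the exact response of the simplified circuit. The sketch below assumes the two-node model described above (each node has capacitance C and is driven through Rs by a unit step, the second step delayed by T, with a shunt R between the nodes) and solves it in closed form by splitting it into common and differential modes. The component values are normalized and hypothetical.

```python
import math

def step_response(tau, t):
    """Unit-step response of a first-order RC stage, zero before t = 0."""
    return 1.0 - math.exp(-t / tau) if t > 0.0 else 0.0

def node_voltages(t, T, Rs, R, C):
    """Closed-form V1(t), V2(t) of the two-driver model with shunt R."""
    tau_c = Rs * C                      # common mode: the shunt carries no current
    gain = 1.0 / (1.0 + 2.0 * Rs / R)   # differential-mode DC gain
    tau_d = tau_c * gain                # differential-mode time constant
    vc = 0.5 * (step_response(tau_c, t) + step_response(tau_c, t - T))
    vd = gain * (step_response(tau_d, t) - step_response(tau_d, t - T))
    return vc + 0.5 * vd, vc - 0.5 * vd

def crossing_time(node, T, Rs, R, C, level=0.5):
    """Time at which the chosen node (0 or 1) crosses `level`, by bisection."""
    lo, hi = 0.0, 20.0 * Rs * C
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if node_voltages(mid, T, Rs, R, C)[node] < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulated_skew(T, Rs, R, C):
    return crossing_time(1, T, Rs, R, C) - crossing_time(0, T, Rs, R, C)

def predicted_skew(T, Rs, R):
    return T * math.exp(-2.0 * math.log(2.0) * Rs / R)

# Normalized, hypothetical values: T << Rs*C and Rs/R << Rs*C/T.
T, Rs, R, C = 0.01, 1.0, 5.0, 1.0
print(f"simulated skew = {simulated_skew(T, Rs, R, C):.6f}")
print(f"predicted skew = {predicted_skew(T, Rs, R):.6f}")
```

Under both assumptions the two numbers agree closely, and removing the shunt (a very large R) returns the full input skew T.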
37
Optimal Routing Resources Allocation
total area
level-4
level-3
level-2
level-1
38
Inductance Diminishes Shunt Effects
[Figure: the simplified circuit model with an inductance L added to the shunt connection R between nodes 1 and 2.]
• 0.5 um wide, 1.2 cm long copper wire
• Input skew 20 ps

f (GHz)     0.5   1     1.5   2     3     3.5   4     5
skew (ps)   3.9   4.2   5.8   7.5   9.9   13    17    26
39
Clock Distributions
Surfliner
40
Functional Modules (Data Path)
• Arithmetic
• Algorithms
• Logic
• Logic Styles
• Placement
41
Cyclic Shifter
Delay: n log n, Power: ¼ n^2
[Figure: 8-bit mux-based cyclic shifter. Inputs D0 to D7, select signals S0 to S2, mux cells (i, j) for bit i = 0..7 and stage j = 0..3, outputs Z0 to Z7.]
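The behavior of a mux-based logarithmic cyclic shifter can be sketched as follows. This is a functional model only: it says nothing about wire delay or switched capacitance. The right-rotation direction and the function name are assumptions made for this example.

```python
def cyclic_shift_right(bits, amount):
    """Rotate `bits` (index 0 = D0) right by `amount` using log2(n) mux stages."""
    n = len(bits)
    stages = n.bit_length() - 1
    assert 1 << stages == n, "this sketch assumes a power-of-two width"
    current = list(bits)
    for k in range(stages):
        select = (amount >> k) & 1   # control signal S_k
        step = 1 << k                # stage k shifts by 2^k positions
        # Each cell is a 2:1 mux: keep its own bit (select = 0) or take
        # the bit `step` positions higher, wrapping around (select = 1).
        current = [current[(i + step) % n] if select else current[i]
                   for i in range(n)]
    return current

data = list(range(8))                # stand-ins for D0..D7
print(cyclic_shift_right(data, 3))
```

For an 8-bit input, three stages controlled by S0, S1, and S2 cover every rotation amount from 0 to 7.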
42
Fan-out Splitting
Delay: n, Power: 3/16 n^2
[Figure: 8-bit demux-based cyclic shifter. Inputs D0 to D7, select signals S0 to S2, cells (i, j) for bit i = 0..7 and stage j = 0..3, outputs Z0 to Z7.]

Delay Comparison (delay in units of τ)
          MuxShifter   DemuxShifter   Impr. Ratio
8-bit     50.7         45.3           10.7%
16-bit    102.7        76             26.0%
32-bit    218.7        128            41.5%
64-bit    484          224            53.7%

Power Comparison (total switched capacitance in units of Cg)
          MuxShifter   DemuxShifter   Impr. Ratio
8-bit     150.7        146.9          2.5%
16-bit    484          453.7          6.3%
32-bit    1561.3       1397.6         10.5%
64-bit    5220         4458.9         14.6%
43
Cell Permutation
• Fix the input/output stage, permute cells of intermediate
stages to further improve delay.
• Formulate as an ILP problem and solve by CPLEX.
Optimal Timing Solution for 8-bit
[Figure: cell order at each stage, with the delay of each cell given in units of columns spanned.]
Input order:                7 6 5 4 3 2 1 0
After the >> 1-bit stage:   6 5 4 3 7 2 1 0
After the >> 2-bit stage:   3 4 2 6 7 5 1 0
After the >> 4-bit stage:   7 6 5 4 3 2 1 0
• Optimal solution in terms of delay
• Delay/Power tradeoff

Additional Delay Reduction by Cell Permutation
          w/o permutation   w/ permutation
8-bit     13                8
16-bit    29                18
32-bit    61                38
64-bit    125               78
44
Packaging: Pin Breakaway
• Row by Row Escape:
Escape interconnect row by row from outside toward inside.
45
Packaging: Escape Sequence Strategies
• Parallel triangular sequence:
This method divides the objects into groups and escapes each group with a triangular outline.
46
Packaging: Escape Sequence Strategies
• Central triangular sequence:
Escape objects from the center of the outside row and expand the indent with a single triangular outline. In this method, the outline capacity increases continuously layer by layer, although the capacity of the first several layers is small.
47
Packaging: Escape Sequence Strategies
• Two-sided sequence:
Escape objects from the inside as well as from the outside.
The outline shrinks slowly and also follows a zigzag shape.
48
Packaging: Escape Sequence Strategies
49
Packaging: Experimental Results
40 x 40
Layer   Row by row   Parallel triangular   Central triangular   Two sided
1       304          276                   100                  312
2       272          340                   156                  328
3       240          292                   228                  308
4       208          240                   300                  324
5       176          164                   372                  328
6       144          124                   444
7       112          164
8       80
9       48
10      16
50
Packaging: Experimental Results
20 x 20
Layer   Row by row   Parallel triangular   Central triangular   Two sided
1       144          132                   92                   140
2       112          144                   116                  160
3       80           96                    140                  100
4       48           28                    52
5       16
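The row-by-row counts in the results can be reproduced with a short sketch. It assumes that each routing layer escapes two concentric rings of the array, working from the outside toward the inside; that grouping is inferred from the numbers and is not stated on the slides. The function name is illustrative.

```python
def row_by_row_layers(n, rings_per_layer=2):
    """Pins escaped on each layer when an n x n array is peeled ring by ring."""
    rings = []
    side = n
    while side > 0:
        # A ring of side length `side` holds 4*side - 4 pins (1 for a single pin).
        rings.append(4 * side - 4 if side > 1 else 1)
        side -= 2
    return [sum(rings[i:i + rings_per_layer])
            for i in range(0, len(rings), rings_per_layer)]

print(row_by_row_layers(40))
print(row_by_row_layers(20))
```

Under that assumption the 40 x 40 array needs 10 layers and the 20 x 20 array needs 5, and the per-layer counts sum to 1600 and 400 pins respectively.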
51
Conclusion
• Interconnect Technologies
• Geometric Planning
• Interconnect Planning
• Power and Ground Distribution
• Clock Networks
• Data Path
• Packaging
68
Interconnect Technologies
Coupling capacitance: area component Cca and fringe component Ccfr
Self capacitance: area component Csa and fringe component Csfr
[Equations: empirical closed-form fits for Cca, Ccfr, Csa, and Csfr as functions of w, s, t, and h]
Total wire capacitance: Cw, combining Cca, Ccfr, Csa, and Csfr
[Figure: cross-section of parallel wires and the p/g plane, annotated with w, s, t, h, d, Cs, and Cc]
w: wire width
s: wire spacing
t: wire thickness
h: distance to p/g plane
d: wire pitch
Cs: self capacitance
Cc: coupling capacitance
69
Interconnect Technologies
Objective functions (design metrics):
• delay_n
• bandwidth
• delay_n × power_n
• bandwidth / power
• delay_n^2 × power_n
For each pitch:
  For each objective function:
    Find wire width w, buffer size s_inv, and buffer interval l_inv
    Calculate the metrics
70
Experimental Results – normalized delay
[Figure: normalized delay (s/m) vs. pitch (m) and technology node for the min-d, min-dp, and min-ddp objectives at 180nm, 130nm, 100nm, and 70nm. Top left: overview. Top right: at different pitches (solid lines: min-pitch; dashed lines: saturating pitch). Bottom right: at 70nm technology.]
71
Interconnect Tech. -- bandwidth
[Figure: bandwidth (bits/s) vs. pitch (m) and technology node for the min-d, min-dp, and min-ddp objectives at 180nm, 130nm, 100nm, and 70nm. Top left: overview. Top right: at min-pitches. Bottom right: at 70nm technology.]
72
Interconnect Tech. – bandwidth/power
[Figure: bandwidth/power (bits*m/Js) vs. pitch (m) and technology node for the min-d, min-dp, and min-ddp objectives at 180nm, 130nm, 100nm, and 70nm. Top left: overview. Top right: at different pitches (solid lines: min-pitch; dashed lines: optimal pitch). Bottom right: at 70nm technology.]
73
Interconnect Technologies
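Example: w = 85 nm, t = 145 nm
r_n = 10 Kohm, c_n = 0.25 fF, c_g = 2.34 x c_n = 0.585 fF
r_w = 2 ohm/um, c_w = 0.2 fF/um
Optimal interval:
$l = \sqrt{\dfrac{2(1+f)\, r_n c_g}{r_w c_w}} = 242\ \mathrm{um}$
Optimal buffer size:
$s = \sqrt{\dfrac{r_n c_w}{r_w c_g}} = 41$
Optimal delay:
$\mathrm{Delay}(l_{tr}) / l_{tr} = \left(2 + \sqrt{2(1+f)}\right) \sqrt{r_n r_w c_g c_w} = 194\ \mathrm{fs/um} = 194\ \mathrm{ps/mm}$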
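The example can be reproduced with a few lines of arithmetic. The slide does not give the value of f; the sketch below assumes f = 1, which is the value that recovers the quoted interval. The function name is illustrative.

```python
import math

def repeated_wire(rn, cg, rw, cw, f=1.0):
    """Optimal buffer interval, buffer size, and delay per unit length."""
    interval = math.sqrt(2.0 * (1.0 + f) * rn * cg / (rw * cw))
    size = math.sqrt(rn * cw / (rw * cg))
    delay_per_length = ((2.0 + math.sqrt(2.0 * (1.0 + f)))
                        * math.sqrt(rn * rw * cg * cw))
    return interval, size, delay_per_length

rn = 10e3           # ohm, minimum-size buffer output resistance
cn = 0.25e-15       # F, minimum-size buffer capacitance
cg = 2.34 * cn      # F, 0.585 fF
rw = 2.0            # ohm/um, wire resistance per unit length
cw = 0.2e-15        # F/um, wire capacitance per unit length

interval_um, size, delay_s_per_um = repeated_wire(rn, cg, rw, cw)  # f = 1 assumed
delay_ps_per_mm = delay_s_per_um * 1e15   # s/um -> ps/mm

print(f"interval = {interval_um:.1f} um")
print(f"size     = {size:.1f}")
print(f"delay    = {delay_ps_per_mm:.1f} ps/mm")
```

With f = 1 the results are about 242 um, 41, and 193.5 ps/mm, in line with the values on the slide.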
74
Experiments—Optimized Skew
total area   s-mesh skew (s)   m-mesh skew (s)   ratio
0.00         2.92E-11          2.92E-11          100.0%
0.25         2.79E-11          2.60E-11          93.2%
0.40         2.71E-11          2.45E-11          90.4%
1.00         2.42E-11          1.98E-11          81.8%
3.00         1.70E-11          1.24E-11          73.2%
5.00         1.24E-11          8.72E-12          70.5%
75
Robustness Against Supply Voltage Variations
total area   multi-level mesh        single-level mesh
             ave        worst        ave        worst
0.00         2.10E-11   2.91E-11     2.10E-11   2.91E-11
1.00         8.38E-12   1.14E-11     8.26E-12   1.43E-11
2.00         2.71E-12   4.42E-12     6.18E-12   1.11E-11
3.00         1.89E-12   3.33E-12     4.83E-12   8.73E-12
4.00         1.45E-12   2.48E-12     3.88E-12   6.96E-12
5.00         1.16E-12   2.02E-12     3.18E-12   5.64E-12
76