The Interconnect Problem: From Emerging
Devices to Energy-efficient Networks
Vladimir Stojanović
Integrated Systems Group
Massachusetts Institute of Technology
High-speed links needed everywhere
[Figure: high-speed links at every level of the hierarchy: backbone, router, rack, PC or console]
MIT ISG
What makes it challenging
[Figure: high-speed link chip and its channel response. Attenuation (dB, 0 to -60) vs. frequency (0 to 10 GHz) for 9" FR4 and 26" FR4 traces, each with and without a via stub. Source: Rambus]
Requires sophisticated equalization circuits
Chip-to-chip I/O scaling problem
[Figure: I/O scaling trends vs. Technology (nm), normalized to the 90 nm node: #I/O pads, off-chip fclk, aggregate bandwidth, and the aggregate-bandwidth fit $y = 10800\,x^{-2.1}$]
[Figure: Energy/bit (pJ) vs. Technology (µm) for 2-PAM links, with trend $y = 399.17\,x^{1.157}$. Source: Ken Yang, UCLA]
Bandwidth need grows faster than energy/bit drops
– Creates exponentially increasing I/O power consumption
In power-constrained systems (processors and anything else inside the box), this limits the available bandwidth
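The two trend lines on this slide can be combined into a rough power estimate. The sketch below assumes that aggregate bandwidth follows the fitted x^-2.1 law and energy per bit follows the fitted x^1.157 law (x being the technology feature size), and that I/O power is simply their product. The function name is illustrative and not from the talk.

```python
# Exponents read off the two fit labels on the slide.
BW_EXPONENT = -2.1       # aggregate bandwidth ~ x**-2.1
ENERGY_EXPONENT = 1.157  # energy per bit      ~ x**1.157


def relative_io_power(node_from_nm: float, node_to_nm: float) -> float:
    """I/O power at node_to_nm relative to node_from_nm.

    Power = bandwidth * energy/bit, so it scales as
    x**(BW_EXPONENT + ENERGY_EXPONENT). Only the ratio of feature
    sizes matters, so the unit of x cancels.
    """
    ratio = node_to_nm / node_from_nm
    return ratio ** (BW_EXPONENT + ENERGY_EXPONENT)


if __name__ == "__main__":
    for node in (90, 65, 45, 32, 22):
        factor = relative_io_power(90, node)
        print(f"{node:>3} nm: I/O power x{factor:.2f} relative to 90 nm")
```

Under these assumptions the net exponent is -0.943, so each halving of the feature size raises I/O power by a factor of about 1.9.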
Parallel off-chip links
[Figure: linear transmit equalizer with anticausal and causal taps driving the channel into 50 Ω terminations; signal and noise spectrum (dBV, 0 to -100) vs. frequency (0 to 14 GHz); receiver built from a pre-amp with offset followed by a clocked comparator]
Often share clock generation and synchronization
Limited equalization (few taps)
Most power is burned driving the 50 Ω line
Current-mode: 200 mV swing (4 mW off a 1 V supply)
– Data-rate independent
– With receiver and pre-driver, the energy budget at 10 Gb/s is 500 fJ/bit
Voltage-mode: 2 pJ/bit state of the art (dynamic power)
– Can possibly scale to 500 fJ/bit, but not much further
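The current-mode numbers follow from Ohm's law. The sketch below is a minimal check, assuming the 200 mV swing is developed across a single 50 Ω load and that the driver current flows continuously from the 1 V supply; the function name is illustrative.

```python
def current_mode_driver(swing_v, z0_ohm, vdd_v, data_rate_bps):
    """Static power and energy/bit of a current-mode driver.

    The driver sinks I = swing / Z0 from the supply regardless of the
    data rate, so power is constant and energy/bit falls as the rate rises.
    """
    current_a = swing_v / z0_ohm
    power_w = current_a * vdd_v
    return power_w, power_w / data_rate_bps


if __name__ == "__main__":
    power_w, energy_j = current_mode_driver(0.2, 50.0, 1.0, 10e9)
    print(f"driver power:  {power_w * 1e3:.1f} mW")
    print(f"driver energy: {energy_j * 1e15:.0f} fJ/bit at 10 Gb/s")
```

This gives 4 mW and 400 fJ/bit for the driver alone, which leaves roughly 100 fJ/bit of the 500 fJ/bit budget for the receiver and pre-driver.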
Convergence of platforms
The only way to meet future system feature-set, design-cost, power, and performance requirements is to program a processor array
– Multiple parallel general-purpose processors (GPPs)
– Multiple application-specific processors (ASPs)
IBM Cell: 1 GPP (2 threads), 8 ASPs
Intel Network Processor (IXP2800): 1 GPP core, 16 ASPs (128 threads)
Sun Niagara: 8 GPP cores (32 threads)
Picochip DSP: 1 GPP core, 248 ASPs
Cisco CRS-1: 188 Tensilica GPPs
[Figure: Intel IXP2800 block diagram: Intel XScale core (32K IC, 32K DC), 16 MEv2 microengines, 3 RDRAM and 4 QDR SRAM channels, 64b/66 MHz PCI, Rbuf and Tbuf (64 @ 128B), hash unit, scratch, CSRs, UART, timers, GPIO, BootROM/SlowPort]
Intel 4004 (1971): 4-bit processor, 2312 transistors, ~100 KIPS, 10 micron PMOS, 11 mm2 chip
1000s of processor cores per die
“The Processor is the new Transistor” [Rowen]
On-chip interconnect: A system perspective
FPGAs, Multi-core/Network Processors, SoCs
– Using more routing resources
– Examples: Xilinx Virtex 4; IBM Cell: 8 cores; Cisco CRS-1: 192 cores; Clearspeed CSX600: 64-96 cores
Example MIT RAW numbers (power consumption)
– Clock 26%, Network 36%, Computation 39%
Existing architectures
– Designed to relieve the interconnect scaling problem: latency, capacitance, loss
– But interconnects will consume nearly 50% of chip power
Power and area efficiency becoming critical
Electronic shared memory switches
[Figure: example chips (Sun Niagara, IBM CELL, Intel Terascale) and switch topologies (bus, cross-bar, mesh, rings), each connected through DRAM interfaces to DIMMs of DRAM chips]
Centralized switch (Bus, Crossbar)
– Fast, but scales poorly
– Power hungry
Distributed switch (e.g. Mesh)
– Slow, power hungry
Other on-chip networks (fat tree, torus, etc.)
– Compromise between density, power and latency/data rate
Memory interfaces
– Power and density limit
The problem spans many layers
[Figure: layers of the interconnect problem, spanning on-chip networks and off-chip I/O: network architecture; design optimization; communication (equalization, modulation, coding); link modeling and characterization; circuits (Tx, Rx, control, measurement); interconnect technology (Cu, CNTs, Si-Photonics). Insets: Energy/Bit (pJ/bit) vs. Data Rate Density (Gbps/um) for repeated and equalized (30 mV, 50 mV, 90 mV eye) links; receiver pre-amp with offset and comparator. Image credit: IBM]
Repeaters in On-chip Networks
Network: Intel 80-core Terascale
Application benchmarks, architectural decision: Guo, SLIP 2006; Jin, HPCA 2007
Repeated interconnect: circuit + wire parameters
[Figure: Wire Width and Space (um) vs. Throughput Density (Gbps/um) for equalized and repeated wires]
[Figure: Energy/Bit (pJ/bit) vs. Data Rate Density (Gbps/um) for a repeated 10 mm wire]
Equalized Interconnect in On-chip Networks
Network: Intel 80-core Terascale
Challenges
– Joint optimization: circuit + wire
Models and tools
– No performance and power models
– No modeling/tool framework
[Figure: equalized interconnect. Transmit bits pass through an FFE with weights w = [w1 w2 ... wnFFE] and a driver of size Wdrv at voltages Vp and Vs, travel over a wire of given width and spacing, and are sliced against Vth with a DFE tap h1 to produce receive bits]
[Figure: Energy/Bit (pJ/bit) vs. Data Rate Density (Gbps/um) for repeated and equalized (30 mV, 50 mV, 90 mV eye) interconnect]
Equalized Interconnect
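[Figure: three link variants. (1) No equalization. (2) Feed Forward Equalization (FFE): a tapped delay line with weights w0, w1, w2 summed at the transmitter. (3) FFE + Decision Feed Back Equalization (DFE): the same FFE plus a receiver that feeds the previous decision back with weight -y1]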
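The difference between the three variants can be illustrated with a toy discrete-time model. The sketch below is not the talk's link model: the two-tap channel pulse response and the FFE tap values are hypothetical, bits are ±1, there is no noise, and the FFE output is not rescaled to a fixed transmit swing. It only shows how w0, w1, w2 and y1 act on inter-symbol interference (ISI).

```python
def fir(samples, taps):
    """Causal FIR filter: out[n] = sum_k taps[k] * samples[n - k]."""
    return [
        sum(t * samples[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(samples))
    ]


def convolve(a, b):
    """Full discrete convolution of two short tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def worst_case_eye(pulse):
    """Main cursor minus the summed magnitude of all ISI terms."""
    main = max(abs(p) for p in pulse)
    return main - (sum(abs(p) for p in pulse) - main)


def dfe_receive(received, bits, y1):
    """Slice with one feedback tap; return decisions and per-bit margins."""
    decisions, margins, prev = [], [], 0
    for r, b in zip(received, bits):
        corrected = r - y1 * prev       # cancel ISI from the previous decision
        prev = 1 if corrected >= 0 else -1
        decisions.append(prev)
        margins.append(b * corrected)   # positive: correct side of threshold
    return decisions, margins


CHANNEL = [1.0, 0.5]          # hypothetical: main cursor + one post-cursor
FFE_TAPS = [1.0, -0.5, 0.25]  # hypothetical w0, w1, w2

if __name__ == "__main__":
    bits = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1]
    print("no equalization eye:", worst_case_eye(CHANNEL))
    print("FFE eye:", worst_case_eye(convolve(FFE_TAPS, CHANNEL)))
    decisions, margins = dfe_receive(fir(bits, CHANNEL), bits, y1=0.5)
    print("DFE eye:", min(margins), "all correct:", decisions == bits)
```

In this toy case the FFE pushes the residual ISI out to a smaller, later tap, while the DFE removes the post-cursor entirely as long as the previous decision was correct.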
Joint Optimization Problem: Communication + Circuits + Wires
Large design space (>400k points)
– Circuit parameters (WLCM, Vp, Vs)
– Wire dimensions (W, S)
– Sample delay (Td)
– Equalization coefficients (W1, W2, W3, y1)
[Figure: equalized link schematic annotated with the optimization variables: voltages (Vp, Vs), driver size, wire sizes (ws), sample delay (Td), FFE coefficients (W1 ... Wk ... WnFFE on delayed bits d1 ... dk ... dnFFE), and DFE coefficient (y1) at the clocked receiver with threshold Vth]
B. Kim and Vladimir Stojanović, “Equalized Interconnects for On-Chip Networks: Modeling and Optimization Framework,” IEEE ICCAD 2007
Connecting Performance and Power Models
Channel = Wire + Devices & Circuit
– Need accurate channel model
– Transfer function T(f)
Eye estimation
– Sample time
– Equalization coefficients
[Figure: link schematic annotated with voltages (Vp, Vs), driver size, wire sizes, sample delay (Td), FFE coefficients, and DFE coefficient (y1)]
[Figure: pulse response, Voltage (V) vs. Time (ns), with main tap and DFE tap marked, comparing equation and SPICE; channel transfer function, Magnitude vs. frequency (GHz); estimated eye, Voltage (V) vs. Time (ns)]
Result: M8, 10mm
[Figure: Repeater and equalized interconnect. Energy/Bit (pJ/bit) vs. Data Rate Density (Gbps/um) for repeated and equalized (30 mV, 50 mV, 90 mV eye) links]
[Figure: Worst-case eye and LMSE method. Energy/Bit (pJ/bit) vs. Data Rate Density (Gbps/um) for NORM1 and LMSE, with and without crosstalk]
Equalized Interconnect:
– 10x more energy efficient for the same data rate density and latency
– Crosstalk doubles energy cost
Model Verification
[Figure: Energy/bit (pJ/bit) vs. Data Rate Density (Gb/s/um). Solid lines: this tool; dashed lines: SPICE. Curves: Ebtot, EbactiveDrv, EbPre, Ebw, EbscDrv]
SPICE netlist auto-generated from the parameters
SPICE: 190 s
Formula-based: 6.5 ms
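To see why the run-time gap matters, the two timings can be scaled to the design space of more than 400k points quoted on the joint-optimization slide. The sketch below assumes that both timings are per evaluated design point and that points are evaluated serially; neither assumption is stated in the talk.

```python
SPICE_SECONDS_PER_POINT = 190.0     # from the slide
FORMULA_SECONDS_PER_POINT = 6.5e-3  # from the slide
DESIGN_POINTS = 400_000             # lower bound quoted for the design space


def sweep_time_seconds(seconds_per_point, points=DESIGN_POINTS):
    """Serial run time for evaluating every design point once."""
    return seconds_per_point * points


if __name__ == "__main__":
    speedup = SPICE_SECONDS_PER_POINT / FORMULA_SECONDS_PER_POINT
    spice_days = sweep_time_seconds(SPICE_SECONDS_PER_POINT) / 86_400
    formula_minutes = sweep_time_seconds(FORMULA_SECONDS_PER_POINT) / 60
    print(f"speedup per point: {speedup:,.0f}x")
    print(f"SPICE sweep:   {spice_days:,.0f} days")
    print(f"formula sweep: {formula_minutes:.0f} minutes")
```

Under those assumptions the formula-based model is about 29,000 times faster per point: a full sweep drops from roughly 880 days to about 43 minutes.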
On-chip Link Test-chip
90 nm test chip
– IBM process
8 Gb/s max
– DDR muxing
– Eye > 100 mV
4 Gb/s/um
0.5 pJ/bit @ 8 Gb/s
– 400 fJ/bit Tx
– 100 fJ/bit TIA+DFE
Trades off data-rate density and energy efficiency
What else can we do?
Try new device technology
– Carbon Nanotubes
– Silicon Photonics
Estimating the impact of CNTs
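[Figure: evaluation flow: CNT process characterization, then create models for circuits (driver RDRV and CDRV, wire RWL and CWL, load CLOAD; wire cross-section with width W, spacing S, pitch P, thickness T, ILD height H, coupling capacitance CC, plate capacitance CP), then extract tradeoff curves (performance, power, area), then make informed architecture choices]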
Effective Resistivity of CNTs
$\rho_{EFF} \propto \frac{d^{2}}{k} \cdot \frac{R_F + R_{cont}}{L}$
[Figure: distributed CNT circuit model of length L with RCONT/2 and RF/2 at each end, kinetic inductance LK, quantum capacitance 4CQ, and electrostatic capacitance CE]
[Figure: Resistivity of CNTs vs. Cu. Effective Resistivity (µΩ-cm) vs. Technology Node (nm) for Cu, ideal CNT, and CNT with L = 10x, 100x, and 1000x min. pitch; CNT Rcont = 50 kΩ]
CNT bundles that are densely packed and fully, ideally contacted have a resistivity ~7x lower than Cu at 22 nm
Non-ideal contact resistance is amortized over the length of the nanotube
– Insignificant for lengths > ~1000 gate pitches
Effective Resistivity: Non-idealities
For contacted metallic CNTs with $L \gg L_0$, where $L_0 = C_{\lambda} d$:
$\rho_{EFF} \propto \frac{h}{4e^{2} L_0} \cdot \frac{d^{2}}{k}$
[Figure: Resistivity Contours, ρCNT/ρCu (22nm). k, fraction of contacted metallic CNTs (0.1 to 1) vs. d, CNT diameter (0.5 to 2 nm), with contours at 0.125, 0.25, 0.5, and 1.0 and the 33% metallic level marked]
Current growth limitations result in only ~2x lower resistivity than copper
Scaling Interconnect: Design Space
[Figure: wire cross-section with copper or CNT vias and copper or CNT interconnects. Width scaling at constant pitch: P = S + W = constant, width W - ΔW, spacing S + ΔW. Height scaling by SH: thickness T/SH, ILD height H*SH. Capacitances CP and CC marked]
Consider rescaling the interlayer dielectric (ILD) stackup
– Scale width: maintain minimum wire pitch
– Increase ILD height (H), decreasing wire thickness (T): maintain constant wire bandwidth
– Integration of vertically aligned CNTs closer to realization
CNTs Require Rescaling
[Figure: Optimal W (in WMIN) vs. Wire Length (in min. wire pitch, 0 to 3500) at FO=3 for Cu (45 nm), Cu (22 nm), CNT (45 nm), and CNT (22 nm), with the minimum width in Cu marked and the ranges of suboptimal CNT wire lengths at 22 nm and 45 nm indicated]
Range of sub-optimally sized wires is greater if CNTs are
used with the same cross-section as copper.
Energy vs. Delay: W & H Scaling
[Figure: Energy (fJ) vs. Delay (ps) with fanout increasing from FO = 0.5 to FO = 8, for the four via/wire combinations (Cu or CNT vias with Cu or CNT wires). Curves: Cu with W=1 and H=1, 2, 4; CNT with W=1, H=1; CNT with W=0.5 and H=1, 2, 4. Minimum-delay H and minimum-energy H are marked]
Scaling both width and height results in almost 33% energy savings for the same delay
F. Chen, et al., “Scaling and Evaluation of Carbon Nanotube Interconnects for VLSI Applications,” ACM Nanonets Symposium 2007
Characterization Limitations
Measurement Limitations:
• Large impedance mismatch (VNA Z0 = 50 Ω vs. CNT Z0 ~ 20 kΩ)
- measurement variance greater than the measured value
• Large test parasitics
- limit bandwidth of input signals
- need to de-embed, much larger than CNT
Physical Limitations:
• Cumbersome, difficult, unrepeatable test setups
• Limited CMOS integration
• Limited number of measurements
CNT Characterization Test-chip
Horizontally grown SWCNTs
– Grow CNTs on separate substrate
– Transfer to test chip using resist
– Use ball bonder or metal deposition to make final pad contact
Vertically grown CNTs
– Create compatible CNT masks that align with test interface
– Create contacts at the end of CNT chips
– Align & mate using optical IR aligner
[Figure: cross-sections of both attachment schemes, with layers labeled: metallic underlayer, metal cap, passivation, metal alloy, UBM, pad metal, CNT(s), insulating scaffold]
Extracting CNT Data
Any pair of transceivers acts like a bidirectional link/equivalent-time scope
Sweep time & voltage to measure step response waveforms
Extract S-parameters, delay, edge rates of both CMOS & CML signals
[Figure: transceivers with Vref comparators connected through a CNT with R ~ 20 kΩ]
What else can we do?
Try new device technology
– Carbon Nanotubes
– Silicon Photonics
Manycore interconnect bottlenecks
[Figure: manycore chip built from processor + router tiles (P, R) on an electrical on-chip network, with memory controllers and an electrical memory interface to DIMMs of DRAM chips. Logical View: processors, request network, DRAM, response network]
Slow on-chip mesh
On-chip memory controllers
Power and density limited memory BW
– At most 3-6 Tb/s in next few years
Need to move them off-chip
– Use Si-Photonics
Optical system architecture
[Figure: Physical View. Cores reach optical access points that connect core waveguides to DIMM waveguides; each DIMM has a hub chip with memory controllers; one segment is marked "Likely Bottleneck!"]
[Figure: Logical View. The processor request network (Proc Req Net) feeds a hub request network per DIMM; each hub response network feeds the processor response network (Proc Resp Net)]
Message path
– Electrical mesh to reach the appropriate access point
– Core waveguide to the switch matrix
– Statically routed through the switch matrix
– DIMM waveguide and optical fiber to reach hub chip
– Routed through hub chip to correct DRAM chip
– Returns through the separate response networks
Si-photonic network
– Faster, more energy-efficient communication across the chip
– Exports memory controllers and switching to DRAM hub chips
– Overcomes power and density limitations on memory bandwidth
Optical multi-group system architecture
[Figure: Physical View. Group 0 and Group 1 connected through the optical network]
[Figure: Logical View. Per-group processor request networks (Grp0 Proc Req Net, Grp1 Proc Req Net) feed the hub request networks; hub response networks feed the per-group processor response networks]
Multi-group architecture
– Breaks the single on-chip electrical mesh into several groups
– Each with its own smaller mesh
– Each group still has one AP for each DIMM and thus can access all of memory
– Since there are more APs, each AP is narrower (uses fewer λs)
– Uses optical network as a very efficient global crossbar
– Hub networks now include a crossbar and arbitration
Can help alleviate on-chip interconnect bottleneck
Potentially 40-80 Tb/s
Traffic generator modeling GUPS access pattern
[Figure: Average Latency per Request (cycles) vs. Total Offered Bandwidth (Bytes/cycle) for 256 cores. Curves: Electrical (3 Tb/s Mem BW), Electrical (6 Tb/s Mem BW), Optical with 16 Groups. Annotations: 6 GUPS, 17 GUPS, and approximately 100 Giga-Updates Per Second (GUPS)]
For rough comparison, the Cray X1E with 248 cores (1.13 GHz) sustains 1.8 GUPS, while the Cray Red Storm/XT3 cluster with 12,960 cores (2.4 GHz) sustains 30 GUPS
http://icl.cs.utk.edu/hpcc
Link components
[Figure: optical link between the multi-core chip and the memory buffer chip over 10 cm of SM fiber. Labeled elements and losses: 64 x λ laser source; coupler (1 dB); 1:32 splitter (0.2 dB/split); 64 Tx rings (through loss 0.01 dB/ring); modulator insertion (1 dB); 64 crossings (0.05 dB/crossing); on-chip waveguide segments of 2 cm and ~0.9 cm (1 dB/cm); 64 filter pairs and 63 Tx-Rx pairs (through loss 0.01 dB/ring); 64 Rx rings (0.01 dB/ring); drop (2.5 dB/drop); PD (0.1 dB)]
Optical power budget
Component | Preliminary design | Power loss | Optimized design | Power loss
Coupler loss | 1 dB/coupler | 3 dB | 1 dB/coupler | 3 dB
Splitter loss | 0.2 dB/split | 1 dB | 0.2 dB/split | 1 dB
Non-linearity | 1 dB | 1 dB | 1 dB | 1 dB
Through loss | 0.01 dB/ring | 3.17 dB | 0.01 dB/ring | 3.17 dB
Modulator insertion loss | 1 dB | 1 dB | 0.5 dB | 0.5 dB
Crossing loss | 0.2 dB/crossing | 12.8 dB | 0.05 dB/crossing | 3.2 dB
On-chip waveguide loss | 5 dB/cm | 20 dB | 1 dB/cm | 4 dB
Off-chip waveguide loss | 0.5e-5 dB/cm | ~0 dB | 0.5e-5 dB/cm | ~0 dB
Drop loss | 2.5 dB/drop | 5 dB | 1.5 dB/drop | 3 dB
Photodetector loss | 0.1 dB | 0.1 dB | 0.1 dB | 0.1 dB
Receiver sensitivity | -20 dBm | -20 dBm | -20 dBm | -20 dBm
Power per wavelength | | 26.07 dBm (0.40 W) | | -1.03 dBm (0.78 mW)
Power required at source | | 3.3 kW | | 6.38 W
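The per-wavelength power in the optimized design can be checked by adding the listed losses to the receiver sensitivity. The sketch below is a minimal link-budget calculation using the optimized-design loss values from the slide; the dictionary keys and function names are illustrative.

```python
# Optimized-design power losses in dB, as listed on the slide.
OPTIMIZED_LOSSES_DB = {
    "coupler": 3.0,
    "splitter": 1.0,
    "non_linearity": 1.0,
    "through": 3.17,
    "modulator_insertion": 0.5,
    "crossing": 3.2,
    "on_chip_waveguide": 4.0,
    "off_chip_waveguide": 0.0,  # listed as ~0 dB
    "drop": 3.0,
    "photodetector": 0.1,
}
RECEIVER_SENSITIVITY_DBM = -20.0


def required_power_dbm(losses_db, sensitivity_dbm=RECEIVER_SENSITIVITY_DBM):
    """Laser power per wavelength needed to land at the receiver sensitivity."""
    return sensitivity_dbm + sum(losses_db.values())


def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


if __name__ == "__main__":
    dbm = required_power_dbm(OPTIMIZED_LOSSES_DB)
    print(f"total loss: {sum(OPTIMIZED_LOSSES_DB.values()):.2f} dB")
    print(f"power per wavelength: {dbm:.2f} dBm = {dbm_to_mw(dbm):.2f} mW")
```

The optimized losses sum to 18.97 dB, giving -1.03 dBm (about 0.79 mW, listed as 0.78 mW). Applying the same sum to the preliminary column gives 47.07 dB, or 27.07 dBm, which is 1 dB above the 26.07 dBm shown; the slide text does not say which term is left out.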
Data transmission latency
Component | Latency
Serializer/Deserializer (50 ps each) | 50 ps
Modulator driver latency | 108 ps
Through latency (2.5 ps/adjacent channel) | 7.5 ps
Drop latency (20 ps/drop) | 60 ps
Waveguide latency (106.7 ps/cm) | 427 ps
SM fiber latency (48.3 ps/cm) | 483 ps
Photodetector+TIA latency | 200 ps
Total latency | 1.385 ns
Total latency 14 bit times
Less than 4 clock cycles (with 2.5 GHz core clock)
Almost equally split among
– Fiber, waveguide, and Tx/Rx circuits
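The latency total and the three-way split can be reproduced from the table entries. The sketch below counts the 50 ps serializer and deserializer separately, which is what makes the entries add up to 1.385 ns, and assumes a 100 ps bit time (10 Gb/s per wavelength) to interpret "14 bit times"; that bit time and the grouping of entries into fiber, waveguide, and circuits are assumptions.

```python
LATENCY_PS = {
    "serializer": 50.0,
    "deserializer": 50.0,
    "modulator_driver": 108.0,
    "through": 7.5,
    "drop": 60.0,
    "waveguide": 427.0,
    "sm_fiber": 483.0,
    "photodetector_tia": 200.0,
}
CORE_CLOCK_GHZ = 2.5         # from the slide
ASSUMED_BIT_TIME_PS = 100.0  # assumption: 10 Gb/s per wavelength

# Assumed grouping for the "almost equally split" claim.
GROUPS = {
    "fiber": ["sm_fiber"],
    "waveguide": ["through", "drop", "waveguide"],
    "circuits": ["serializer", "deserializer", "modulator_driver",
                 "photodetector_tia"],
}


def total_latency_ps(latency_ps=LATENCY_PS):
    return sum(latency_ps.values())


def share(keys, latency_ps=LATENCY_PS):
    """Fraction of the total latency contributed by the named entries."""
    return sum(latency_ps[k] for k in keys) / total_latency_ps(latency_ps)


if __name__ == "__main__":
    total = total_latency_ps()
    print(f"total latency: {total} ps")
    print(f"core cycles:   {total * 1e-3 * CORE_CLOCK_GHZ:.2f}")
    print(f"bit times:     {total / ASSUMED_BIT_TIME_PS:.1f}")
    for name, keys in GROUPS.items():
        print(f"{name:>9}: {share(keys):.0%}")
```

With this grouping the shares come out at roughly 35% fiber, 36% waveguide, and 29% circuits.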
Electrical back-end
[Figure: electrical back-end of one link with per-block load and energy annotations. Global clock (load + source) per link: 4 fF (2 fJ/bit) optical, 40 fF (20 fJ/bit) PLL. Local clock, 5-10 GHz. 6-bit DC DAC: 20 fJ/bit. Other block labels: 16 fF (8 fJ/bit), 20 fJ/bit, 10 fF (2.5 fJ/bit), 40 fJ/bit, 60 fF (15 fJ/bit), 10 fF (2.5 fJ/bit), 40 fF (20 fJ/bit)]
Power < 250 fJ/bit
Area 0.02% for 4000 links
– ~200 cells (0.5 um x 0.2 um)
– 20 um^2 per link
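The area figures on this slide are simple products. The sketch below reproduces the per-link area and the total for 4000 links, then infers the die area that a 0.02% share would imply; that implied die area is an inference, not a number given in the talk, and the function names are illustrative.

```python
CELLS_PER_LINK = 200      # "~200 cells" on the slide
CELL_WIDTH_UM = 0.5
CELL_HEIGHT_UM = 0.2
LINKS = 4000
AREA_FRACTION = 0.02e-2   # 0.02 % of the chip


def link_area_um2():
    """Back-end area of one link in square micrometres."""
    return CELLS_PER_LINK * CELL_WIDTH_UM * CELL_HEIGHT_UM


def total_area_mm2(links=LINKS):
    """Back-end area of all links in square millimetres."""
    return links * link_area_um2() / 1e6


def implied_die_area_mm2():
    """Die area for which the total back-end area is AREA_FRACTION (inferred)."""
    return total_area_mm2() / AREA_FRACTION


if __name__ == "__main__":
    print(f"area per link:    {link_area_um2():.0f} um^2")
    print(f"area, 4000 links: {total_area_mm2():.2f} mm^2")
    print(f"implied die area: {implied_die_area_mm2():.0f} mm^2")
```

The 20 um^2 per link matches the slide, and 4000 links occupy 0.08 mm^2, which is 0.02% of a die of about 400 mm^2.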
MIT Eos1 65 nm test chip
Texas Instruments standard 65 nm bulk CMOS process
First ever photonic chip in sub-100 nm CMOS
Automated photonic device layout
Monolithic integration with electrical modulator drivers
2 mm x 2 mm die
– 16 ring modulators
– 8 modulator drivers
– 84 ring filters
– ~10 cm of waveguides
– Waveguide crossings
– 8 MZ modulators
– 12 photo detectors
[Figure: annotated die photo: two-ring filter, one-ring filter, ring modulator, digital driver, vertical coupler grating, photo detector, paperclips, waveguide crossings, M-Z test structures, 4 ring filter banks]
65 nm lithography great for rings!
[Figure: ring filter layout in Cadence next to SEM photos after dielectric etch, showing the poly-Si ring and poly-Si heater; labeled dimensions 286 nm, 266 nm, 300 nm, and 6.5 µm]
Frequency response of 4-channel filter
FSR is ~16 nm (2.7 THz)
The scans are 1268 nm - 1276 nm at 0.05 nm intervals
120 GHz spacing
– 5 nm ring radius step
Sensitivity
– 1 THz/nm poly H
– 30 GHz/nm poly W
[Figure: filter bank photo with input, channels 4 3 2 1, cleave, and image window marked]
First 100 GHz spaced bank in sub-100 nm, first in bulk CMOS, first in poly-Si
- Enables 60-120 wavelengths/waveguide (>200 Gb/s/um data rate density)
Conclusion
Interconnects are very complex microcommunication systems
Cross-layer design approach is needed to solve the on-chip and off-chip interconnect problem
Equalized interconnects extend copper
– 10x in energy-efficiency with proper circuit-wire co-design
CNTs further improve the performance by 2x
– With proper bundle resizing and ILD scaling
Si Photonics improves the throughput by 10-20x
– Unifies on-chip and off-chip interconnects
– Requires a different network architecture (electrical for switching, optical for energy-efficient transport)
Acknowledgments
Franz Kaertner, Rajeev Ram, Judy Hoyt, Krste Asanovic, Henry Smith, Erich Ippen, Martin Schmidt
Jason Orcutt, Anatoly Khilo, Ben Moss, Charles Holzwarth, Hanqing Lee, Christopher Batten, Ajay Joshi, Fred Chen, Byungsub Kim
DARPA UNIC program – PM Jag Shah
Texas Instruments – Dennis Buss and Tom Bonifield
FCRP Interconnect Focus Center