
Equalized On-chip Interconnect: Modeling, Analysis, and Design
Byungsub Kim
Integrated Systems Group, Massachusetts Institute of Technology
Advisor: Vladimir Stojanović

Interconnects dominate performance and power
- Example: IBM Blue Gene/L, from the system down to the chip: racks, cables (and more hidden), backplane interconnect, node cards, compute cards, PCB wiring pattern, Power 440 SoC, on-chip wires

What are these interconnects?: RC-dominant wires
- Many-core processor networks (Tilera 64-core)
- Packages: silicon carrier (Patal DAC06)
- PCB wires
- Short distance (1-2 cm, 2-4 cm, <10 cm), high wire density, large channel loss
- Power and metal area budget

Increasing demand for global interconnects
- Mesh: cheap local interconnect but low performance
- Higher-radix networks: expensive global interconnect but high performance

Interconnects in NoC design hierarchy (Kim ICCAD07)
[Figure: design hierarchy. Wire and circuit parameters -> link trade-off curve (energy/bit in pJ/bit vs. data rate density in Gbps/um; equalized with 30 mV, 50 mV and 90 mV eye vs. repeated) -> interconnect metrics -> NoC trade-offs (power vs. offered bandwidth, repeated vs. equalized).]

Thesis contributions
- Fast CAD tool for link design space exploration (Kim ICCAD07): joint wire + circuit optimization
- Charge Injection FFE + pre-distortion (Kim ISSCC09)
- Trans-impedance receive amplifier
- Infrastructure for in-situ signal and energy characterization of the on-chip links (Kim JSSC10)

Modeling and analysis
[Figure: many-core processor design hierarchy. NoC level: topologies, VC buffer size, channel width -> power, latency, offered bandwidth. Interconnect level: topologies, wire width & pitch, transistor size -> data rate density (Gb/s/um), energy/bit (pJ/b), latency (ps). Circuit + wire level below.]

Review: Equalization
- No equalization
- Feed-forward equalization (FFE): taps w0, w1, w2
- FFE + decision feedback equalization (DFE): taps w0, w1, w2 and feedback tap -y1

Equalization versus repeater
[Figure: voltage swing vs. distance. Repeaters restore the full Vdd swing along the wire; the equalized link tolerates channel attenuation over the whole distance.]
Equalization gives:
- Faster data rate
- Lower latency
- Even lower power consumption by voltage swing reduction

Joint wire + circuit optimization
- Link = Driver (impedance) + Wire + Receiver (impedance)
- Optimize for power and performance
[Figure: link schematic with the optimization variables: voltages Vs and Vp, FFE coefficients W1 ... WnFFE, driver size, wire sizes (width, spacing), sample delay Td, threshold Vth, DFE coefficient y1.]

Channel model: channel transfer function
- Set up the telegrapher's equation (RLGC line) with a Thevenin model of the driver

Channel model: comparison to SPICE simulation
- Closed-form solutions for T(f) and Tc(f)
$$T(\omega) \approx \frac{2\,e^{-l\sqrt{j\omega R_o C_o}}}{\left(1+\dfrac{Z_c(\omega)}{Z_L}\right)\left(1+\dfrac{R_s}{Z_c(\omega)}\right)}$$
- Exponential form: small tap count for FFE & DFE
- The denominator shows the impact of the impedances
- Good match with SPICE simulation

CAD flow
- Technology information: transistor SPICE model; wire: metal conductance, dielectric constant, etc.
- Circuit model: linearized RC switch extraction for the given circuit type (LCM | inverter), WLCM, Vs, Vp -> normalized R (Ohm-um), C (fF/um) switch model database -> R, C model for LCM & inverter
- Wire model: 2D field solver for the given Wth, Sp -> 2D RLGC matrices database -> RLGC parameters
$$R=\begin{bmatrix} r_o & r_c\\ r_c & r_o\end{bmatrix},\quad L=\begin{bmatrix} l_o & l_c\\ l_c & l_o\end{bmatrix},\quad G=\begin{bmatrix} g_o & g_c\\ g_c & g_o\end{bmatrix},\quad C=\begin{bmatrix} c_o & c_c\\ c_c & c_o\end{bmatrix}$$
- Channel model: RLGC parameters, R, C model and target wire length l -> transfer functions T(f), Tc(f)
- Link performance model: link architecture (FFE, DFE tap numbers) and target data rate density -> data rate density, latency, eye opening, sampling delay (Td), equalization coefficients w, y1
- Link power model -> energy-per-bit (Eb)
- Design selection based on metrics, with the target eye constraint w_c_eye ≥ eye_target
- Connection to network architecture optimizer: energy-per-bit, data rate density, latency

Tool verification
- Tool finds optimal parameters
- Verified against auto-generated SPICE netlist
[Figure: energy/bit (pJ/bit) vs. data rate density (Gb/s/um). Solid line: this tool; dashed line: SPICE. Curves: Ebtot, EbactiveDrv, EbPre, Ebw, EbscDrv. Figure label: ~50 mV.]

Run time comparison: x30,000

                 LMSE optimal Td   Brute force equation   SPICE
Run time (h)     0.74              180                    21,460
Normalized       x1                x244                   x29,000

- Design space trade-off curves are computed over 423,000 designs within 44.4 min
- Run time is improved by ~x30,000
[Figure: design exploration between circuit design and network design. Inputs: wire, circuit and equalization parameters; outputs: energy, latency, data rate density. Design space: 423K points.]

Link design space exploration: 90nm (Kim ICCAD07)
[Figure: energy/bit (pJ/bit) vs. data rate density (Gbps/um) for equalized links with 30 mV, 50 mV and 90 mV eye and for repeated links.]
- Energy vs. data rate density trade-off
- 2-10x energy improvement

Design space exploration: 32nm (Kim D&T08)
[Figure: latency (ns) vs. data rate density (Gbps/um) for repeated (Rpt.) and equalized (Eq.) links, contour labels in fJ/bit. (a) 5 mm, (b) 10 mm.]
- 2x lower latency with 10x less power

Connection to network simulation (Joshi HotI09)
[Figure: same design hierarchy as in "Modeling and analysis".]
- Provides proper metrics for the NoC simulators

Connection to network simulation
[Figure: power (W) vs. offered BW (kb/cyc). Left: Clos with repeated interconnect; right: Clos with equalized interconnect.]
- Power versus offered bandwidth trade-offs of Clos NoC topologies with repeated and equalized interconnects
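As a numerical side note to the channel-model slides in this part, the exponential closed form can be checked against the exact expression for a uniform RC line. This is only a sketch under stated assumptions: the function names and every numeric value (wire resistance and capacitance per meter, source resistance, load) are hypothetical and not taken from the talk, and wire inductance and conductance are neglected.

```python
import numpy as np


def rc_line_exact(f_hz, length, r, c, r_src, z_load):
    """Exact V_out / V_s of a uniform RC line (ABCD matrix), Thevenin source r_src, load z_load."""
    w = 2 * np.pi * np.asarray(f_hz, dtype=float)
    gamma_l = np.sqrt(1j * w * r * c) * length   # propagation constant times length
    z_c = np.sqrt(r / (1j * w * c))              # characteristic impedance Zc(w)
    ch, sh = np.cosh(gamma_l), np.sinh(gamma_l)
    return 1.0 / (ch * (1 + r_src / z_load) + sh * (z_c / z_load + r_src / z_c))


def rc_line_exponential(f_hz, length, r, c, r_src, z_load):
    """High-loss approximation: 2 exp(-gamma l) / ((1 + Zc/ZL) (1 + Rs/Zc))."""
    w = 2 * np.pi * np.asarray(f_hz, dtype=float)
    gamma_l = np.sqrt(1j * w * r * c) * length
    z_c = np.sqrt(r / (1j * w * c))
    return 2 * np.exp(-gamma_l) / ((1 + z_c / z_load) * (1 + r_src / z_c))


if __name__ == "__main__":
    # Hypothetical 10 mm wire: 0.2 Ohm/um and 0.2 fF/um, 200 Ohm driver, 1 kOhm load.
    freqs = np.array([0.69e9, 2e9, 3e9])
    args = (10e-3, 2e5, 2e-10, 200.0, 1e3)
    exact = rc_line_exact(freqs, *args)
    approx = rc_line_exponential(freqs, *args)
    for f, te, ta in zip(freqs, exact, approx):
        print(f"{f / 1e9:.2f} GHz: exact {20 * np.log10(abs(te)):.1f} dB, "
              f"exponential form {20 * np.log10(abs(ta)):.1f} dB")
```

The approximation replaces both cosh and sinh by half of the growing exponential, so it improves as the line loss grows, which is the regime the talk targets.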
Circuit techniques for improved equalized interconnects
[Figure: link schematic. Differential wire, 1 cm long, 0.6 um wide, 0.4 um spaced. Tx: pre-distortion with static pre-distorted coefficients, 3-tap FFE. Rx: TIA (2.7k), slicer and 1-tap DFE.]
- 3-tap nonlinear charge-injecting feed-forward equalizer (FFE) improves driver power efficiency
- Trans-impedance amplifier (TIA) before the 1-tap DFE improves the bandwidth-power-amplitude trade-off

Review: 2-tap FFE driver
$$Y_{FFE} = w_0 d_0 - w_1 d_{-1}$$
- Data-dependent summation/subtraction
- Output levels: w0+w1, w0-w1, -w0+w1, -w0-w1
- Example pattern: Di = 0 0 1 0 0 0, di = -1 -1 1 -1 -1 -1

Conventional: voltage dividing (VD) driver
[Figure: driver schematic with switched resistors R/(0.5+αv) and R/(0.5-αv) between Vdd and Gnd, driving the channel impedance Zc(ω); output levels w0+w1, w0-w1, -w0+w1, -w0-w1 for the example pattern. Inset: energy/bit (pJ/bit) vs. data rate density (Gb/s/um) with curves Ebtot, EbactiveDrv, Ebpre, Ebw, EbscDrv.]
- Current wasted in subtraction

Conventional: current mode logic (CML) driver
[Figure: driver schematic with load resistors R and tail currents (0.5+αv)Vdd/R and (0.5-αv)Vdd/R, driving Zc(ω); same output levels and example pattern.]
- Current wasted in subtraction

Conventional: current switching (CS) driver
[Figure: driver schematic with switched currents (0.5+αc)Icw and (0.5-αc)Icw, driving Zc(ω); same output levels and example pattern.]
- Current wasted in subtraction

Proposed: Charge injection (CI) driver
[Figure: driver schematic with switched currents I0 and I1, driving Zc(ω); injected currents I0+I1, I0, -I0, -I0+I1 for the example pattern.]
- No current subtraction: no current waste

Power comparison
[Figure: bar chart of average supply current (mA) of the driver types. Bar labels: 8x, 3x, 2x, 4x, 1x (typical off-chip); 2x, 2x, 1x (on-chip). Axis note: R of VD.]

Eye sensitivity
- The percentage of the eye reduction divided by the percentage of the coefficient perturbation:
$$S_{x_i} = \left.\frac{\Delta \mathrm{Eye}/\mathrm{Eye}}{\Delta x_i / x_i}\right|_{\Delta x_n = 0,\ n \neq i}$$
[Figure: FFE driver terminated with RL to Vdd/2; perturbing w1 to w1+Δw1 shrinks the eye from Eye to Eye-ΔEye.]

Eye sensitivity: CS FFE
[Figure: the steady-state level w0-w1 becomes w0-Δw0-w1 when w0 is perturbed by Δw0.]
- A large error due to subtraction of large currents
- Coefficient errors are not attenuated

Eye sensitivity: CI FFE (I0)
[Figure: the steady-state level I0 becomes I0-ΔI0.]
- A small error because the I0 current is small

Eye sensitivity: CI FFE (I1)
[Figure: an error ΔI1 reaches the eye as ΔI1 x h_att, where h_att is the channel attenuation.]
- Large error but attenuated by the channel

Resolution requirement comparison

Driver        Coefficient   Resolution requirement (bits)
CS, VD, CML   W0            7-8b
CS, VD, CML   W1            8-9b
CS, VD, CML   W2            6-7b
CI            I0            3-4b
CI            I1            3-4b
CI            I2            4-5b

- >10x difference in eye sensitivity
- 10x less accurate hardware is required for the CI FFE
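The eye-sensitivity definition used in the slides above (percentage eye reduction divided by percentage coefficient perturbation, all other coefficients fixed) can be evaluated numerically. The sketch below is illustrative only: the two eye models are toy linear models that mimic the qualitative argument of the slides, and the coefficient values and the 40 dB attenuation `H_ATT` are assumptions, not data from the talk.

```python
def eye_sensitivity(eye_fn, coeffs, i, rel_step=0.01):
    """(relative eye reduction) / (relative reduction of coefficient i), others held fixed."""
    nominal = eye_fn(coeffs)
    perturbed = list(coeffs)
    perturbed[i] = coeffs[i] * (1.0 - rel_step)   # only coefficient i is perturbed
    return ((nominal - eye_fn(perturbed)) / nominal) / rel_step


def cs_eye(w):
    # Toy model (assumption): the steady-state level of a 2-tap CS FFE is the
    # difference of two large tap currents, w0 - w1.
    return w[0] - w[1]


H_ATT = 0.01  # assumed channel attenuation seen by the transition current (40 dB)


def ci_eye(i):
    # Toy model (assumption): the small current I0 sets the steady-state level and
    # the large transition current I1 reaches the eye only after attenuation.
    return i[0] + H_ATT * i[1]


if __name__ == "__main__":
    w = [0.55, 0.45]     # hypothetical CS tap weights
    cur = [0.10, 0.90]   # hypothetical CI currents I0, I1
    print("CS  S_w0 =", eye_sensitivity(cs_eye, w, 0))
    print("CI  S_I0 =", eye_sensitivity(ci_eye, cur, 0))
    print("CI  S_I1 =", eye_sensitivity(ci_eye, cur, 1))
```

In this toy setting the CS sensitivity grows as w0 and w1 approach each other, while both CI sensitivities stay at or below one, which is the trend the resolution comparison relies on.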
Link design overview
[Figure: link schematic. Differential wire, 1 cm long, 0.6 um wide, 0.4 um spaced. Tx: pre-distortion with static pre-distorted coefficients, 3-tap FFE. Rx: TIA (2.7k), slicer and 1-tap DFE.]
- 3-tap nonlinear charge-injecting feed-forward equalizer (FFE) improves driver power efficiency
- Trans-impedance amplifier (TIA) before the 1-tap DFE improves the bandwidth-power-amplitude trade-off

3-tap CI FFE transmitter
[Figure: transmitter schematic. Even/odd data through MUX and latch into a decoding block that generates the transition signals P1+, P2+, N1+, N2+, P1-, P2-, N1-, N2-; weak driver and strong driver; A<19:0> static pre-distorted FFE coefficients. Waveforms: output voltage (V) with High, Middle and Low levels and driver current (mA) with I1, I2, -I0, -(I0+I1+I2) vs. time (ns).]
- Strong driver: large currents I1, I2, I1+I2
- Weak driver: small current I0
- Drivers are built from an array of binary-weighted transistors
- A skewed enable NAND gate
- The decoding requires only a small overhead

Nonlinearity and pre-distortion
[Figure: Tx data -> decoding block with pre-distorted FFE coefficients -> nonlinear CI FFE driver.]
- Inverter-like strong drivers are power-area efficient but non-linear
- Static pre-distortion compensates the non-linearity
- Static pre-distortion is only possible in the CI FFE

CI FFE operation
[Figure: three animation steps for the input pattern 0001000 on the transmitter schematic, with the active driver and output level highlighted in each step. Step labels: I0 with Middle level, I2 with High level, I1+I2 with Low level. Waveforms as on the transmitter slide.]

Trans-impedance amplifier receiver
[Figure: V (mV), Vtia (mV), I (µA), Istatic (10 µA) and 3dB bandwidth BW (GHz) vs. receiver's input resistance R (kOhm). BW: 3dB bandwidth; V: received voltage at 2 GHz; I: received current at 2 GHz; Istatic: static current for resistive termination; Vtia: TIA-converted voltage with a gain of 2.7 kOhm.]
- TIA achieves high bandwidth, small static power and large amplitude simultaneously
- 50% static current and 150% voltage amplitude for the same bandwidth

Chip fabrication: 90nm CMOS
- Tx: 16x70 um, Rx: 16x40 um
- 10 mm wire on the M8 layer, over M7 and under M9

In-situ measurement support
[Figure only.]

Channel measurement
- High-loss channel: 25 dB @ 690 MHz, 40 dB @ 2 GHz, 46 dB @ 3 GHz
- 50% delay and 90% settling time of the step response: 1.4 ns and 5.5 ns, i.e. 8.6 and 33 UI at 6 Gb/s
- Losses in most other literature on equalization are up to 30 dB

Eye measurement
[Figure: measured eyes. (a) DFE, 3-tap FFE, 6 Gb/s; (b) no DFE, 3-tap FFE, 4 Gb/s; (c) no DFE, 2-tap FFE, 2 Gb/s.]
- Vertical eye height is set to about 100 mV for trade-off curve extraction
- First on-chip measurement
- Demonstrated good eye quality (>50%, max 80%)

Sensitivity measurement
[Figure: measured sensitivity at (a) 6 Gb/s, (b) 4 Gb/s, (c) 2 Gb/s.]
- Sensitivity < 3 (10x better): good match with the analysis
- Low-resolution, cheap (power & area), inverter-like hardware can equalize a high-loss channel (>40 dB)

Power-performance trade-offs
[Figure: stacked bars of energy/bit (pJ/bit) with components Etc, Rx, TIA, TxOther, TxStr. Bar labels in the figure: 0.635 pJ/b, 0.466 pJ/b, 0.454 pJ/b, 0.371 pJ/b.]

Config   Data rate   Supply   FFE     DFE
1        2 Gbps      1.1 V    2-tap   No
2        4 Gbps      1 V      3-tap   No
3        4 Gbps      1.1 V    3-tap   No
4        6 Gbps      1.2 V    3-tap   Yes

- As data rate increases, DC bias (TIA) energy decreases and channel compensation energy (TxStr) increases
- Optimal power efficiency: balancing between static (TIA) and active (channel, digital) energy

Comparison to relevant works
[Figure: energy per bit (pJ/bit) vs. data rate density (Gb/s/um). Points: this work, 90nm technology optimized repeater, Eq./RF links [7], [10]*, [8].]
- 2x and 3x performance improvement for about 30% and 120% additional energy
- The only measurement at >2 Gb/s/um to 3 Gb/s/um
- [7] Mensink ISSCC2007, [10] Tam VLSI2009* (5 mm), [8] Kim ICCAD2007

Conclusion of thesis
Modeling and CAD tool
- The first analytical link model for an RC-dominant link
- Fast CAD tool improves runtime by x30,000
- Equalized interconnects provide 2x-10x better power efficiency than repeaters
- First demonstration of Eq. CAD + NoC CAD simulation
Circuit: CI FFE Tx and TIA Rx
- CI FFE is introduced for the first time
- CI FFE uses 2x less power than VD and CS FFE and 4x less than CML
- CI FFE allows simple static pre-distortion to utilize cheap, coarse, inverter-like drivers; demonstrated for the first time
- CI FFE is 10x more robust to coefficient error than CS FFE
- TIA termination saves 50% static current and achieves 150% amplitude for the same bandwidth compared with a resistive termination
- Area-energy efficient link operates over 40 dB loss!

Publication & talk list
Conference
- Byungsub Kim, Soumyajit Mandal, and Rahul Sarpeshkar, "Power-Adaptive Operational Amplifier with Positive-Feedback Self Biasing," IEEE ISCAS 2006.
- Byungsub Kim and Vladimir Stojanovic, "Equalized Interconnects for On-Chip Networks: Modeling and Optimization Framework," IEEE ICCAD 2007.
- Byungsub Kim and Vladimir Stojanovic, "A 4Gb/s/ch 356fJ/b 10mm Equalized On-chip Interconnect with Nonlinear Charge-Injecting Transmitter Filter and Transimpedance Receiver in 90nm CMOS Technology," IEEE ISSCC, Feb. 2009.
- Sanquan Song, Byungsub Kim, and Vladimir Stojanovic, "A Fractionally Spaced Linear Receive Equalizer with Voltage-to-Time Conversion," IEEE Symposium on VLSI Circuits Dig. Tech. Papers, June 2009.
- Ajay Joshi, Byungsub Kim, and Vladimir Stojanovic, "Designing Energy-efficient Low-diameter On-chip Networks with Equalized Interconnects," in Proc. of the 17th IEEE Symposium on High-Performance Interconnects, Aug. 2009.
- Yong Liu, Byungsub Kim, Timothy O. Dickson, John F. Bulzacchelli, and Daniel Friedman, "A 10Gb/s Compact Low-Power Serial I/O with DFE-IIR Equalization in 65nm CMOS," IEEE ISSCC, Feb. 2009.
Journal
- Byungsub Kim and Vladimir Stojanovic, "Characterization of Equalized and Repeated Interconnects for NoC Applications," IEEE Design and Test of Computers, Oct. 2008.
- Byungsub Kim, Yong Liu, Timothy O. Dickson, John F. Bulzacchelli, and Daniel J. Friedman, "A 10-Gb/s Compact Low-Power Serial I/O With DFE-IIR Equalization in 65nm CMOS," IEEE Journal of Solid-State Circuits, Dec. 2009.
- Byungsub Kim and Vladimir Stojanovic, "An Energy-efficient Equalized Transceiver for RC-dominant Channels," JSSC 2010 (submitted).
Workshop
- Byungsub Kim and Vladimir Stojanovic, "Energy Efficient Wireline Communication Over RC-dominant Channels," CMOS Emerging Technologies, Vancouver, Sep. 2009.

Acknowledgement
- Adviser: Vladimir Stojanović
- Committee: Anantha Chandrakasan, Joel Dawson
- Funding: Intel, IBM, NEC
- Fabrication: TAPO, IBM Kansas City Plant
- Industry: Ian Young, Alexandra Kern, Sam Palermo, John Bulzacchelli, Dan Friedman, Yong Liu, Timothy Dickson, Troy Beukema
- Technical help: Fred Chen, Ranko Sredojević, Sanquan Song, Ajay Joshi, Jose Bohorquez, Matt Park, Min Park, Sungwon Chung
- Entertaining help, ISG coffee mates: Ranko Sredojević, Ajay Joshi, Michael Georgas, Maxine Lee
- Old CSG-CAG buddies: Alfred Ng (Coolman), Jaewook Lee, Vinson Lee (Real Crazyman), Jason Kim (Coolguy)
