Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
ISSCC 2007 / SESSION 22 / DIGITAL CIRCUIT INNOVATIONS / 22.9 22.9 A 0.28pJ/b 2Gb/s/ch Transceiver in 9Onm CMOS for 10mm On-Chip interconnects The schematic of the receiver implementation is shown in Fig. 22.9.3. The left of the diagram shows a clocked comparator, a sense-amplifier-based flip-flop (SAFF), which consists of a differEisse Mensink. Schinkel EricKlumperinkEdvanTuiential input stage, cross-coupled inverters and an SR-latch. The Fisse Mensink, Daniel Schinkel, Eric Kiumperink, Ed van Tuiji, outputs of the SR-latch drive the low-pass feedback filter, in this Brain N\lauta University of Twente, Enschede, The Netherlands The bandwidth of global on-chip interconnects in modern CMOS processes is limited by their high resistance and capacitance [1]. Repeaters that are used to speed up these interconnects consume a considerable amount of power [2] and area. Recently published techniques [1-4] increase the achievable data rate at the cost of high static power consumption, leading to relatively high energy per bit for low data activity. On the other hand, low-swing schemes [5] often sacrifice bandwidth for power reduction, or make use of an extra low-voltage power supply. More ideally, a transceiver would combine low dynamic and static power with a high achievable data rate. The bandwidth and power consumption of an RC-limited interconnect depends on its source (Zs) and load impedances (ZL). In Fig. 22.9.1, a conventional case with an inverter used as both a transmitter (Zs = 100Q) and a receiver (ZL = 10fF) has only 62MRz bandwidth and high power consumption. Current-seasing schemes (ZL = 190Q in Fig. 22.9.1) increase the bandwidth up to 3x [1,4], but with increased power at low data activities. We propose using a capacitive transmitter (Zs = 255fF in Fig. 22.9.1), which has the same bandwidth improvement as current sensing, but with lower power and without static power consumption. This paper presents a transceiver for 10mm long inte9connects in a 1.2V 90nm 6M CMAOS process, shownt in Fig. 22.9.2. A capacitive pre-emphasis transmitter both increases the bandwidth and decreases the voltage swing, without the need for an additional power supply. As low-swing signaling is more susceptible to crosstalk, we use differential interconnects with twists [1], of which only a single-ended half is shown. In contrast to the wide interconnects used in [2,3], we use relatively small width (0.54,um) and spacing (0.32gm) [1,4] and assume high metal-density surroundings. The receiver uses decision feedback equalization (DFE) [6] to further increase the achievable data rate. The DFE, with a continuous-time feedback filter, consumes almost no extra power. The bandwidth-increasing pre-emphasis effect of the transmitter is shown at the bottom right of Fig. 22.9.2: every transition is emphasized by the transmitter by injecting a charge via capacitance Cs. With only a series capacitor (AC-coupling), the DC voltage on the interconnect is not well defined as there is no DC path to one of the supplies. To control the DC voltage, a load resistor RL and a transconductance G,,, controlled by Vi., are added (see Fig. 22.9.2). By having the time constants C /G,, and RLCwi,, equal, the transfer function resembles the transfer function of the capacitive transmitter in Fig. 22.9.1. If a small G,,, (5gS) and a large RL (16kQ) are chosen, the static current is kept small (6gA) and also the power consumption remains similar. Gmand RL are implemented with MOS transistors as visible in the bottom part of Fig. 22.9.2. Folr Cs, the gate capacitanlce of an NMOS transistor is used. As the gate oxide is much thinner than the oxide between interconnects, the area consumed by C, is relatively small (6x6 gM2). The signals, with a voltage swing of 100mV, are chosen close to V, of 1.2V, because the capacitance of the NMOS transistor is highest for a high gate-source voltage. The total area of the differential transmitter is 226 gM2 *207IE I n1111 111ternaiona The chip micrograph is shown in Fig. 22.9.7. The 10mm-long interconnects, placed in metal 4, have a total distributed resistance of 2kQ and a capacitance of 2.8pF. The other metal layers are filled with GND- and V,,-connected metal stripes. An external pattern generator/analyzer generates data and measures BER. The receiver clock is generated externally to adapt its phase to the eye position and to be able to measure eye widths. In an application, a simple skew circuit or a source-synchronous approach could be used to generate the proper clock phase. Eyediaglrams alre measured via 50Q output buffercs that alre connected to the output of a differenetial intercontnaect. Figure 22.9.4 shows a measured eye diagram at a data rate of lGb/s. The measured BER at the edges of the eye is also shown. The BER drops rapidly below a clock skew of -150ps and above 180ps, giving an eye-opening of 670ps. Data rates up to 1.35Gb/s are achieved without DFE (IEQ=O). The one-6 offset of the total transceiver is llmV, measured over 20 samples. Due to this offset, not all samples achieve 1.35Gb/s, but all samples do achieve a slightly lower data rate of lGb/s. Simulations over process corners also indicate that the circuit is robust to PVT variations at a slightly lower than the maximum achievable data rate. Data Trate rates up to 2Gb/s are measured with DFE. Fig. 22.9.5 shows that DFE improves the eye opening for a wide range of IER. InL an appliDFE can fora be fixed therefore be at design time. cation, 'EQ can In Fig. 22.9.6, the measured energy per bit is plotted as a function of transition probability at different data rates. With random data at 2Gb/s, only 0.28pJ/b is dissipated, which is 7x lower than earlier work [1,4]. The power dissipation of 0.12pJ/b at zero data activity is mainly due to the power dissipation in the SAFF, which has large transistors to get a low offset (6s = 8mV). Clock-gating can be used to eliminate power consumption during inactive penods. The DFE part of the circuit requires less than 7% of the total transceiver power, while it can increase the achievable data rate 1.5x. thereye design Ie. With the presented transceiver, the same high data rates over small RC bandwidth limited on-chip interconnects are possible as with plrevious solutions, but with a 7x lower power consumption. By using both a capacitive pre-emphasis transmitter and continuous-time DFE, a data rate of 2Gb/s is achieved over a 10mm long interconnect. The transceiver consumes only 0.28pJ/b. Acknowledgements: We thank Philips Research for chip fabrication, the Dutch Technology Foundation (STW, project TCS.5791) for funding and Gerard Wienk for assistance. [11 D. Schinkel, E. Mensink, E. A. M. Klumperink, et al., "A 3-Gb/s/ch Transceiver for 10-mm Uninterrupted RC-limited Global On-Chip Interconnects," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 297-306, Jan., 2006. [21 A. P. Jose, G. Patounakis, and K. L. Shepard, "Pulsed Current-Mode Signaling for Nearly Speed-of-Light Intrachip Communication," IEEE J. Circuits, vol. 41, no. 4, pp. 772-780, Apr., 2006. .Solid-State [31 A. P. Jose, and K. L. Shepard, "Distributed Loss Compensation for Low-Latency On-Chip Interconnects," ISSCC Dig. Tech. Papers, pp. 516A The receiver concept is also shown inn Fig. 22.9.2. clocked colmparator restores the low-swing line output to full swing. DFE further ilncreases the achievable data rate. Instead of the ofteln-used FIR filters [6], a continuous-timne filter operates as the decision fedbc fi1ltr Thi flter cacl mos of the ISI with a 1iml anpoe-fiin an( owerettelentfis-re lst-of e imlmnain lmpemetatln, whra nLeeasann FI q- 4 f1iter requires many taps. 414 case an RC filter, implemented with pass-gates and anti-parallel gate capacitances. The filter output is coupled back into the SAFF via a second differential input stage, as shown on the right of Fig. 22.9.3. IEQ sets the feedback gain A (see Fig. 22.9.2). The total area of the receiver is 117 gm2 (324m2 for the DFE part) 517, Feh., 2006. [41 L. Zhang, J. Witson, R. Bashiruttah, et at,, "Driver Pro-Emnphasis Techniques for On-Chip Glohal Buses," ISLPED, pp. 186-191, Aug., 2005. [51 H.: Zhang, V. George, and J. M. Rahaey, "Low-Swing On-Chip Signaling Techniques: Effectiveness and Rohustness," IEEE Thanrs. VLSI Systems, vol. 5, pp.A. 264-272, Jun., 2000.et at,, "Adaptive Equatization and [61 V. Stojanovic, HIo, B. Gartlepp, Data Recovery in a Duat-Mode (PAM2/4) Seriat Link Transceiver," Symp. VLSI Circuits, pp. 348-351, Jun., 2004. Solid-State Circuits Conference 1-4244-0852-0/07/$25.00 ©2001 IEEE. ISSCC 2007 1 February 14, 2007 /12:00 PM 3 Conventional: 1000 VL $80 62MvHz -120 b10s 1bOan 10 OlfFT Q vs V Current-sensing: 05 1 transition probability > ff|20H ti |cpctvpeepai troncalclcecmaar wit BW VST> Gm*Vi n _W r 120 0Mzcapacitive ~ ~ ~ ~ ~ ~pre-emphasis ~transmit er w VoQ '4 |~~~~1 Gbps 1 0.5 190flo,~1 22MZ\{n1 10, I0 0 transition probability _ 8~ ~ ~ ~-01 frequency (Hz) -120 0 Capacitive transmitter: -8O10 220MHz 10 10 1 OfFT -120 6 a LKK L4 0.5 10 clocked comparator with continuous-timeg feedback filter interconnect and Rl CO biasing 5|o9 circuit implementation: V 1Gbps 2V AU40 255fF BW= 1bp , Capacitive transmitter: I . ' Iou 0 frequency (Hz) VL 1000 Ja lGbps ^ .3 2R S 40 VV[.jjv 1 VL DD1.4V VLR1\ transition probabilityIVL2J frequency (Hz) .k tm Figure 22.91: Bandwidth and energy per bit versus transition probability (= data activity) for three different termination schemes. The results are for 10mm differential inter- Figure 22.9.2: Concept of transceiver and circuit implementation of the capacitive preconnects with a distributed resistance of 2kil and a distributed capacitance of 2.8pF. emphasis transmitter. | ' t RC = ~~~L W _ _1 °i|_n 1ns /O\ 1n =~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~11 .. .2 Seedbackfitense ampliteer IVfb+ Wf3-i atheoedgesuoftte0In RC~~~~ Eye-opening Measurements .~~1L, HF;' r/ Vfb- >_t 1 |Bit Error Rate Measurements 1-________IEQ ° w _ U- Vini1 'in- a H , 11 0 50 150 I150 100 -50 .# .X .} ..25GI}I5 ....clock delay (ps) Figure 22.9.3: Implementation of the clocked comparator with continuous-time Figure 22.9A4 Eye-diagram at the input of the receiver at 1Gb/s and measured Bit Error feedback filter. Rate at the edges of the eye. Power Consumption Measurements Eye-opening Measurements 600~~~~~~~~~~~~~~~~~05 500 t ----s--------------r ... 0.25Gb/s, I'EQ0=- 0.45 --e-0.50Gb/s, E0 =0 rte-=1.OQGbI:i/° /-| > data 0.40 -EQ .- -=.-1.25Gb- s E.0 0I35 ;/Sdt... 1.5Gb/s, IEQ75gA O --El-- -r=.1.75Gb/s, I 'EEQS11°2 5200r =1 .''.-#t---. g 2.00G/ EQ 0-,~~~~~~~~~~~~~~~~~~~~~~~~0~ 20.25 (U 1 20 40 ' -e-data 60 rate =BO1.25 Gb/s100 -e- at at 13 G/ 0.1 -2-data rate = 1.50 Gb/s Figure 22.9.5: Measured eye-opening for different data rates as a function of lEo. ° 0.1 0.2 E O.3 0.5~~0 0.4 0 transition probability (=data activlity). Conztinued onz Patge 612 DIGEST OFTECHNICAL PAPERS * 415 ISSCC 2007 PAPER CONTINUATIONS 1mm Figure 22.9.7: Chip micrograph. 612 I 200 IEEE Inenainl SlidStat CIrut Cofrec 1-44-82O0/20 20 IEEE11