Download can thereye be fora design Ie.

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
ISSCC 2007 / SESSION 22 / DIGITAL CIRCUIT INNOVATIONS / 22.9
22.9
A 0.28pJ/b 2Gb/s/ch Transceiver in 9Onm CMOS
for 10mm On-Chip interconnects
The schematic of the receiver implementation is shown in Fig.
22.9.3. The left of the diagram shows a clocked comparator, a
sense-amplifier-based flip-flop (SAFF), which consists of a differEisse Mensink.
Schinkel EricKlumperinkEdvanTuiential input stage, cross-coupled inverters and an SR-latch. The
Fisse Mensink, Daniel Schinkel, Eric Kiumperink, Ed van Tuiji,
outputs of the SR-latch drive the low-pass feedback filter, in this
Brain N\lauta
University of Twente, Enschede, The Netherlands
The bandwidth of global on-chip interconnects in modern CMOS
processes is limited by their high resistance and capacitance [1].
Repeaters that are used to speed up these interconnects consume
a considerable amount of power [2] and area. Recently published
techniques [1-4] increase the achievable data rate at the cost of
high static power consumption, leading to relatively high energy
per bit for low data activity. On the other hand, low-swing
schemes [5] often sacrifice bandwidth for power reduction, or
make use of an extra low-voltage power supply. More ideally, a
transceiver would combine low dynamic and static power with a
high achievable data rate.
The bandwidth and power consumption of an RC-limited interconnect depends on its source (Zs) and load impedances (ZL). In
Fig. 22.9.1, a conventional case with an inverter used as both a
transmitter (Zs = 100Q) and a receiver (ZL = 10fF) has only
62MRz bandwidth and high power consumption. Current-seasing schemes (ZL = 190Q in Fig. 22.9.1) increase the bandwidth up
to 3x [1,4], but with increased power at low data activities. We
propose using a capacitive transmitter (Zs = 255fF in Fig.
22.9.1), which has the same bandwidth improvement as current
sensing, but with lower power and without static power consumption.
This paper presents a transceiver for 10mm long inte9connects in
a 1.2V 90nm 6M CMAOS process, shownt in Fig. 22.9.2. A capacitive pre-emphasis transmitter both increases the bandwidth and
decreases the voltage swing, without the need for an additional
power supply. As low-swing signaling is more susceptible to
crosstalk, we use differential interconnects with twists [1], of
which only a single-ended half is shown. In contrast to the wide
interconnects used in [2,3], we use relatively small width
(0.54,um) and spacing (0.32gm) [1,4] and assume high metal-density surroundings. The receiver uses decision feedback equalization (DFE) [6] to further increase the achievable data rate. The
DFE, with a continuous-time feedback filter, consumes almost no
extra power. The bandwidth-increasing pre-emphasis effect of the
transmitter is shown at the bottom right of Fig. 22.9.2: every
transition is emphasized by the transmitter by injecting a charge
via capacitance Cs.
With only a series capacitor (AC-coupling), the DC voltage on the
interconnect is not well defined as there is no DC path to one of
the supplies. To control the DC voltage, a load resistor RL and a
transconductance G,,, controlled by Vi., are added (see Fig. 22.9.2).
By having the time constants C /G,, and RLCwi,, equal, the transfer function resembles the transfer function of the capacitive
transmitter in Fig. 22.9.1. If a small G,,, (5gS) and a large RL
(16kQ) are chosen, the static current is kept small (6gA) and also
the power consumption remains similar. Gmand RL are implemented with MOS transistors as visible in the bottom part of Fig.
22.9.2. Folr Cs, the gate capacitanlce of an NMOS transistor is
used. As the gate oxide is much thinner than the oxide between
interconnects, the area consumed by C, is relatively small
(6x6 gM2). The signals, with a voltage swing of 100mV, are chosen
close to V, of 1.2V, because the capacitance of the NMOS transistor is highest for a high gate-source voltage. The total area of the
differential transmitter is
226
gM2
*207IE
I
n1111
111ternaiona
The chip micrograph is shown in Fig. 22.9.7. The 10mm-long
interconnects, placed in metal 4, have a total distributed resistance of 2kQ and a capacitance of 2.8pF. The other metal layers
are filled with GND- and V,,-connected metal stripes. An external pattern generator/analyzer generates data and measures
BER. The receiver clock is generated externally to adapt its
phase to the eye position and to be able to measure eye widths. In
an application, a simple skew circuit or a source-synchronous
approach could be used to generate the proper clock phase. Eyediaglrams alre measured via 50Q output buffercs that alre connected to the output of a differenetial
intercontnaect.
Figure 22.9.4 shows a measured eye diagram at a data rate of
lGb/s. The measured BER at the edges of the eye is also shown.
The BER drops rapidly below a clock skew of -150ps and above
180ps, giving an eye-opening of 670ps. Data rates up to 1.35Gb/s
are achieved without DFE (IEQ=O). The one-6 offset of the total
transceiver is llmV, measured over 20 samples. Due to this offset, not all samples achieve 1.35Gb/s, but all samples do achieve
a slightly lower data rate of lGb/s. Simulations over process corners also indicate that the circuit is robust to PVT variations at a
slightly lower than the maximum achievable data rate. Data
Trate
rates up to 2Gb/s are measured with DFE. Fig. 22.9.5 shows that
DFE improves the eye opening for a wide range of IER. InL an appliDFE can
fora
be fixed
therefore be
at design time.
cation, 'EQ can
In Fig. 22.9.6, the measured energy per bit is plotted as a function of transition probability at different data rates. With random
data at 2Gb/s, only 0.28pJ/b is dissipated, which is 7x lower than
earlier work [1,4]. The power dissipation of 0.12pJ/b at zero data
activity is mainly due to the power dissipation in the SAFF, which
has large transistors to get a low offset (6s = 8mV). Clock-gating
can be used to eliminate power consumption during inactive penods. The DFE part of the circuit requires less than 7% of the total
transceiver power, while it can increase the achievable data rate
1.5x.
thereye
design Ie.
With the presented transceiver, the same high data rates over
small RC bandwidth limited on-chip interconnects are possible as
with plrevious solutions, but with a 7x lower power consumption.
By using both a capacitive pre-emphasis transmitter and continuous-time DFE, a data rate of 2Gb/s is achieved over a 10mm
long interconnect. The transceiver consumes only 0.28pJ/b.
Acknowledgements:
We thank
Philips Research for chip fabrication, the Dutch Technology
Foundation (STW, project TCS.5791) for funding and Gerard Wienk for
assistance.
[11 D. Schinkel, E. Mensink, E. A. M. Klumperink, et al., "A 3-Gb/s/ch
Transceiver for 10-mm Uninterrupted RC-limited Global On-Chip
Interconnects," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 297-306,
Jan., 2006.
[21 A. P. Jose, G. Patounakis, and K. L. Shepard, "Pulsed Current-Mode
Signaling for Nearly Speed-of-Light Intrachip Communication," IEEE J.
Circuits, vol. 41, no. 4, pp. 772-780, Apr., 2006.
.Solid-State
[31 A. P. Jose, and K. L. Shepard, "Distributed Loss Compensation for
Low-Latency On-Chip Interconnects," ISSCC Dig. Tech. Papers, pp. 516A
The receiver concept is also shown inn Fig. 22.9.2. clocked colmparator restores the low-swing line output to full swing. DFE further ilncreases the achievable data rate. Instead of the ofteln-used
FIR filters [6], a continuous-timne filter operates as the decision
fedbc fi1ltr Thi flter cacl mos of the ISI with a 1iml
anpoe-fiin
an( owerettelentfis-re
lst-of e imlmnain
lmpemetatln, whra
nLeeasann FI
q- 4
f1iter requires many taps.
414
case an RC filter, implemented with pass-gates and anti-parallel
gate capacitances. The filter output is coupled back into the
SAFF via a second differential input stage, as shown on the right
of Fig. 22.9.3. IEQ sets the feedback gain A (see Fig. 22.9.2). The
total area of the receiver is 117 gm2 (324m2 for the DFE part)
517, Feh., 2006.
[41 L. Zhang, J. Witson, R. Bashiruttah, et at,, "Driver Pro-Emnphasis
Techniques for On-Chip Glohal Buses," ISLPED, pp. 186-191, Aug., 2005.
[51 H.: Zhang, V. George, and J. M. Rahaey, "Low-Swing On-Chip
Signaling Techniques: Effectiveness and Rohustness," IEEE Thanrs. VLSI
Systems,
vol. 5, pp.A. 264-272,
Jun., 2000.et at,, "Adaptive Equatization and
[61 V. Stojanovic,
HIo, B. Gartlepp,
Data Recovery in a Duat-Mode (PAM2/4) Seriat Link Transceiver," Symp.
VLSI Circuits, pp. 348-351, Jun., 2004.
Solid-State Circuits Conference
1-4244-0852-0/07/$25.00 ©2001 IEEE.
ISSCC 2007 1 February 14, 2007 /12:00 PM
3
Conventional:
1000
VL
$80 62MvHz
-120
b10s
1bOan
10
OlfFT Q
vs
V
Current-sensing:
05
1
transition probability
> ff|20H ti |cpctvpeepai troncalclcecmaar wit
BW
VST>
Gm*Vi
n
_W
r
120 0Mzcapacitive
~ ~ ~ ~ ~ ~pre-emphasis
~transmit er
w
VoQ
'4
|~~~~1
Gbps
1
0.5
190flo,~1 22MZ\{n1
10, I0
0
transition probability
_ 8~ ~ ~ ~-01 frequency
(Hz)
-120
0
Capacitive transmitter:
-8O10 220MHz
10 10
1 OfFT
-120
6
a
LKK
L4
0.5
10
clocked comparator with
continuous-timeg
feedback filter
interconnect and
Rl
CO
biasing
5|o9
circuit implementation:
V
1Gbps
2V
AU40
255fF
BW=
1bp
,
Capacitive transmitter:
I
.
'
Iou
0
frequency (Hz)
VL
1000
Ja
lGbps
^
.3
2R
S 40
VV[.jjv
1
VL
DD1.4V
VLR1\
transition probabilityIVL2J
frequency (Hz)
.k
tm
Figure 22.91: Bandwidth and energy per bit versus transition probability (= data activity) for three different termination schemes. The results are for 10mm differential inter- Figure 22.9.2: Concept of transceiver and circuit implementation of the capacitive preconnects with a distributed resistance of 2kil and a distributed capacitance of 2.8pF. emphasis transmitter.
| ' t RC = ~~~L
W
_
_1
°i|_n
1ns
/O\
1n
=~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~11
..
.2
Seedbackfitense ampliteer
IVfb+
Wf3-i
atheoedgesuoftte0In
RC~~~~
Eye-opening Measurements
.~~1L,
HF;'
r/
Vfb-
>_t
1
|Bit Error Rate Measurements
1-________IEQ
°
w
_
U-
Vini1
'in-
a
H
,
11
0
50
150
I150 100 -50
.# .X
.}
..25GI}I5
....clock
delay (ps)
Figure 22.9.3: Implementation of the clocked comparator with continuous-time Figure 22.9A4 Eye-diagram at the input of the receiver at 1Gb/s and measured Bit Error
feedback filter.
Rate at the edges of the eye.
Power Consumption Measurements
Eye-opening Measurements
600~~~~~~~~~~~~~~~~~05
500
t ----s--------------r
... 0.25Gb/s, I'EQ0=-
0.45 --e-0.50Gb/s, E0
=0
rte-=1.OQGbI:i/°
/-| >
data
0.40 -EQ
.-
-=.-1.25Gb- s E.0
0I35 ;/Sdt... 1.5Gb/s, IEQ75gA
O
--El-- -r=.1.75Gb/s,
I
'EEQS11°2
5200r
=1 .''.-#t---.
g
2.00G/ EQ
0-,~~~~~~~~~~~~~~~~~~~~~~~~0~
20.25
(U
1
20 40 ' -e-data
60
rate =BO1.25 Gb/s100
-e- at at 13 G/
0.1
-2-data rate = 1.50 Gb/s
Figure 22.9.5: Measured eye-opening for different data rates as a function of lEo.
°
0.1
0.2
E
O.3 0.5~~0
0.4
0
transition probability (=data activlity).
Conztinued onz Patge 612
DIGEST OFTECHNICAL PAPERS *
415
ISSCC 2007 PAPER CONTINUATIONS
1mm
Figure 22.9.7: Chip micrograph.
612
I 200 IEEE Inenainl SlidStat CIrut Cofrec
1-44-82O0/20
20 IEEE11