Download paper

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Three-phase electric power wikipedia , lookup

Islanding wikipedia , lookup

Current source wikipedia , lookup

Rectifier wikipedia , lookup

Resistive opto-isolator wikipedia , lookup

Mains electricity wikipedia , lookup

Two-port network wikipedia , lookup

Alternating current wikipedia , lookup

Pulse-width modulation wikipedia , lookup

Integrating ADC wikipedia , lookup

Metadyne wikipedia , lookup

Atomic clock wikipedia , lookup

Buck converter wikipedia , lookup

Heterodyne wikipedia , lookup

Switched-mode power supply wikipedia , lookup

Immunity-aware programming wikipedia , lookup

Regenerative circuit wikipedia , lookup

Flip-flop (electronics) wikipedia , lookup

Opto-isolator wikipedia , lookup

Phase-locked loop wikipedia , lookup

Time-to-digital converter wikipedia , lookup

Transcript
IEEE Asian Solid-State Circuits Conference
November 14-16, 2011 / Jeju, Korea
Injection-Locked Clock Receiver for Monolithic
978-1-4577-1785-7/11/$26.00 ©2011 IEEE
Optical Link in 45nm SOI
Jonathan Leu and Vladimir Stojanović
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Cambridge, MA 02139
Email: cyleu at mit.edu
Abstract—This paper presents a compact, low-power injectionlocked clock receiver for large-scale integrated optical interconnects. The injection-locked design provides high input sensitivity
when compared to the TIA-less integrating clock receiver design.
It also breaks the gain/bandwidth tradeoff of a traditional
transimpedance amplifier (TIA) receiver, with a self-adjusting
mechanism to increase robustness. The receiver is designed in
45nm SOI and shown to operate from 1.0 GHz to 3.0 GHz while
consuming from 186 μW to 444 μW, with an input sensitivity
from 8.6 μA to 33.2 μA and output jitter σ within 1 %UI.
I. I NTRODUCTION
IEEE
The emergence of multi-core processors creates increasing
demand for on-chip and off-chip data bandwidth, which copper
interconnects cannot meet due to chip power delivery and
packaging (cooling and pin-density) constraints. To address
this problem, employing wavelength-division multiplexing
(WDM) on monolithically integrated silicon photonic waveguides can greatly improve energy-efficiency and bandwidth
density [1]. Dense WDM leverages the link parallelism to improve energy-efficiency, keeping the per wavelength data-rate
in the 1-5Gbps regime, while still enabling high throughput
per waveguide(100-500Gbps).
By using one wavelength in an on-chip or chip-to-chip
WDM link, the source clock can be transmitted along with
the data, as shown in Figure 1. Since clock-transmitter and
data-transmitter share the same clock grid, relative jitter
between the sent clock and data is minimal. Also, optical
clock signaling does not suffer from power rail injected noise,
electromagnetic interference or crosstalk, so no jitter is added
in the channel. At the clock receiver, the signal is regenerated,
buffered, and used to clock the other data receivers. The
low latency across wavelengths of optical waveguides means
that all data receivers can operate on the same clock phase,
removing the need for for phase-locked loops (PLLs) or delaylocked loops (DLLs).
High sensitivity and low power sense-amps can be used for
receiving data, so laser power and circuit power can be saved.
However for clock receivers, previous options are either lowsensitivity integrating designs [2], which require high laser
power, or using a traditional TIA and spending circuit power to
gain sensitivity. This paper breaks the constraints of previous
approaches, by utilizing a combination of injection-locking
and integrate-reset
to realize high-sensitivity and
Asian
Solid-Statetopologies
Circuits Conference
low-jitter
clock receiver.
November
14-16, 2011 / Jeju, Korea
978-1-4577-1785-7/11/$26.00 ©2011 IEEE
Fig. 1. A point-to-point optical WDM source-forwarded link. Transmit row
is on the left. The clock uses one channel λ1 , and is received in sync with
all other channels on the left.
II. C LOCK R ECEIVER C IRCUIT
The proposed circuit, Figure 2, overcomes the transimpedance amplifier TIA gain/bandwidth tradeoff [3][4] with
an integrate and reset topology. The input current is integrated
onto the input parasitic capacitance, equivalent to a high-gain
TIA; yet, it can also be strongly reset, equivalent to a highbandwidth TIA. This is highly beneficial as the relatively large
photodiode (estimated to be 10fF) and parasitic interconnect
(estimated to be 5fF) capacitance Cpd causes the input node
to be the dominant pole for an optical receiver, and any
improvement here directly impacts the overall performance.
Furthermore, the magnitude of the input current Ipd divided
by the input capacitance Cpd determines the transient slope
of V in. For a given input-referred noise in voltage σV , the
corresponding jitter σT is σV divided by the slope of V in. To
minimize output jitter, we wish to maximize the slope of V in,
so a maximum input impeadance is desired. For optical operation, upon receiving a laser pulse, the photodiode provides
an input current pulse. The strength of a backup PD-emulator
and the bias voltages biasp and biasn of the differential pair
stages shown in Figure 2, are digitally programmable.
As shown in Figure 3, the PD or the PD-emulator generates a current pulse Ipd, representative of a clock source
generated by a transmitting optical modulator. Ipd is drawn
a1)
M=1 M=2 M=4
M=1
in
Vin
PD[0 PD[1]
PD[0]
PD[1 PD[2]
PD[2
inb
b
biasp
autop
biasp
d)
in
DIFF1
DIFF2
biasn
autop
DIFF3
auto
inb
biasn
auton
auton
Source
in
inb
Source
PD[0 PD[1]
PD[0]
PD[1 PD[2]
PD[2
Cauto
b)
a2)
in
reset
c))
biasp
Cpd
M=1 M=2 M=4
Differential to
single-end
RxClk clock
biasn
inb
Fig. 2. Schematic of the clock receiver. Photodiode emulator circuit simulates photocurrent received from a laser clock source. Programmable bias levels
provide a wider range of operation.
Ipd
Vin
V0
Vreset
Vauto
Fig. 3. Nominal operation transient with input duty cycle of 50%. Input
Current Ipd creates a rise in the differential voltage between node in and
inb (denoted as V in), triggering reset. V auto is stable when V reset has
duty cycle of 50%.
from node inb to node in, creating a rise in the differential
voltage between node in and inb, denoted as V in. When the
receiver is locked to the input clock frequency, V in crosses
the differential zero point V o, and causes the node reset to
switch from low to high after a certain delay. This closes the
switches connecting the last differential pair DIF F 3 to the
input nodes, applying a downwards slope to V in, actively
resetting the input. When V in falls below V o, node reset
goes low, opening the switches, leaving the input impeadance
high, ready to receive the next current pulse. Since the PD is
integrated, the dimensions can be small enough such that no
reverse bias is required to increase the bandwidth. This allows
us to operate the PD differentially at zero bias.
We see that in steady-state, for each cycle, the total
amount of charge reset by the last differential pair must
equal the charge injected by the PD/PD-emulator. Essentially
the input signal provides just enough of an additional phase
shift to maintain lock at the desired frequency, hence the
name injection-locked. To maintain this balance, ”auto-tuning”
mechanism is designed as shown in Figure 2d. Two matched
current sources gated by the inverted reset charge/discharge a
capacitor, creating a duty cycle detector. The capacitor voltage
V auto is then fed back to control the strength of DIF F 3 as
shown in Figure 2. During steady state (Figure 3), V auto is
at a value such that the strength of DIF F 3 returns V in to
the same voltage each cycle.
If DIF F 3 is initially too strong, Ipd cannot provide enough
charge for V in to cross V o every cycle, causing the duty
cycle of reset to be less than 50%. This causes V auto to drop,
weakening DIF F 3, until lock is achieved. Vice-versa, a weak
DIF F 3 will not be able to pull V in below V o every cycle,
causing V auto to rise until locked. This auto-tune feature
allows the receiver to achieve lock over a larger range of
frequencies and supply voltages (Figure 7), making it more
robust to random mismatch, process corners, and temperature
drifts. The bandwidth of V auto is set an order of magnitude
below the operation frequency, so that there are no stability
issues yet the lock can be achieved relatively fast. Simulations
show that locking is achieved after 10ns.
When using electro-optical modulators as a clock transmit
source, a 50% duty cycle signal is not always guaranteed. With
this integrating design, the receiver will still be able to operate,
while correcting the output duty cycle to be closer to 50%, as
shown in Figure 4. Of course, the output duty cycle accuracy
will be limited by the matching of the two current sources.
Ipd
V0
Vin
Vreset
Vauto
Fig. 4. Nominal operation transient with input duty cycle less than 50%.
Reset stage strength is self-adjusted with auto-tune feature, enforcing the
output duty cycle to be at 50%.
III. M EASUREMENT R ESULTS
The monolithically-integrated clock receiver, along with
other electronic and photonic components, was fabricated in
a standard 45 nm SOI process, the die and circuit layout
are shown in Figure 13. Due to a foundry error in the SiGe
mask definition, the bandwidth and responsitivity of the PD
Reference clock
RxClk clock
mux
Block diagram of the digital backend. Can be configured to measure the ability to lock and jitter performance.
Source or
Received clock
no auto
with auto
Reference clock
Fig. 8. By sampling the source or received clock with an independent clock,
we can plot a histogram of counts versus relative phase.
1.5
8
x 10
4
0.9
1
1.1
Voltage [V]
1.2
1.3
Fig. 6. Measured lock ranges at nominal speed setting. Auto-tune provides
a larger operation range across input clock frequency and supply voltage.
4
3.5
2.5
2
1.5
1
0.8
0.9
Reference
2
0
0
100
200
Phase [ps]
300
400
Fig. 9. Raw count data at 2.5GHz operation. The two peaks represent the
rising edge and falling edge, so the spacing between the peaks is the duty
cycle. The maximum count for the rising edge differs from the falling edge
due to jitter components with bandwidth less than 2.5GHz.
fast config
mid config
slow config
3
0.5
0.7
Count
Clock Frequency [GHz]
Experiment select
2
1
0.8
30
Sense
Amp
Reference clock
Clock Frequency [GHz]
Digital
Backend
asynchronous test counter
Source clock
2.5
30
Sense
Amp
Reference clock
Fig. 5.
asynchronous reference counter
1
1.1
Voltage [V]
1.2
1.3
Fig. 7. Measured lock ranges at different speed settings. Programmable bias
levels can set receiver at different operation regions.
connected to the circuit were too low for the receiver to
function properly. The backup electrical PD-emulator circuit
shown in Figure 2a1 was used for testing.
As shown in Figure 5, tests are performed with two coupled
30-bit asynchronous counters: the test counter counts the
selected output; a reference counter counts the reference clock,
and shuts off both counters when the most-significant bit goes
to 1. To detect whether or not the clock receiver circuit is
locked onto the source clock, we can reset both counters
to zero, and then start both counters simultaneously. After
both counters are automatically shut off, we compare the
counter values; equal numbers represent matching frequencies.
Figure 6 shows the frequency and supply voltage ranges
within which the clock receiver is able to lock, with the
auto-tune enabled or disabled. The auto-tune is disabled by
biasing DIF F 3 with biasp and biasn instead of autop and
auton. Figure 7 shows that the bias levels can be configured
to accommodate a wide range of operation frequencies and
voltages.
By adjusting the relative phase between two clocks of the
same frequency, and using one clock as reference to sample the
other after it has been received by the circuit, we can measure
the jitter performance of the circuit. As shown in Figure 8 and
Figure 9, if the phases are aligned such that the reference clock
edge is at the peak of the measured clock edge’s probability
distribution, then there will be a maximal number of counts
in the test counter. The probability of a count happening is
the probability of a low sample followed by a high sample.
If the jitter is modeled with a random Guassian distribution,
by sweeping the relative phase, the variance of the counts is
approximately the σ of the jitter. A logrithmic plot shown in
Figure 10 allows us to clearly identify the jitter distributions.
Note that the input source clock duty cycle is 46%, and the
output duty cycle is at 45%. This error is due to the previously
mentioned current source mismatch, which can be fixed using
programmable current sources.
By taking the variance of each peak, the output jitter closely
follows a simple Gaussian model, as seen here in Figure 11.
Using this method, we can measure the jitter performance of
the circuit at each point of operation. Shown in Figure 12 are
the circuit performance at 1GHz to 3GHz. While operating the
circuit within 1%UI output jitter by adjusting the configuration
Probability [log10]
ACKNOWLEDGEMENTS
Reference
sigma rise:2.26ps sigma fall:2.68ps
0
−5
−10
0
100
200
Phase [ps]
300
400
The authors would like to thank the Integrated Photonics
teams at both University of Colorado, Boulder and MIT. This
work was supported in part by DARPA, NSF, FCRP, IFC,
Trusted Foundry, Intel, MIT CICS, and NSERC.
Probability [log10]
Probability [log10]
R EFERENCES
No Auto
sigma rise:3.98ps
sigma fall:6.71ps
0
−5
−10
0
100
Auto
200
Phase [ps]
sigma rise:2.75ps
300
400
sigma fall:6.14ps
0
−5
−10
0
100
200
Phase [ps]
300
400
Fig. 10. Transision probability distributions of reference clock, and received
clock with auto tune disabled/enabled at 2.5GHz.
Auto
Rise fit
Fall fit
−2
−4
100
−6
Input Current [uA]
−8
−10
0
100
200
Phase [ps]
300
400
Fig. 11. Gaussian jitter models overlaying measured data at 2.5GHz with
auto tune enabled. Note that on a log scale, the points near the peak are very
heavily weighted.
IV. C ONCLUSION
This paper presents a compact, low-power injection-locked
clock receiver for large-scale integrated optical interconnects.
In source-synchronous WDM links, it is difficult to achieve
both high sensitivity and low power consumption for a given
bandwidth. The injection-locked design provides high input
sensitivity when compared to the TIA-less integrating clock
receiver design. It also has higher energy-efficiency than a
TIA, and breaks the gain/bandwidth tradeoff of a TIA receiver,
while utilizing a robust self-adjusting mechanism and suppress
the jitter.
500
Receiverless
No Auto
Auto
60
40
20
0
Power [uW]
bits of the circuit, we see that the total power consumption
increases steadily as the differential pairs require more current
to operate faster. Also, the required input current, and therefore
the required laser power, would increase to provide a stronger
signal as the jitter σ required is smaller in time as frequency
increases. We can compare this to the two-PD receiverless
case [2]. Assuming we are using the same 10fF photodiodes
and 5fF wire parasitic capacitance, the input current required
is shown in Figure 12 to be much higher as Cpd is required
to swing rail-to-rail. Note that for similar power consumption
and input currents, the auto-enabled configuration has lower
jitter, this because auto-tuning allows the circuit to find a better
operation point.
80
1
1
Auto
No Auto
400
300
200
100
1
2
3
Frequency [GHz]
Output Jitter [%UI]
Probability [log10]
0
[1] C. Batten, A. Joshi, J. Orcutt, C. Holzwarth, M. Popovic, J. Hoyt,
F. Kartner, R. Ram, V. Stojanovic, and K. Asanovic, “Building manycore
processor-to-dram networks with monolithic cmos silicon photonics,”
Micro, IEEE, vol. PP, no. 99, p. 1, 2009.
[2] A. Bhatnagar, C. Debaes, R. Chen, N. Hellman, G. Keeler, D. Agarwal,
H. Thienpont, and D. Miller, “Receiverless clocking of a cmos digital
circuit using short optical pulses,” in Lasers and Electro-Optics Society,
2002. LEOS 2002. The 15th Annual Meeting of the IEEE, vol. 1, pp. 127
– 128 vol.1, 2002.
[3] S. Goswami, J. Silver, T. Copani, W. Chen, H. Barnaby, B. Vermeire,
and S. Kiaei, “A 14mw 5gb/s cmos tia with gain-reuse regulated cascode
compensation for parallel optical interconnects,” in Solid-State Circuits
Conference - Digest of Technical Papers, 2009. ISSCC 2009. IEEE
International, pp. 100 –101,101a, feb. 2009.
[4] A. Kern, A. Chandrakasan, and I. Young, “18gb/s optical io: Vcsel driver
and tia in 90nm cmos,” in VLSI Circuits, 2007 IEEE Symposium on,
pp. 276 –277, june 2007.
2
3
Frequency [GHz]
Auto
No Auto
0.8
0.6
0.4
0.2
1
2
Frequency [GHz]
3
Fig. 12. Measured input current, total power consumption (including input
currents and biasing circuits), and output jitter vs. clock frequency. Input jitter
from clock source is within 0.3%UI.
2.9mm x 2.9mm
Clock receiver area
40um x 25um
Digital
backend
Fig. 13. Die photo with circuit layout enlarged. Note vertical coupler and
photodiode below receiver circuit.