Download paper

IEEE Asian Solid-State Circuits Conference November 14-16, 2011 / Jeju, Korea Injection-Locked Clock Receiver for Monolithic 978-1-4577-1785-7/11/$26.00 ©2011 IEEE Optical Link in 45nm SOI Jonathan Leu and Vladimir Stojanović Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Cambridge, MA 02139 Email: cyleu at mit.edu Abstract—This paper presents a compact, low-power injectionlocked clock receiver for large-scale integrated optical interconnects. The injection-locked design provides high input sensitivity when compared to the TIA-less integrating clock receiver design. It also breaks the gain/bandwidth tradeoff of a traditional transimpedance amplifier (TIA) receiver, with a self-adjusting mechanism to increase robustness. The receiver is designed in 45nm SOI and shown to operate from 1.0 GHz to 3.0 GHz while consuming from 186 μW to 444 μW, with an input sensitivity from 8.6 μA to 33.2 μA and output jitter σ within 1 %UI. I. I NTRODUCTION IEEE The emergence of multi-core processors creates increasing demand for on-chip and off-chip data bandwidth, which copper interconnects cannot meet due to chip power delivery and packaging (cooling and pin-density) constraints. To address this problem, employing wavelength-division multiplexing (WDM) on monolithically integrated silicon photonic waveguides can greatly improve energy-efficiency and bandwidth density [1]. Dense WDM leverages the link parallelism to improve energy-efficiency, keeping the per wavelength data-rate in the 1-5Gbps regime, while still enabling high throughput per waveguide(100-500Gbps). By using one wavelength in an on-chip or chip-to-chip WDM link, the source clock can be transmitted along with the data, as shown in Figure 1. Since clock-transmitter and data-transmitter share the same clock grid, relative jitter between the sent clock and data is minimal. Also, optical clock signaling does not suffer from power rail injected noise, electromagnetic interference or crosstalk, so no jitter is added in the channel. At the clock receiver, the signal is regenerated, buffered, and used to clock the other data receivers. The low latency across wavelengths of optical waveguides means that all data receivers can operate on the same clock phase, removing the need for for phase-locked loops (PLLs) or delaylocked loops (DLLs). High sensitivity and low power sense-amps can be used for receiving data, so laser power and circuit power can be saved. However for clock receivers, previous options are either lowsensitivity integrating designs [2], which require high laser power, or using a traditional TIA and spending circuit power to gain sensitivity. This paper breaks the constraints of previous approaches, by utilizing a combination of injection-locking and integrate-reset to realize high-sensitivity and Asian Solid-Statetopologies Circuits Conference low-jitter clock receiver. November 14-16, 2011 / Jeju, Korea 978-1-4577-1785-7/11/$26.00 ©2011 IEEE Fig. 1. A point-to-point optical WDM source-forwarded link. Transmit row is on the left. The clock uses one channel λ1 , and is received in sync with all other channels on the left. II. C LOCK R ECEIVER C IRCUIT The proposed circuit, Figure 2, overcomes the transimpedance amplifier TIA gain/bandwidth tradeoff [3][4] with an integrate and reset topology. The input current is integrated onto the input parasitic capacitance, equivalent to a high-gain TIA; yet, it can also be strongly reset, equivalent to a highbandwidth TIA. This is highly beneficial as the relatively large photodiode (estimated to be 10fF) and parasitic interconnect (estimated to be 5fF) capacitance Cpd causes the input node to be the dominant pole for an optical receiver, and any improvement here directly impacts the overall performance. Furthermore, the magnitude of the input current Ipd divided by the input capacitance Cpd determines the transient slope of V in. For a given input-referred noise in voltage σV , the corresponding jitter σT is σV divided by the slope of V in. To minimize output jitter, we wish to maximize the slope of V in, so a maximum input impeadance is desired. For optical operation, upon receiving a laser pulse, the photodiode provides an input current pulse. The strength of a backup PD-emulator and the bias voltages biasp and biasn of the differential pair stages shown in Figure 2, are digitally programmable. As shown in Figure 3, the PD or the PD-emulator generates a current pulse Ipd, representative of a clock source generated by a transmitting optical modulator. Ipd is drawn a1) M=1 M=2 M=4 M=1 in Vin PD[0 PD[1] PD[0] PD[1 PD[2] PD[2 inb b biasp autop biasp d) in DIFF1 DIFF2 biasn autop DIFF3 auto inb biasn auton auton Source in inb Source PD[0 PD[1] PD[0] PD[1 PD[2] PD[2 Cauto b) a2) in reset c)) biasp Cpd M=1 M=2 M=4 Differential to single-end RxClk clock biasn inb Fig. 2. Schematic of the clock receiver. Photodiode emulator circuit simulates photocurrent received from a laser clock source. Programmable bias levels provide a wider range of operation. Ipd Vin V0 Vreset Vauto Fig. 3. Nominal operation transient with input duty cycle of 50%. Input Current Ipd creates a rise in the differential voltage between node in and inb (denoted as V in), triggering reset. V auto is stable when V reset has duty cycle of 50%. from node inb to node in, creating a rise in the differential voltage between node in and inb, denoted as V in. When the receiver is locked to the input clock frequency, V in crosses the differential zero point V o, and causes the node reset to switch from low to high after a certain delay. This closes the switches connecting the last differential pair DIF F 3 to the input nodes, applying a downwards slope to V in, actively resetting the input. When V in falls below V o, node reset goes low, opening the switches, leaving the input impeadance high, ready to receive the next current pulse. Since the PD is integrated, the dimensions can be small enough such that no reverse bias is required to increase the bandwidth. This allows us to operate the PD differentially at zero bias. We see that in steady-state, for each cycle, the total amount of charge reset by the last differential pair must equal the charge injected by the PD/PD-emulator. Essentially the input signal provides just enough of an additional phase shift to maintain lock at the desired frequency, hence the name injection-locked. To maintain this balance, ”auto-tuning” mechanism is designed as shown in Figure 2d. Two matched current sources gated by the inverted reset charge/discharge a capacitor, creating a duty cycle detector. The capacitor voltage V auto is then fed back to control the strength of DIF F 3 as shown in Figure 2. During steady state (Figure 3), V auto is at a value such that the strength of DIF F 3 returns V in to the same voltage each cycle. If DIF F 3 is initially too strong, Ipd cannot provide enough charge for V in to cross V o every cycle, causing the duty cycle of reset to be less than 50%. This causes V auto to drop, weakening DIF F 3, until lock is achieved. Vice-versa, a weak DIF F 3 will not be able to pull V in below V o every cycle, causing V auto to rise until locked. This auto-tune feature allows the receiver to achieve lock over a larger range of frequencies and supply voltages (Figure 7), making it more robust to random mismatch, process corners, and temperature drifts. The bandwidth of V auto is set an order of magnitude below the operation frequency, so that there are no stability issues yet the lock can be achieved relatively fast. Simulations show that locking is achieved after 10ns. When using electro-optical modulators as a clock transmit source, a 50% duty cycle signal is not always guaranteed. With this integrating design, the receiver will still be able to operate, while correcting the output duty cycle to be closer to 50%, as shown in Figure 4. Of course, the output duty cycle accuracy will be limited by the matching of the two current sources. Ipd V0 Vin Vreset Vauto Fig. 4. Nominal operation transient with input duty cycle less than 50%. Reset stage strength is self-adjusted with auto-tune feature, enforcing the output duty cycle to be at 50%. III. M EASUREMENT R ESULTS The monolithically-integrated clock receiver, along with other electronic and photonic components, was fabricated in a standard 45 nm SOI process, the die and circuit layout are shown in Figure 13. Due to a foundry error in the SiGe mask definition, the bandwidth and responsitivity of the PD Reference clock RxClk clock mux Block diagram of the digital backend. Can be configured to measure the ability to lock and jitter performance. Source or Received clock no auto with auto Reference clock Fig. 8. By sampling the source or received clock with an independent clock, we can plot a histogram of counts versus relative phase. 1.5 8 x 10 4 0.9 1 1.1 Voltage [V] 1.2 1.3 Fig. 6. Measured lock ranges at nominal speed setting. Auto-tune provides a larger operation range across input clock frequency and supply voltage. 4 3.5 2.5 2 1.5 1 0.8 0.9 Reference 2 0 0 100 200 Phase [ps] 300 400 Fig. 9. Raw count data at 2.5GHz operation. The two peaks represent the rising edge and falling edge, so the spacing between the peaks is the duty cycle. The maximum count for the rising edge differs from the falling edge due to jitter components with bandwidth less than 2.5GHz. fast config mid config slow config 3 0.5 0.7 Count Clock Frequency [GHz] Experiment select 2 1 0.8 30 Sense Amp Reference clock Clock Frequency [GHz] Digital Backend asynchronous test counter Source clock 2.5 30 Sense Amp Reference clock Fig. 5. asynchronous reference counter 1 1.1 Voltage [V] 1.2 1.3 Fig. 7. Measured lock ranges at different speed settings. Programmable bias levels can set receiver at different operation regions. connected to the circuit were too low for the receiver to function properly. The backup electrical PD-emulator circuit shown in Figure 2a1 was used for testing. As shown in Figure 5, tests are performed with two coupled 30-bit asynchronous counters: the test counter counts the selected output; a reference counter counts the reference clock, and shuts off both counters when the most-significant bit goes to 1. To detect whether or not the clock receiver circuit is locked onto the source clock, we can reset both counters to zero, and then start both counters simultaneously. After both counters are automatically shut off, we compare the counter values; equal numbers represent matching frequencies. Figure 6 shows the frequency and supply voltage ranges within which the clock receiver is able to lock, with the auto-tune enabled or disabled. The auto-tune is disabled by biasing DIF F 3 with biasp and biasn instead of autop and auton. Figure 7 shows that the bias levels can be configured to accommodate a wide range of operation frequencies and voltages. By adjusting the relative phase between two clocks of the same frequency, and using one clock as reference to sample the other after it has been received by the circuit, we can measure the jitter performance of the circuit. As shown in Figure 8 and Figure 9, if the phases are aligned such that the reference clock edge is at the peak of the measured clock edge’s probability distribution, then there will be a maximal number of counts in the test counter. The probability of a count happening is the probability of a low sample followed by a high sample. If the jitter is modeled with a random Guassian distribution, by sweeping the relative phase, the variance of the counts is approximately the σ of the jitter. A logrithmic plot shown in Figure 10 allows us to clearly identify the jitter distributions. Note that the input source clock duty cycle is 46%, and the output duty cycle is at 45%. This error is due to the previously mentioned current source mismatch, which can be fixed using programmable current sources. By taking the variance of each peak, the output jitter closely follows a simple Gaussian model, as seen here in Figure 11. Using this method, we can measure the jitter performance of the circuit at each point of operation. Shown in Figure 12 are the circuit performance at 1GHz to 3GHz. While operating the circuit within 1%UI output jitter by adjusting the configuration Probability [log10] ACKNOWLEDGEMENTS Reference sigma rise:2.26ps sigma fall:2.68ps 0 −5 −10 0 100 200 Phase [ps] 300 400 The authors would like to thank the Integrated Photonics teams at both University of Colorado, Boulder and MIT. This work was supported in part by DARPA, NSF, FCRP, IFC, Trusted Foundry, Intel, MIT CICS, and NSERC. Probability [log10] Probability [log10] R EFERENCES No Auto sigma rise:3.98ps sigma fall:6.71ps 0 −5 −10 0 100 Auto 200 Phase [ps] sigma rise:2.75ps 300 400 sigma fall:6.14ps 0 −5 −10 0 100 200 Phase [ps] 300 400 Fig. 10. Transision probability distributions of reference clock, and received clock with auto tune disabled/enabled at 2.5GHz. Auto Rise fit Fall fit −2 −4 100 −6 Input Current [uA] −8 −10 0 100 200 Phase [ps] 300 400 Fig. 11. Gaussian jitter models overlaying measured data at 2.5GHz with auto tune enabled. Note that on a log scale, the points near the peak are very heavily weighted. IV. C ONCLUSION This paper presents a compact, low-power injection-locked clock receiver for large-scale integrated optical interconnects. In source-synchronous WDM links, it is difficult to achieve both high sensitivity and low power consumption for a given bandwidth. The injection-locked design provides high input sensitivity when compared to the TIA-less integrating clock receiver design. It also has higher energy-efficiency than a TIA, and breaks the gain/bandwidth tradeoff of a TIA receiver, while utilizing a robust self-adjusting mechanism and suppress the jitter. 500 Receiverless No Auto Auto 60 40 20 0 Power [uW] bits of the circuit, we see that the total power consumption increases steadily as the differential pairs require more current to operate faster. Also, the required input current, and therefore the required laser power, would increase to provide a stronger signal as the jitter σ required is smaller in time as frequency increases. We can compare this to the two-PD receiverless case [2]. Assuming we are using the same 10fF photodiodes and 5fF wire parasitic capacitance, the input current required is shown in Figure 12 to be much higher as Cpd is required to swing rail-to-rail. Note that for similar power consumption and input currents, the auto-enabled configuration has lower jitter, this because auto-tuning allows the circuit to find a better operation point. 80 1 1 Auto No Auto 400 300 200 100 1 2 3 Frequency [GHz] Output Jitter [%UI] Probability [log10] 0 [1] C. Batten, A. Joshi, J. Orcutt, C. Holzwarth, M. Popovic, J. Hoyt, F. Kartner, R. Ram, V. Stojanovic, and K. Asanovic, “Building manycore processor-to-dram networks with monolithic cmos silicon photonics,” Micro, IEEE, vol. PP, no. 99, p. 1, 2009. [2] A. Bhatnagar, C. Debaes, R. Chen, N. Hellman, G. Keeler, D. Agarwal, H. Thienpont, and D. Miller, “Receiverless clocking of a cmos digital circuit using short optical pulses,” in Lasers and Electro-Optics Society, 2002. LEOS 2002. The 15th Annual Meeting of the IEEE, vol. 1, pp. 127 – 128 vol.1, 2002. [3] S. Goswami, J. Silver, T. Copani, W. Chen, H. Barnaby, B. Vermeire, and S. Kiaei, “A 14mw 5gb/s cmos tia with gain-reuse regulated cascode compensation for parallel optical interconnects,” in Solid-State Circuits Conference - Digest of Technical Papers, 2009. ISSCC 2009. IEEE International, pp. 100 –101,101a, feb. 2009. [4] A. Kern, A. Chandrakasan, and I. Young, “18gb/s optical io: Vcsel driver and tia in 90nm cmos,” in VLSI Circuits, 2007 IEEE Symposium on, pp. 276 –277, june 2007. 2 3 Frequency [GHz] Auto No Auto 0.8 0.6 0.4 0.2 1 2 Frequency [GHz] 3 Fig. 12. Measured input current, total power consumption (including input currents and biasing circuits), and output jitter vs. clock frequency. Input jitter from clock source is within 0.3%UI. 2.9mm x 2.9mm Clock receiver area 40um x 25um Digital backend Fig. 13. Die photo with circuit layout enlarged. Note vertical coupler and photodiode below receiver circuit.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download paper