* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Speed and energy analysis of digital interconnections: comparison
Survey
Document related concepts
Valve RF amplifier wikipedia , lookup
Spark-gap transmitter wikipedia , lookup
Transistor–transistor logic wikipedia , lookup
Telecommunications engineering wikipedia , lookup
Index of electronics articles wikipedia , lookup
Surge protector wikipedia , lookup
Resistive opto-isolator wikipedia , lookup
Radio transmitter design wikipedia , lookup
Current mirror wikipedia , lookup
Power electronics wikipedia , lookup
Power MOSFET wikipedia , lookup
Switched-mode power supply wikipedia , lookup
Superluminescent diode wikipedia , lookup
Transcript
Speed and energy analysis of digital interconnections: comparison of on-chip, off-chip, and free-space technologies Gökçe I. Yayla, Philippe J. Marchand, and Sadik C. Esener We model and compare on-chip ~up to wafer scale! and off-chip ~multichip module! high-speed electrical interconnections with free-space optical interconnections in terms of speed performance and energy requirements for digital transmission in large-scale systems. For all technologies the interconnections are first modeled and optimized for minimum delay as functions of the interconnection length for both one-to-one and fan-out connections. Then energy requirements are derived as functions of the interconnection length. Free-space optical interconnections that use multiple-quantum-well modulators or vertical-cavity surface-emitting lasers as transmitters are shown to offer a speed– energy product advantage as high as 30 over that of the electrical interconnection technologies. © 1998 Optical Society of America OCIS codes: 200.4650, 250.0250. 1. Introduction VLSI technology improvements have dramatically increased microelectronic device densities and speeds. However, interconnection technology has not advanced proportionally. One of the main reasons is the limited availability of interconnection materials that are compatible with VLSI and electronic packaging technologies. The increased wire resistance as a result of smaller feature size, the residual wire capacitance resulting from fringing fields, the aspect ratio of the interconnection wires, and the interwire cross talk are among the main factors that prohibit more significant improvements in electrical interconnect performance. Consequently the overall performance of VLSI systems becomes increasingly dominated by the performance of long interconnects. For overcoming this limitation free-space optical interconnections have been suggested, in which long electrical interconnects are replaced by an optical link consisting of a light transmitter, interconnection optics, and a photodetector.1– 8 This scheme, al- The authors are with the Department of Electrical and Computer Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093-0407. Received 14 April 1997; revised manuscript received 29 August 1997. 0003-6935y98y020205-23$10.00y0 © 1998 Optical Society of America though devoid of electrical interconnection parasitics, has its own difficulties. The unavailability of monolithically integrated optical transmitters on silicon imposes hybrid integration schemes with increased cost. In addition, the transformation of information from the electrical to the optical domain and vice versa introduces inefficiencies to the energy budget. Therefore it is essential to identify the regions of superiority between electrical and optical interconnects and to determine some of the important tradeoffs. This would help system designers choose the proper interconnect technology for a given system application. In this paper we compare the speed performance and the energy cost of a class of electrical and optical interconnections for use in large-scale digital computing systems. A similar analysis comparing free-space optics and electrical interconnects was presented by Feldman et al.9 approximately 10 years ago. The study presented here expands that analysis in many ways. It includes more comprehensive interconnection models, and, in addition to evaluating on-chip interconnections, it evaluates offchip electrical interconnects @multichip modules ~MCM’s!#. In this paper on-chip, off-chip, and free-space digital interconnections based on multiple-quantum well ~MQW! or vertical-cavity surface-emitting lasers ~VCSEL’s! are evaluated. The different interconnect technologies are compared after they are analyzed in detail and optimized for minimal delay. In 10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS 205 the free-space case the following considerations are included: the MQW modulator-saturation phenomena, the dependence of the VCSEL output power on speed, the dependence of the VCSEL threshold on output power, and the effects of parasitics resulting from the hybrid integration of silicon devices with optical transmitters. In the case of off-chip electrical interconnects the effects of both series- and parallel-termination schemes are discussed. In all cases superbuffers are designed to minimize the propagation delay between minimum logic and offchip line drivers. An optimum repeater design is adopted to maximize the speed of distributed on-chip interconnections. Finally, both return-to-zero ~RZ! and nonreturn-to-zero ~NRZ! transmission schemes are used, where appropriate, to minimize energy dissipation. Note that we do not consider any cross talk ~optical or electrical! issues in this paper, which in the case of electrical cross talk becomes more and more critical as complementary metal-oxide semiconductor ~CMOS! technology is scaled down. Effects of scaling are considered only qualitatively in Section 7. In Section 2 we present the basic assumptions and limitations of the study. In Section 3 we define the interconnection in the scope of this analysis and introduce fundamental equations of energy calculation along with the five critical parameters required for calculating the energy of an interconnection. The design and the analysis of off-chip and on-chip electrical interconnections for data transmissions are presented in Sections 4 and 5, respectively. Section 6 is devoted to the analysis of the optical interconnects, separately for the cases of MQW modulators and VCSEL’s as optical transmitters. In each case the performance of the optical interconnect is compared with on-chip and off-chip electrical interconnects. In Section 7 we discuss qualitatively some of the effects of VLSI technology scaling on the performance of both electrical and free-space optical interconnects. Finally, conclusions are presented in Section 8. 2. Assumptions Throughout this paper we use the term interconnection to refer to the physical medium used for digital communications between electronic subsystems. Below are the assumptions under which we analyze such interconnections in this paper ~which are labeled A1 through A12!: A1. The application range that we are considering can be defined as large-scale digital computing systems that use dense interconnections. This leads to the assumption that, in the electrical domain, only silicon CMOS VLSI technology ~chip or wafer scale! and MCM technologies are considered. This restricts our analysis to a certain class of on-chip and off-chip digital interconnections. Off-chip interconnections are those that interconnect chips on a MCM substrate, whereas on-chip connections start and end on silicon within a chip or a wafer. Because the interconnection line length is an independent param206 APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 eter in our analysis, increasing the on-chip line length automatically extends the analysis to the wafer-scale integration domain. A2. We do not consider a certain type of computing algorithm or architecture to calculate the required interconnection line lengths or fan-out in a particular system. We consider the interconnection length ~Lint! as an independent variable. Although the derivations are carried out for both one-to-one and fan-out cases, we consider only one-to-one connections in the comparative results, as they are representative of the respective merits of the technologies. More comprehensive results that include fan-out considerations, clock distribution networks, and lead lanthanum zirconium titanate modulators as an alternative free-space transmitter technology can be found in Ref. 10. A3. Analog fan-in is not considered because of the assumption of digital communication. A4. We analyze on-chip and off-chip electrical interconnections separately. The performance of systems that use both types of interconnection in the same channel can be estimated by a combination of the results of the independent studies. A5. To ease the communication protocol, we assume synchronous communication in which data are forced to the transmitter end of the interconnection by the rising edge of a global clock signal. The data are sensed at the receiver end of the interconnection by the falling edge of the clock. Thus the sum of the delays of the various channel components as well as the maximum clock skew determines the minimum clock period ~maximum frequency of synchronous operation!. A6. We assume static CMOS logic design with rail-to-rail voltage swings. However, the same analysis methodology could be applied to dynamic or reduced-voltage-swing logic designs by proper adjustment of voltage swings and currents in the analysis. Indeed, to calculate the energy of an interconnection ~regardless of the signaling scheme! it is sufficient to determine the capacitive, short-circuit, and steadystate components of the energy. A7. We do not include the scaling analysis of VLSI technology. We use 0.5-mm CMOS technology parameters for numerical illustrations. However, in Section 7 we discuss the first-order effects of technology scaling on both electrical and optical interconnect performances. A8. We consider only free-space optics in our optical interconnection analysis. However, we do not carry out any in-depth design of the optical routing subsystem; rather, we model it with an optical timeof-flight delay and an optical power transfer efficiency. The time-of-flight delay varies as a function of the interconnection length whereas the power efficiency is assumed to be independent of the interconnection length. A9. For the optical interconnections we assume that the light transmitters ~MQW modulators or VCSEL’s! are flip-chip bonded to silicon and that integrated reverse-biased silicon p–n junctions are used as photodiodes. Note that standard silicon p–n diodes are speed limited11 well below 1 GHz but that other types of faster detectors ~e.g., hybridized MQW p-i-n diodes or GaAs metal–semiconductor–metal detectors! could be used for the high-speed rates. ~Interestingly enough, MQW p-i-n diodes have similar parasitics, although they do exhibit better responsivity than silicon p–n diodes of similar dimensions even when accounting for the hybrid integration.! A10. For the series-terminated off-chip electrical interconnection as well as for the modulator-based optical interconnections, we assume NRZ communication. In this case the channel logic level is not altered unless a new data bit to be transmitted differs from the previously transmitted data bit. For the parallel-terminated electrical interconnection and the VCSEL-based optical interconnection cases we assume a RZ transmission scheme. This is because the dc power consumption that is due to the paralleltermination resistor or the laser current is much higher than the power consumption of the interconnection that is due to switching. A11. In the optical interconnection channel we assume that a required bit-error rate can be achieved by requiring a certain voltage swing at the photodiode output, which, in turn, requires a certain input optical power. In our calculations we assume a photodiode output voltage swing of 330 mV, which is approximately equal to the transition width of a CMOS inverter transfer characteristic for 3.3-V power supply voltage in a 0.5-mm CMOS. We also assume an amplifier with a gain of only 10, which is quite conservative. Note that this voltage swing is due to the charging of the diode and amplifier input capacitance by the photocurrent. Depending on the type of detector and the type of amplifier, this required voltage swing ~and associated photocurrent and input optical power! would change. A12. In the off-chip interconnection case we assume that the interconnection conductor is lossless: this approximation holds within the limits of the independent parameters used. 3. Definition of Interconnection and Estimation of Energy From assumptions A1 and A2, we define an interconnection in our scope as follows: An interconnection is the physical implementation of a 1-bit-wide digital communication channel within or between digital VLSI subsystems ~chips, wafers! involving parasitics of the medium as well as active and passive design components used to force, restore, enhance, route ~optically!, and sense the data in the channel. This definition is illustrated in Fig. 1. Note that we require the interconnection to connect minimum geometry gates because of the large-scale integration requirement. Let us now consider the average energy requirement of a 1-bit data transmission through the interconnection. Because in CMOS design any logic gate ~or a combination of gates! can be represented electrically with an equivalent inverter, we consider the Fig. 1. Definition of interconnection in the context of this paper. simple inverter circuit shown in Fig. 2. This inverter represents all the logic devices in the interconnection, whereas the capacitance Ctot represents the total capacitance switched during data transmission. As the input to the inverter switches from Vsup to ground, a current flows from the power supply through the positive metal-oxide semiconductor ~PMOS! transistor. Part of the average value of this current is used to charge Ctot ~capacitive component!, while the remaining part flows to ground through the negative metal-oxide semiconductor ~NMOS! transistor, which is partly ON during switching ~short-circuit component!. The instantaneous capacitive current is given by Ic 5 Ctot dv . dt (1) The supply power associated with this current is pc 5 IcVsup 5 Ctot dv Vsup. dt (2) The energy is the integral of the power over the period of time that the power is dissipated: E5 * pdt. (3) t When Eq. ~2! is used in Eq. ~3! with the integration range over the full voltage swing ~based on assumption A6!, the capacitive component of the energy drawn from the power supply is given by EC 5 CtotVsup2. (4) Fig. 2. Electrical model of an interconnection for the calculation of the energy requirement. An inverter models all the logic devices in the interconnection, and a resistor models all the devices that require steady-state power. Ctot is the total interconnection capacitance that is switched during the transmission of a digital bit. 10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS 207 Half of this energy is stored on Ctot, while the other half is dissipated as heat over the PMOS transistor’s resistance during the charging of Ctot. During the switching of the input from zero to Vsup, the inverter does not require any capacitive energy from the power supply. This is because the capacitive discharge currents originate from the energy stored on Ctot and not from the power supply. Therefore Eq. ~4! represents the total capacitive energy requirement from the power supply during a period of the input signal that involves two opposite transitions. On the other hand, the average short-circuit current of a CMOS inverter ~caused by the two switching transitions in one period, assuming equal rise and fall times! is given as12 Isc 5 1 ~Vsup 2 2VT!3 tr keff , 12 Vsup t (5) where keff is the effective transconductance parameter ~modeling the equivalent transconductance of all the gates in the interconnection!, VT is the transistor threshold voltage, and tr and T are the rise time and the period of the input signal, respectively. Multiplying Eq. ~5! by Vsup to calculate the power and applying Eq. ~3! over the period yields the shortcircuit component of the energy drawn from the power supply as a result of two opposite switching transitions of the input signal: 1 Esc 5 keff tr~Vsup 2 2VT!3. 12 (6) So far we have considered the energy that is due to capacitive and short-circuit currents. Some circuitry in the interconnection may also consume considerable steady-state current because of termination, biasing, or high leakage. Covering such cases, we can express the total energy ~per period of the input signal! as ET 5 EC 1 ESC 1 ESS, (7) where ESS represents the energy consumed as a result of the steady-state currents: ESS 5 Vsup~IHTH 1 ILTL!, (8) where IH and IL are the average steady-state currents from power supply to ground during the steady-state high and low levels of the input signal, respectively, and TH and TL are the durations of the high and the low logic levels, respectively. Let us now consider Fig. 3, which shows the average case of a 4-bit serial data transmission, on the basis of assumptions A4 and A10. From Fig. 3~a!, we observe that, in the NRZ case, the channel experiences only one pair of opposite switching ~low to high and high to low! per 4 bits of data transmission. 208 APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 Fig. 3. Average 4-bit transmission through the channel: ~a! NRZ. ~b! RZ. Thus TH and TL are each equal to two clock periods. The average energy per bit can then be expressed as Enrzybit 5 EC ESC ESSnrz 1 1 , 4 4 4 (9) where ESSnrz 5 2VsupT~IH 1 IL!. (10) In the RZ case @Fig. 3~b!# the channel performs two pairs of switching per 4 bits of data transmission, and TH and TL are equal to 1 and 3 clock periods, respectively. In this case the average energy per bit is Erzybit 5 EC ESC Essrz 1 1 , 2 2 4 (11) where Essrz 5 VsupT~IH 1 3IL!. (12) For a parallel-terminated electrical interconnection, in which the termination impedance is located at the end of the line, one must take into account the finite time-of-flight delay ~tf ! of the electrical waveform in the energy calculation. Specifically, tf should be subtracted from the high-level duration of the data TH at the driver-chip output. This is because it is only after a tf delay of the waveform that the signal reaches the termination and creates a current through it. When the channel switches back to zero at the transmitter output, the paralleltermination resistor continues to conduct current. However, the source of this energy does not come from the supply ~because the transmitter driver output is disconnected from the power supply!, but it comes from the energy stored in the transmission line. Equations ~4!, ~6!, ~9!, and ~11! are necessary and sufficient to calculate the energy requirement of the interconnection. The evaluation of these equations Fig. 4. Model of an off-chip electrical interconnection. require that, besides the technology parameters Vsup and VT, the following five parameters be determined: ~1! Ctot, total capacitance switched during transmission, ~2! keff, effective transconductance of all active devices in the interconnection, ~3! IH and IL, high- and low-level steady-state currents during data transmission, respectively, ~4! tr, rise time of the signal in the interconnection, ~5! T, period of the synchronous system clock. To calculate the above five parameters, we apply well-established circuit techniques used to minimize the propagation delay through interconnections, such as the use of superbuffers, optimum repeaters, or transmission-line terminations. After the interconnection is designed with these techniques to operate at the highest possible speed, we estimate the energy requirement of a 1-bit data transmission through the channel. 4. Speed and Energy of Off-Chip Electrical Interconnections Figure 4 illustrates the off-chip interconnection scheme. The transmitter chip involves an electronic subblock composed of dense minimum-size devices necessary for large-scale computation or storage. A superbuffer to drive a large-size line driver for which a global off-chip interconnection is needed follows this block. The line driver forces the data into the off-chip conductor by means of an output pin. The data propagate along the conductor and are received by the various receiver chips. In the one-to-one connection case, there is only one chip at the end of the conductor. Each receiver chip is connected to the off-chip conductor by means of an input pin, which is then connected to a minimum-size inverter to receive and restore the data for use in the following electronic block. If a parallel-termination scheme is used, then the off-chip conductor is terminated with a paralleltermination resistor RT whose value matches the conductor impedance. If a series termination is used then the line-driver output resistance is matched to the off-chip conductor’s impedance. Because of the large width and thickness of off-chip conductors and the low resistivity of the materials used, such off-chip connections offer very low unitlength resistance ~,1 Vycm!. For practical purposes they can be considered lossless; for short interconnection lengths the transmission-line behavior can be neglected, and the line acts as a lumped capacitor. In this regime the optimum superbuffer ~to minimize the propagation delay! can be designed on the basis of the total load capacitance of the driver chip.13 The interconnection can be treated as a lumped capacitor when14 –21 tr . 5tf, (13) where tr is the rise time of the data signal in the interconnection and tf is the time-of-flight delay of the signal propagation through the interconnection conductor. For long interconnection lengths, such that tr , 2.5tf, the transmission-line phenomena become dominant.14 To enable the continuity of the analysis we use stricter criteria in this study and assume that the transmission-line phenomena need to be considered when tr , 5tf. (14) A. Interconnection in the Lumped-Capacitor Regime: tr . 5tf In this regime a superbuffer is both necessary and sufficient to drive the total load capacitance of the driver chip with minimal delay.13,14 The load capacitance of the driver chip can be estimated ~Fig. 4! as CL 5 Cdr 1 LintCint off 1 NCrc, (15) where Lint and Cint off are the off-chip interconnection length and interconnection capacitance per unit length, respectively, and Cdr and Crc are the driverchip output and the receiver-chip input capacitances, respectively: Cdr 5 Cpin 1 Csb,o, (16) Crc 5 Cpin 1 Cmin,i, (17) where Cpin is the chip package pin capacitance given by the packaging technology, Cmin,i is the minimum inverter input capacitance given by the on-chip inte10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS 209 gration technology, and Csb,o is the last superbuffer stage output capacitance given by Eq. ~A5! in Appendix A. The process of using Eq. ~15! in Eq. ~A3!, solving n, the number of superbuffer stages from Eq. ~A5!, and equating to Eq. ~A3!, provides CL as a function of the technology parameters and independent variables: CL 5 Cpin 1 LintCint off 1 NCrc , Cmin,o 12 bCmin,i (18) where Cmin,o is the minimum inverter output capacitance and b is the superbuffer tapering factor ~see Appendix A for details!. From Fig. 4 we see that the interconnection length as a function of the number of chips ~N! is Lint 5 Leff~N 1 1!, (19) Leff 5 Lc 1 Lsp, (20) where where Lc is the side length of a chip and Lsp is the spacing between subsequent chips. Using Eq. ~19! in Eq. ~18! gives the total load capacitance of the driver chip: CL N 5 Cpin 1 Leff Cint off 1 N~Leff Cint off 1 Crc! . (21) Cmin,o 12 bCmin,i For a one-to-one connection, i.e., N 5 1, Eq. ~18! reduces to CL1 5 Cpin 1 LintCint off 1 Crc . Cmin,o 12 bCmin,i (22) Substituting Eq. ~21! or Eq. ~22! for load capacitance into Eq. ~A3! gives the required number n1,N of stages in the superbuffer. Using this result in Eq. ~A6! provides the total parasitic superbuffer capacitance Csb. The total capacitance of the interconnection is then Ctot1,N 5 Csb1,N 1 Cpin LintCint off 1 NCrc. (24) The rise time of the signals in the superbuffer is estimated by Eq. ~A4!. The superbuffer propagation delay is found by substitution of n1,N into Eq. ~A1!. Finally, on the basis of the synchronous operation assumption, the minimum clock period TCLK is re210 TCLK 5 tsb,p. (25) This concludes the calculation of the five parameters of the interconnection ~listed at the end of Section 3! necessary to estimate the speed performance and the energy requirement of the interconnection in the lumped-capacitor regime. In Subsection 4.B we extend the analysis to the case of long off-chip interconnections, which behave as transmission lines. B. Interconnection in the Transmission-Line Regime: tr , 5tf In this regime a series- or parallel-termination scheme is used to minimize reflections and spurious transitions. Figure 4 illustrates these termination schemes. In the case of series termination the driver output resistance is matched to the impedance of the interconnection conductor. Because of the equal impedance of the conductor and the driver, the initial voltage transfer to the line is only half the supply level. For achieving a full supply level across the conductor, the signal has to bounce from the receiver end and propagate back to the driver site, thus requiring a round-trip propagation of the signal on the conductor. In the parallel-termination scheme the receiver end is terminated with a resistor equal to the line impedance. This way, no reflections occur from the receiver end, but the driver initially has to provide a high ~or low! enough logic voltage to the line. Because only a one-way trip of the signal is needed, a parallel-termination scheme provides faster data transmission than the series termination, but it also requires more energy because of the steady-state current consumption through the parallel-termination resistor. Because both termination schemes require larger-than-minimum geometry drivers, a superbuffer is needed to connect the minimum-size logic to the line drivers and minimize the propagation delay. In the case of a one-to-one connection the interconnection line is unloaded and the characteristic line impedance can be calculated by14 (23) The effective transconductance keff of the interconnection is equal to the sum of the transconductances of the superbuffer stages and is calculated by substitution of n1,N into Eq. ~A7!. Because there is no biasing or termination resistor, there is no steadystate current consumption in this regime of operation and IH 5 IL 5 0. quired to be as long as the superbuffer propagation delay: APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 Z1 5 1 , nCint off (26) where n is the propagation speed in the medium and Cint off is the parasitic off-chip line capacitance per unit length. In the case of fan-out the line can be treated as distributed if the spacing between the receivers is short. Under this assumption the total loaded line impedance can be calculated as14 ZN 5 Z1 CN 11 Cint off S D 1y2 , (27) where CN is the fan-out capacitance per unit length that, from Fig. 4, is seen to be CN 5 CrcyLeff. (28) Similarly, the time-of-flight delays for one-to-one and fan-out connections are estimated as tf1 5 Lintyn, S Lint Crc tf 5 11 n Leff Cint off N Lint,N . D 1y2 . (30) trn , 2.5 (31) tr n Crc 2.5 1 1 Leff Cint off S D 1y2 . (32) In the case of series termination the output resistance of the last superbuffer stage in the driver chip should match the characteristic line impedance ~Z1 in the one-to-one case and ZN in the fan-out case!: Rdr1,Nser 5 Z1,N. (33) In the parallel-termination case, although the buffer impedance does not have to match the line impedance, it should be low enough to provide the necessary high-level voltage across the paralleltermination resistor: Rdr1,Npar 5 RT S D S D VDD VDD 2 1 5 Z1,N 2 1 , (34) VH VH where RT is the termination resistor whose value is matched to the line impedance and VH is the minimum acceptable logic high-level voltage. The resistance of the last superbuffer stage is calculated by Eqs. ~33! and ~34!; thus the entire superbuffer can now be designed. The size S of the nth ~the largest! buffer stage is S 5 bn21 5 RminyRdr, where Rmin is the minimum-size inverter output resistance. This allows the calculation of the total number of stages in the superbuffer, n1,Nser,par 5 S Ctotser,par1,N 5 Csbser,par1,N 1 Cpin 1 LintCint off 1 NCrc. (36) (29) Note that, in Eqs. ~27! and ~30!, we have neglected the increasingly smaller effect of the output capacitance of the driver chip as the interconnection length increases. Now we are in a position to estimate the boundary of the lumped-capacitor and the transmission-line regimes. Using Eqs. ~29! and ~30! in inequalities ~13! and ~14! and solving for Lint provides the region of transmission-line operation: Lint,1 . capacitance of the superbuffer Csb is obtained by substitution of Eq. ~35! for n into Eq. ~A6!. As in the lumped-capacitor regime, the total interconnection capacitance is calculated as D 1 Rmin ln , ln b Rdr1,Nser,par (35) and, from the biggest to the smallest, the stages decrease in size by a factor of b. After the number of stages is determined the effective transconductance keff of the line is calculated from Eq. ~A7!. The total In the series-termination case, there is no steadystate current consumption. In the paralleltermination case, there is a steady-state current as long as the logic level of the interconnection is high: IH,par1,N 5 VH , Z1,N IL,ser1,N 5 0, IH,ser1,N 5 0, (37) IL,ser1,N 5 0. (38) Because the interconnection is driven by the superbuffer, the rise time of the data signals in the interconnection is calculated by Eq. ~A4!. The minimum clock period TCLK is equal to the sum of the superbuffer propagation delay given by Eq. ~A1! and the one-way or the round-trip time-of-flight delay for parallel and series termination, respectively: TCLK 5 tsb,p1,N 1 mtf1,N, (39) where m 5 1 for parallel and m 5 2 for series termination. In Eq. ~39! we assumed for simplicity that the last superbuffer stage propagation delay is equal to the propagation delay of a previous stage, although its size is determined by the matching condition rather than by the output capacitance. In Eq. ~39! we also neglected the rise time of the signal at the receiver end, which is quite small because of the small flip-chip bond and receiver input capacitance. In Eq. ~39! the superbuffer propagation delay is different for the series- and parallel-termination cases because of Eq. ~35!. This difference, however, is small because the minimum high-level voltage Vh is generally approximately half the supply level, requiring a buffer size as big as a buffer designed for the series-termination case. For short interconnection lengths estimations of the interconnection speed with the lumped-capacitor or transmission-line models provide close results. For this reason we plot only the speed performance estimated by the transmission-line theory for all interconnection lengths. The technology constants used in the numerical illustrations are presented in Table 1. Figure 5 illustrates the maximum clock speed and energy per bit transmitted as a function of interconnection length for a one-to-one connection. Because of Eqs. ~29! and ~30! the speed decreases with interconnection length, from above 1 GHz to approximately 300 MHz ~series-terminated 20-cmlong line! in the slowest case. The speed of parallelterminated lines is higher, mostly because of the oneway trip delay of the signal, as stated by Eq. ~39!. The strongest dependence of the delay on the interconnection length comes from the time-of-flight delay, which increases linearly with line length. For 10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS 211 Table 1. VLSI and Electrical Packaging Constants Symbol VDD VT VH Rmin RCmin kmin Cmin,i Cmin,o Cpin, Cbond Lc Lsp v Cint off Rint off Cint on Rint on CminN,i CminN,o Wmin Cff don doff Description Value VLSI power supply voltage level Transistor threshold voltage Minimum acceptable logic high-level voltage Minimum-size transistor average resistance Minimum-size inverter internal propagation delay Minimum-size transistor transconductance parameter Minimum-size inverter input capacitance Minimum-size inverter output capacitance Optimum superbuffer tapering factor Flip-chip bond capacitance ~also used as pin capacitance in the text! Side length of a chip in MCM packaging Spacing between MCM chips Speed of wave propagation on a MCM substrate Off-chip interconnection capacitance per unit length Off-chip interconnection resistance per unit length On-chip interconnection capacitance per unit length On-chip interconnection resistance per unit length Minimum NMOS transistor input capacitance Minimum NMOS transistor output capacitance Off-chip conductor minimum width Off-chip conductor fringing-field capacitance per centimeter Side length of an electronic block ~on a wafer! to which clock is routed Side length of a chip ~on an MCM! to which clock is routed 3.3 V 0.5 V VDDy2 8700 V 100 ps 80 mAyV2 6 fF 6 fF 5 20 fF 1 cm 0.2 cm 15 3 109 cmys 1 pFycm 0.8 Vycm 1.4 pFycm 90 Vycm 3 fF 2 fF 25 mm 1 pFycm 1 cm 1 cm short interconnections, as suggested by inequalities ~31! and ~32!, the line is in the lumped-capacitor regime. In this regime the energy requirement increases linearly with the interconnection length because of the linear dependence of both Ctot and keff on Lint. As interconnections become longer, the transmission-line phenomena become dominant. This yields a sudden increase in energy at the boundary of this regime. This increase is small in the series-termination case, as this scheme requires nothing more than a slight increase of the line-driver size. In the parallel-termination case, however, the jump is drastic, as this scheme requires a parallel- Fig. 5. Speed performance and energy requirements of off-chip electrical interconnections as functions of the interconnection length for serial- and parallel-terminated lines in the case of oneto-one connections. 212 APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 termination resistor that consumes high steady-state power. Note that the boundary between lumpedcapacitor and transmission-line regimes is not precisely defined because of the approximate separation of the two regions by inequalities ~13! and ~14!. For short interconnections the biggest contribution to the total energy comes from the termination resistor. As line lengths become longer, the capacitive component quickly becomes the dominant component, constituting up to 70% of the total energy. For short interconnections, the short-circuit component of the energy is approximately 20% of the other components. As line lengths become longer, its effect reduces to approximately 10%. Because the line impedance is independent of line length, after the line impedance is matched ~in the series-termination case! the superbuffer size remains constant as interconnections become longer. However, as the line length increases, the total interconnection capacitance increases linearly, resulting in an overall linear increase of energy. In the paralleltermination case the slope of the energy increase is higher than in the series-termination case. This is because, in addition to the linear increase of the interconnection capacitance with line length, the energy requirement of the parallel-termination resistor also increases with line length. Indeed, longer interconnections reduce the transmission speed of a bit, which results in power dissipation at the termination resistor over a longer period of time. 5. Speed and Energy of On-Chip Electrical Interconnections Figure 6 illustrates a typical on-chip ~wafer! interconnection configuration. In this case the distrib- Fig. 6. Model of an on-chip interconnection. uted line behavior is dominant, as generally an onchip conductor is lossy ~Rint ' 100 Vycm!. Thus for short interconnection lengths a superbuffer is sufficient to drive the interconnection with small delays. As the interconnection length becomes longer, the line resistance becomes nonnegligible. The propagation delay through the interconnection can be minimized only by use of optimally sized and spaced repeaters.14 Because a repeater is large a superbuffer is still needed to drive the first repeater. We consider three different fan-out loading conditions: a minimum inverter input load every 100, 200, and 400 mm. Again, if we assume distributed load behavior the extra loading resulting from fan-out can be absorbed in the parasitic line capacitance. Given the line parasitics per unit length and the minimum inverter parameters, the optimum number NR and size SR of the repeaters can be calculated as14 NR 5 SR 5 F F G 0.4Rint on~Cint on 1 CN! 0.7RminCmin,o G Rmin~Cint on 1 CN! Rint onCmin,o , (44) Substituting Eq. ~44! into Eq. ~A3! for the load capacitance and solving for n with the help of Eq. ~A5! yields the number of superbuffer stages nsb,ON. The total superbuffer capacitance Csb,on and the propagation delay tsbp,on are calculated by Eqs. ~A6! and ~A1!. Generally a two-stage superbuffer is sufficient to drive the repeater. The superbuffer effective transconductance is calculated by Eq. ~A7!. The effective transconductance of the interconnection is then estimated as keff 5 keffsb 1 NRSRkmin. (40) Ctot 5 (41) where Rint on and Cint on are the on-chip interconnection resistance and capacitance per unit length, respectively; Rmin and Cmin,o are the output resistance and the output capacitance of a minimum-size inverter, respectively; and CN is the extra capacitance per unit length owing to fan-out. The resulting propagation delay through the interconnection ~including the delay of the repeaters! is found as14 tp,RP 5 2.5@Rmin~Cint on 1 CN! Rint onCmin,o#1y2. (42) On the basis of typical 0.5-mm technology parameters, there has to be one repeater, 150 times larger than a minimum geometry inverter, approximately every centimeter of the on-chip interconnection length to minimize the propagation delay ~in the oneto-one connection case!. Because the repeater size is much larger than a minimum-size inverter, the first repeater is to be driven by a superbuffer. The input capacitance of a repeater is calculated as CR,in 5 SRCmin,i . Csb,L 5 CR,in 1 Csb,o. (45) The total capacitance of the interconnection is then 1y2 1y2 , The superbuffer load capacitance is equal to the sum of the input capacitance of a repeater given by Eq. ~43! and the output capacitance of the last superbuffer stage: (43) keff ~Cmin,i 1 Cmin,o! 1 Lint~Cint on 1 CN!. (46) kmin Because there is no termination or biasing requirement, there is no steady-state current consumption and IH 5 IL 5 0. Because of the use of a superbuffer cascaded with repeaters in the interconnection, the rise time of the signals varies slightly throughout the interconnection. The biggest contribution to the energy ~for long connections! comes from the repeaters. Thus for simplicity we use the signal rise time at the input of a typical repeater for the entire interconnection. We can estimate this rise time by weighing the distributed RC terms ~between two successive repeaters, i.e., approximately 1 cm! by 1 and lumped ones by 2.3 ~Ref. 14!: F tr 5 Rint onCint on 1 2.3 G Rmin ~Cint on 1 CN 1 CR,in! SR 1 Rint onCR,in . (47) Finally, the minimum clock period is calculated to be TCLK 5 tsbp,on 1 Tp,RP. 10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS (48) 213 Fig. 7. Speed performance and energy requirement of on-chip electrical interconnections as functions of the interconnection length for different loading conditions ~no load, 50 fFymm, 100 fFymm, and 200 fFymm!: ~a! Speed. ~b! Energy. Unlike in the off-chip interconnection case ~i.e., in the transmission-line regime!, keff is a function of the interconnection length, as longer interconnections involve more repeaters. This results in a faster increase of the energy requirement as a function of Lint. Figure 7 illustrates the results of the analysis. As can be seen in Fig. 7, the effect of loading on the speed is small. The energy requirement increases linearly because of the linear dependence of the number of repeaters as well as that of the line capacitance on Lint. The biggest contribution to the energy comes from the capacitive component, whereas the shortcircuit component constitutes only 25% of the overall energy. The energy requirement of the on-chip interconnection is comparable with that of an off-chip interconnection. Although there is no need for terminations, on-chip interconnects require periodic repeaters because of the lossy nature of the interconnection conductor. For the same reason onchip interconnections are also slower than their offchip counterparts. 6. Optical Interconnections Figure 8 shows the typical optical interconnection model considered in this study. As in the electrical interconnection case, a superbuffer amplifies the minimum logic to drive the optical transmitter driver, which in turn switches the optical transmitter device. The transmitter is modeled by a current source and a capacitor. For a VCSEL the current source models the laser current needed to produce the required laser output power. For a MQW modulator the current models the current arising from absorption. The transmitter capacitance includes the transmitter driver output capacitance, the transmitter device capacitance, and the flip-chip bond capacitance ~based on assumption A9!. The minimum logic that drives the interconnection uses the logic level power supply voltage VDD. Because of the transmitter device requirements, the 214 APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 transmitter itself, the transmitter driver, and the superbuffer all use a separate supply level VTR, which is generally larger than VDD. We assume that VTR is less than the breakdown voltage on the chip, such that no special circuitry is needed in these stages. Furthermore, if VTR is only a couple of times larger than VDD, an inverter in the logic is capable of driving a properly biased, higher-voltage inverter ~first inverter in the superbuffer!, as this inverter would provide the necessary gain. Note that the first inverter of the superbuffer never turns off completely because of the less-than-full voltage swing at its input, resulting in steady-state power dissipation. Alternatively the superbuffer could be operated at the VLSI supply level VDD and the amplification could be performed by the transmitter driver. In this late-amplification scheme, the superbuffer consumes less energy because of the reduced supply level, but the transmitter driver consumes more energy because the transmitter driver does not switch off completely. As the transmitter driver is generally larger than minimum, the energy requirement is higher than that of the first superbuffer stage. In the early-amplification case, in which the amplification is performed by the first inverter of the superbuffer, the superbuffer consumes more and the transmitter driver consumes less energy. A detailed analysis shows that, for transmitter voltages below 10 V, the early-amplification scheme is more energy efficient. In a dynamic design or in cases in which the transmitter driver does not have to be large, late amplification would be more beneficial. In some cases the transmitter may need to be biased at a certain voltage for optimum operation: A separate supply voltage Vb is used for this purpose ~see Fig. 8!. If a modulator is used as transmitter the optical power needs to be routed to each modulator from an optical power supply, as is also depicted in Fig. 8. The free-space optical interconnections route a Fig. 8. Model of interconnection by use of free-space optics. On the transmitter site the interconnection includes a transmitter, a transmitter driver, and a superbuffer in the case for which the transmitter is too large to be driven by minimum logic. The receiver site includes a photodiode with a thresholding current source, clamping diodes to limit the voltage swing, and a minimum-size inverter to amplify the photodiode output signal and restore the logic levels. transmitter output to N receivers. They are modeled by a time-of-flight delay and a power transfer efficiency ~see Table 2!. At the receiver site a current source models the absorbed photocurrent in a reverse-biased p–n junction. The current source IL models the photodiode load current. Two clamping diodes are used to limit the photodiode output voltage to approximately 330 mV ~on the basis of assumption A11! and obtain fast switching. A minimum-size inverter is used to threshold the photodiode output and restore the logic levels. Independently of the type of transmitters used the design of the interconnection always starts with the estimation of the required photocurrent dynamic range at the receivers. This photocurrent depends on the technology parameters, the operation speed, and the photodiode voltage swing. After the detec- tor photocurrent dynamic range is determined, the transmitter output power dynamic range can be estimated on the basis of the detector responsivity and the optical link power transfer efficiency. This is followed by the design of the particular transmitter used. A. Multiple-Quantum-Well Modulator as a Transmitter For optical interconnection applications optical intensity modulation can be achieved directly in a MQW structure by electrical modulation of the excitonic absorption. This effect is commonly referred to as the quantum-confined Stark effect.22,23 The MQW modulators are potentially high-speed devices, and their use in free-space optoelectronic interconnection systems has been demonstrated.24 The modulator contrast ratio, insertion loss, and absorption satura- Table 2. Optical Routing and Power Supply Constants Symbol Description Value hOSR LAS hOSR mod hdis hL,sys VCSEL-based optical interconnection power routing efficiency Modulator-based optical interconnection power routing efficiency Optical power distribution efficiency System laser, current-to-optical power conversion efficiency 0.7 0.5 0.9 0.3 WyA 10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS 215 Table 3. Photodetector Constants Symbol DVd Aph Adiode Cph Cclamp Rph tp,ph Description Value Photodiode output voltage swing Photodiode area Clamping diode area Photodiode device capacitance per unit area Clamping diode capacitance per unit area Photodiode responsivity Photodiode internal propagation delay 330 mV 50 mm2 10 mm2 0.2 fFymm2 tance. The effective interconnection transconductance is keff 5 keffsb 1 kdr, where kdr is the transmitter driver transconductance given in Appendix E. The high- and low-level steady-state currents are calculated as 0.2 fFymm2 0.3 AyW 100 ps tion are important issues that constrain the applications of these modulators.25 It has been observed experimentally that the contrast ratio and the insertion loss of a MQW modulator saturate at high optical intensities.26,27 This saturation, which has been attributed to carrier screening in the material, has been analyzed in the past.28 We size the MQW devices appropriately as a function of the optical power requirement and thus neglect the saturation effects. 1. Analysis of a Multiple-Quantum-Well Modulator Interconnect In the case of an optical interconnection that uses MQW modulators the driver should be designed to take into account the absorbed modulator current. In addition, for MQW’s the modulator size is a function of the input optical power to avoid the saturation phenomena. Higher speed requires more modulator power, thus resulting in a larger modulator with larger capacitance and current, which affects the design of the driver. The transmitter itself also contributes to the total steady-state current because of absorption. Appendix E presents the details of the MQW modulator driver design. On the basis of the input capacitance of the driver given in Eq. ~E12!, a superbuffer can be designed as above if needed. The total capacitance Ctot is then Ctot 5 ~Csb 1 CTR,i 1 CTR! ATR 1 NCrc, (49) where Csb is the superbuffer total parasitic capacitance; CTR,i and CTR are the transmitter driver input and transmitter capacitances given by Eqs. ~E12! and ~E10!, respectively; and Crc is the receiver capaci- (50) IH 5 ITR 1 IMQW,H 1 ILoad 1 IRC, (51) IL 5 ITR 1 Iph,L 1 IRC, (52) where ITR and IRC are the transmitter and receiver site steady-state currents resulting from amplification, respectively ~estimated in Appendix C!; IMQW,H is the MQW driver high-level current given by Eq. ~E6!; and the photodetector currents are the same as those given in Section 5. The minimum clock period of the interconnection can be estimated as T 5 tsb,p 1 tdr,p 1 tp,MQW 1 tfopt 1 tp,ph 1 tp,det 1 RCmin, (53) where tp,MQW is the MQW modulator internal propagation delay and tdr,p is the modulator driver propagation delay, which can be approximated as half the signal rise time tr given by Eq. ~E9!. There is an extra energy requirement for an external system laser to drive the modulators optically. The electrical power requirement in the system laser derived from a single MQW modulator is calculated by Popt, in 5 DRopt . hMQW,H 2 hMQW,L The technology parameters used for numerical illustrations in the next subsections are presented in Tables 3 and 4. 2. Comparison with Off-Chip Electrical Interconnects Figure 9 illustrates the comparison between MQWbased optical and off-chip electrical one-to-one interconnections. The optical interconnection speed is controlled by variation of the speed of light detection. In the fastest case the photodetection delay tp,det is equal to the minimum inverter propagation delay RCmin. When this delay is increased the energy re- Table 4. MQW Modulator Technology Constants Symbol VTR MQW rMQW VL IS~Vm! IS~0! Km k~0! CMQW tp,MQW 216 Description MQW MQW MQW MQW MQW MQW MQW MQW MQW modulator modulator modulator modulator modulator modulator modulator modulator modulator supply voltage responsivity low-level logic voltage saturation intensity at the modulation voltage saturation intensity at zero voltage absorption slope ratio absorption slope at zero voltage device capacitance per unit area internal propagation delay APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 (54) Value 10 V 0.53 AyW 0.5 V 800 Wycm2 244 Wycm2 4 0.2 0.12 fFymm2 30 ps Fig. 9. Speed and energy comparisons between off-chip electrical and MQW-based optical interconnects for one-to-one connections: ~a! Speed. ~b! Total system energy requirement. ~c! Processing plane energy requirement. Curves related to the electrical interconnect are illustrated with symbols. The performance of optical interconnects is illustrated at different detection speeds ~tp,det! to permit comparison with electrical interconnects running at the same speed. quirement of the optical interconnection is reduced at the expense of speed. The behavior of the curves shows a better optical speed performance that is due to the smaller buffer requirement of the MQW modulators because of their smaller device capacitance. The break-even line length for equal energy requirement is approximately 3 cm compared with parallelor series-terminated electrical interconnects. For both system energy and processing plane energy ~which is commonly referred to as on-chip power dissipation! requirements, MQW-based optical interconnects offer a simultaneous speed– energy advantage for one-to-one connection. For long interconnects the MQW optical interconnects require almost 6 times less processing plane energy and yet operate faster than the fastest electrical interconnect. In terms of overall system energy the energy efficiency of the optical interconnects drops to less than twice better than their electrical counterparts. Note that, when the optical light detectors are operated 10 times slower than the speed of the minimum inverter, optical interconnects could be more energy efficient than even a series-terminated electrical interconnect for long interconnections and yet operate faster. The biggest contribution to the system energy requirement comes from the external light source. The biggest contribution to the processing plane energy requirement comes from the steady-state modulator currents because of absorption. 3. Comparison with On-Chip Electrical Interconnects Figure 10 illustrates the comparison between MQWbased optical and on-chip electrical one-to-one interconnections. There is a break-even line length of approximately 4 cm for equal system energy. It is 10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS 217 Fig. 10. Speed and energy comparison between wafer-scale electrical and MQW-based optical interconnects for a one-to-one connection: ~a! Speed. ~b! Total system energy. ~c! Processing plane energy. Curves related to electrical interconnect are illustrated with symbols. The performance of the optical interconnects is illustrated at different detection speeds ~tp,det! to permit comparison with electrical interconnects running at the same speed. interesting to note that faster optical interconnects require less on-chip energy per bit. This is due to the smaller contribution of the steady-state components. In this case the break-even line length is only approximately 2 cm. For long wafer-scale interconnects the simultaneous speed– energy advantage of the MQW-based optical interconnect exceeds an order of magnitude. B. Vertical-Cavity Surface-Emitting Laser as a Transmitter Compared with light modulators the use of surfaceemitting lasers as optical transmitters in a free-space optical interconnection system significantly simplifies the optical system design by elimination of the requirement for an external laser source and its associated optics. Thus surface-emitting lasers have the potential for improving the optical link efficiency 218 APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 and the system stability and robustness. Although still in the early development stages, VCSEL’s are very promising for two-dimensional array applications.29,30 A great amount of research has been performed lately to improve the uniformity of laser arrays, reduce the threshold currents, and increase the maximum power output.29,31,32 The threshold voltage and electrical-to-optical power conversion efficiency are the main characteristics of lasers, along with their series resistance, that influence the performance of these devices in a digital interconnection.33,34 1. Analysis of a Vertical-Cavity Surface-Emitting Laser Interconnect The details of the VCSEL transmitter driver design are presented in Appendix F. Based on the input capacitance of the driver given in Eq. ~F12!, a super- Table 5. VCSEL Constants Symbol VTR LAS f g hLI Vth tp,VCSEL CLAS Description VCSEL VCSEL VCSEL VCSEL VCSEL VCSEL VCSEL Value supply voltage slope of threshold current–laser diameter characteristic slope of output power–laser diameter characteristic slope of output power–laser current characteristic threshold voltage internal propagation delay device capacitance per unit area buffer can be designed as above if needed. The total capacitance Ctot is then Ctot,VCSEL 5 ~Csb 1 CTR,i 1 CTR! ATR 1 NCrc, (55) where Csb is the superbuffer total parasitic capacitance; CTR,i and CTR are the transmitter driver input and transmitter capacitances given by Eqs. ~F12! and ~F10!, respectively; and Crc is the receiver capacitance. The effective interconnection transconductance is keff 5 keffsb 1 kn, (56) where kn is the transmitter driver transconductance given by Eq. ~F8!. The high- and the low-level steady-state currents are calculated as IH 5 ITR 1 IVCSEL,H 1 ILoad 1 IRC, (57) IL 5 ITR 1 Ith 1 Iph,L 1 IRC, (58) where the VCSEL transmitter high-level current is given by Eq. ~F7!, the VCSEL threshold current Ith is given by Eq. ~F6!, and the receiver currents are the same as those in Subsection 6.A. The minimum 10 V 0.7 mAymm 0.5 mWymm 0.3 WyA 2V 30 ps 0.2 fFymm2 clock period of the interconnection can be estimated as T 5 tsb,p 1 tdr,p 1 tp,VCSEL 1 tfopt 1 tp,ph 1 tp,det 1 RCmin, (59) where tp,VCSEL is the VCSEL device propagation delay and tdr is the laser driver propagation delay that can be approximated as half the signal rise time tr, given by Eq. ~F9!. Unlike in the modulator cases, there is no need for an external light source as each VCSEL is a light source itself. VCSEL parameters are shown in Table 5. 2. Comparison with Off-Chip Electrical Interconnects Figure 11 illustrates the comparison between VCSEL-based and off-chip one-to-one interconnections. The speed of the VCSEL-based interconnection is slightly lower than in the MQW case because of the larger driver requirement of the VCSEL because of the high laser current. The break-even line lengths for equal energy is approximately 1 cm. For 20-cm-long interconnects a VCSEL-based interconnect requires an order of magnitude less system en- Fig. 11. Speed and energy comparison between off-chip electrical and VCSEL-based optical interconnects for a one-to-one connection: ~a! Speed. ~b! System ~processing plane! energy. Curves related to electrical interconnect are illustrated with symbols. The performance of the optical interconnects is illustrated at different detection speeds ~tp,det! to permit comparison with electrical interconnects running at the same speed. 10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS 219 Fig. 12. Speed and energy comparison between wafer-scale electrical and VCSEL-based optical interconnects for a one-to-one connection: ~a! Speed. ~b! System ~processing plane! energy. Curves related to electrical interconnect are illustrated with symbols. The performance of the optical interconnects is illustrated at different detection speeds ~tp,det! to permit comparison with electrical interconnects running at the same speed. ergy and operates 2 to 4 times faster. Note that, with improvements in the VCSEL technology, the numbers reported in this paper are expected to improve further.35 Also note that, compared with the MQW modulator cases, the fastest VCSEL-based optical interconnect requires less system energy. 3. Comparison with On-Chip Electrical Interconnects It can be observed from Fig. 12 that the break-even line length is less than a centimeter. Compared with the wafer-scale electrical connections VCSELbased optical interconnects yield a drastic speed and energy advantage: In the case of long interconnects an optical interconnect is 4 times faster and yet 10 times more energy efficient. Note that, even for very short interconnects, optics still provides faster operation. 7. Effects of Technology Scaling This section covers the potential effects of VLSI scaling ~i.e., reduced minimum feature size! on the different interconnect technologies considered in this paper. This is done from only a qualitative point of view and within some simplifying assumptions on the effect that the scaling has on such parameters as voltage supplies, line capacitance, line impedance, etc. However, the trends shown here should hold accurate for the next two or three generations of VLSI circuits. Note that a related study, although quite more in depth, can be found in Ref. 36. From a general point of view, scaling down the VLSI technology usually yields faster circuits that consume less power. It also implies reduced parasitics as the transistor areas become smaller. However, it also yields increased resistance of the metal wires on a VLSI chip. 220 APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 A. Electrical Interconnects Assume that two VLSI chips are interconnected by means of an off-chip interconnection line of impedance Z0. As the technology scales down ~assuming ideal scaling!, the transistor resistance remains constant to a first order. This is because the transconductance increases linearly with the scaling parameter s and the supply voltage VDD decreases linearly with s. Note that these assumptions are true only in first approximation and tend to break down as the scaling approaches the deep submicrometer level ~0.18 mm and below!. Thus the minimum gate delay decreases with s because the resistance remains constant and the parasitic capacitances decrease with s. Therefore the interconnect technology scaling allows the off-chip interconnection linewidth to be decreased to a first order by s. If this is done along with VLSI scaling, then the line impedance Z0 increases by s. An increased line impedance makes a line easier to drive, but, because the number of stages in a superbuffer is a logarithmic function of the output load, the effect of increased line impedance on the superbuffer driver performance is minimal. However, because the minimum gate delay scales down as s, the overall superbuffer propagation delay decreases as s. Scaling Z0 up, however, may increase the propagation delay through the line if the line connects to large lumped capacitances, as the charging capability of the line is proportional to its impedance. If the line is practically unloaded then the propagation delay is equal to the inherent time of flight, which is independent of the line impedance. Therefore, to a firstorder approximation, we can say that technology scaling s decreases the interconnect propagation delay by s if the line is unloaded, whereas the propaga- tion delay remains roughly constant ~or decreases by less than s! if the line is loaded. On the other hand, the energy of data transmission decreases significantly with scaling. The superbuffer and the interconnect capacitances decrease as s. Because the supply voltage also decreases by s and because the dominant capacitive component of the energy is proportional to the square of the supply voltage ~capacitive component!, the energy requirement scales down as s3 ~note that the short-circuit component scales as s3 and the parallel-termination component scales between s and s3!. In the case of on-chip interconnects the scaling of a repeater-based interconnect needs to be considered. When an interconnect is scaled down the line capacitance decreases in the first order, while the line resistance increases. Thus the RC time constant of the line remains approximately constant ~unless other measures are taken to keep the resistance low!. The repeater propagation delay, however, scales down with s because of the decreased gate capacitances. The overall interconnection propagation delay, which is a function of both the line RC delay and the repeater propagation delay, decreases as only s0.5. If, while the linewidth is scaled to reduce the capacitance the line thickness is increased to keep the line resistance constant, then the propagation delay decreases with s. This approach, however, is in contradiction to the VLSI trend to increase the number of interconnect layers. Increased line thickness also increases line-to-line and fringe capacitances and results in a decrease in the line capacitance with a linewidth that is less than s. As in the off-chip interconnect case, the energy requirement of on-chip interconnects scales down as s3 because of the decrease in both supply voltage and capacitances. Therefore the energy– delay product of electrical data transmission scales down between s3 and s4. B. Optical Interconnects The energy requirement of an optical interconnect is dominated mostly by static currents. As the VLSI technology scales down, the parasitic capacitance driven by a photodiode decreases by s. The gain of the amplifier following the photodiode increases with s because of the increased transconductance and the reduced current. Because the supply voltage also reduces by s, the voltage swing requirement at the output of the photodiode reduces as s2. Therefore, for the same speed the required photocurrent reduces as s3 because of the reduced capacitances along with a reduced voltage swing. Similarly to the photocurrent, all the transmitter optical power and currents scale as s3. Because the voltage supply reduces by s, the energy requirement of the optical interconnect scales down by s4 ~which is better than the electrical interconnect case! without assuming any improvement in the efficiency of transmitters and interconnect optics. Because we assumed constant speed, the energy– delay product scales as s4 ~which is as good as the best electrical case!. The results of com- Table 6. Effect of Scaling the VLSI on the Various Interconnection Technologies Interconnect Energy–Delay Product Improvement Off chip On chip Optical s3 ,s4 s4 parison of the scaling effects are summarized in Table 6. 8. Conclusions In this paper we have compared electrical and freespace optical interconnections in terms of speed performance and energy cost for digital transmission in large-scale systems. Free-space optical interconnects that use MQW modulators or VCSEL’s as transmitters offer a significant speed advantage over both off-chip and wafer-scale on-chip electrical interconnects. Compared with the fastest off-chip electrical interconnects ~parallel-terminated lines! and for lengths up to a few centimeters, optical interconnects provide as much as a twice better speed performance. For medium-length off-chip interconnections ~5 to 15 cm! the speed advantage of the optical interconnect is approximately 50%. Finally, when compared with long off-chip interconnections the speed advantage reduces to approximately 20%– 40%. Compared with series-terminated offchip electrical interconnects that require much less power than their parallel-terminated counterparts optical interconnects provide at least a twice better transmission speed, even for very long interconnects ~up to 20 cm!. A MQW-based optical interconnect provides the best one-to-one speed performance, while VCSEL-based interconnects follow with small differences. This is due to the smaller driver requirement of the MQW modulators owing to their much smaller transmitter current requirement compared with that of the VCSEL. Compared with wafer-scale VLSI connections optical interconnects provide increasingly better speed performance as the line length becomes longer, reaching a 4 times faster transmission speed. In the case of long on-chip connections up to 2 cm in length, an optical interconnect provides a twice faster interconnect speed. The on-chip energy requirements of one-to-one optical interconnects are generally less than 50 pJybit transmitted. For electrical interconnects it is of the order of several hundred picojoules. The break-even line length between optical and electrical interconnects for equal energy is of the order of a few centimeters. For applications in which a large processing plane is needed optical interconnects provide a simultaneous speed and energy advantage over electrical interconnects, reaching a combined factor of at least 20. Modulator-based optical interconnects require more overall system energy than VCSEL-based opti10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS 221 where b is the tapering factor, p5 Fig. 13. n-stage superbuffer used to drive large capacitive loads. cal interconnects because of the requirement for an external system light power source. VCSEL-based optical interconnects offer the best energy requirement but also require higher on-chip energy dissipation. Our analysis in this paper has not included the area requirements of the devices used in the interconnection. Our main emphasis in this study has been to design interconnections that operate at the fastest possible speed and then to compare the energy requirements. This approach yields a large-area requirement for the electrical interconnects compared with that of the optical ones. In the off-chip electrical interconnect case the line drivers and the line terminators and, in the wafer-scale connection case, the repeaters occupy a substantial amount of the onchip area. For a comparable area there is enough room in the optical interconnect case, for example, to improve the light detector circuitry and reduce the input optical power requirement,36,37 which considerably reduces the overall energy requirement of an optical interconnect. Although more complicated detector circuitry may increase the detector propagation delay, the overall link delay may benefit, as this may decrease the transmitter propagation delay because of reduced transmitter power requirements.35 Also, in some systems slower-than-maximum-speed optical interconnects can be designed to provide a much lower energy requirement than that of their electrical counterparts while still operating faster. In such cases there is room for an increased detector delay to obtain even better energy efficiency. Improved light detection, possibly with more complicated circuitry, is therefore an important issue that should be addressed in the future.35 Appendix A: Superbuffer Design A superbuffer, i.e., a chain of inverters of increasing sizes, is used widely to reduce propagation delays when driving large capacitive loads. In this appendix we use some of the results of a superbuffer design reported in Ref. 13. A superbuffer circuit with a tapering factor b is illustrated in Fig. 13. The minimized propagation delay through the superbuffer is given as13 tsb,p 5 nRCmina, (A1) where n is the number of inverter stages, RCmin is the minimum logic RC time constant, and a is defined as a ; 1 1 p~b 2 1!, 222 (A2) APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 Cmin,o , Cmin,i 1 Cmin,o Cmin,i is the minimum inverter input capacitance, and Cmin,o is the minimum inverter output capacitance. The number of stages n ~excluding the minimum first stage! can be calculated as13 n5 S D 1 CL ln 2 1, ln b Cmin,i (A3) where CL is the output load capacitance of the last superbuffer stage. Because the signals within the superbuffer can be treated with the lumped-capacitor approximation, the rise time tr of the signals in the superbuffer is approximately twice the propagation delay of a single stage: tr 5 2RCmina. (A4) The output capacitance of the last superbuffer stage, which is bn21 times larger than the minimum, is calculated as Csb,o 5 bn21Cmin,o. (A5) The total capacitance of the superbuffer on the signal path is the sum of the input and the output capacitances of all the stages: n21 Csb 5 ~Cmin,i 1 Cmin,o! (b i i51 5 ~Cmin,i 1 Cmin,o!b bn21 2 1 . b21 (A6) The effective transconductance of the superbuffer, which we define as the sum of the transconductances of all the stages, is calculated as n21 keff 5 kmin ( i51 bi 5 kminb bn21 2 1 , b21 (A7) where kmin is the transconductance parameter of the minimum geometry inverter. Appendix B: Detector Requirements In this appendix we express the detector photocurrent dynamic range requirement as a function of detector parasitics and speed of operation. Figure 8 illustrates the detector circuit considered. Figure 14 shows the presumed detector transfer characteristic in which the optical input transition between a lowand a high-intensity level is linear, thus resulting in a linear photocurrent signal transition through the photodiode. If the optical interconnection system does not alter the timing characteristics and the photodiode device limits are not pushed, the photocurrent rise time is equal to the rise time of the optical signal at the transmitter output. This is illustrated in Fig. 14, Fig. 15. Plot of Eqs. ~B2! and ~B4!. and using the definitions of rise times @in Fig. 14~a!# yields Iph,H 2 IL 5 2¹VdCdet , tr,TR tr,det 2 4 tr,det $ tr,TR . 2 (B2) J (B3) Similarly, for tr,det , tr,TRy2, Fig. 14. Photodetector input– output waveforms used in the calculations: ~a! the transmitter rise time is less than twice the detector rise time: tr,TR $ tr,dety2. ~b! The transmitter rise time is more than twice the detector rise time: tr,TR , tr,dety2. where tr,TR denotes the transmitter ~or photocurrent! 0%–100% rise time. Figure 14 illustrates the detector operation in which the photodiode output voltage rise time tr,det is larger than or equal to half the transmitter rise time. Applying the capacitance charging equation to the photodiode output node yields * VDD ¹Vd 2 2 2 VDD ¹Vd 2 2 2 1 dV 5 Cdet 1 * H* t3 t2 t1 @i1~t! 2 IL#dt J @i2~t! 2 IL#dt , t2 (B1) where i1~t! 5 IL 1 t 2 t1 ~Iph,H 2 IL!, t2 2 t1 i2~t! 5 ~Iph,H 2 IL!, and ¹Vd is the photodiode output voltage swing, Cdet is the total photodiode output capacitance, and IL is the photodiode load current. Integrating Eq. ~B1! * VDD ¹Vd 2 2 2 VDD ¹Vd 1 2 2 dV 5 1 Cdet H* t2 @i3~t! 2 IL#dt , t1 where i3~t! 5 IL 1 t 2 t1 ~Iph,H 2 IL!. t3 2 t1 Integrating with the help of Fig. 14~b! results in Iph,H 2 Iph,L 5 2¹VdCdet tr,TR , tr,det2 tr,det , tr,TR . (B4) 2 In Fig. 15, we plotted Eqs. ~B2! and ~B4!. As can be observed from Fig. 15, operating the detector at a shorter rise time than that of the transmitter requires an increasingly large optical power dynamic range, whereas operating the detector more slowly than the transmitter results in only a small power saving. Therefore operating at equal transmitter and detector rise times is nearly optimal in terms of speed and energy. This is the region of operation defined by Eq. ~B2!. Also plotted in Fig. 15 is Eq. ~B2! when tr,TR is neglected. We observe that in the region of interest of operation, neglecting the rise time of the input signal to the detector causes only a small error while simplifying the calculations. Therefore, for practical purposes, we can assume that the photocurrent dynamic range DRI can be represented by DRI 5 Iph,H 2 Iph,L 5 2¹VdCdet , tr,det 10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS (B5) 223 and the photodiode propagation delay can be approximated as tp,det < tr,det . 2 (B6) Appendix C: Transmitter and Receiver Steady-State Currents Resulting from Amplification As illustrated in Fig. 8, the first inverter of the superbuffer at the transmitter site has to amplify its input signal ~driven by a minimum logic gate! from the VLSI supply level to the transmitter supply level. This is generally approximately 10 V for achieving an acceptable optical modulation depth. Since this value is only a few times larger than the VLSI supply levels ~3–5 V!, a single inverter is sufficient to perform the amplification. Similarly, at the receiver site, the thresholding inverter consumes steady-state currents because of the amplification of the photodiode output voltage, whose swing is limited to approximately 330 mV by the clamping diodes. In any case, a steady-state current results from the fact that the input voltage swing of an inverter is limited to approximately half of the supply level. If the input voltage swings by an amount of approximately DV of the midsupply level, then it is possible to show that the average steady-state current through the receiver site inverter is S D Fig. 16. Geometric assumptions of the optical interconnect scheme. The size of the inverter depends on the MQW current because of optical absorption, device, integration parasitics, and speed of operation. The MQW modulator input-to-output optical power efficiency ~for high- and low-voltage states! is given as33 2 kmin VDD 2 ¹V 2 VT . 2 2 (C1) hMQW,H 5 For the receiver site, where we assumed a 330-mV voltage swing, DV 5 DVdy2 5 165 mV. For the transmitter site we replace VDD in Eq. ~C1! with VTR: hMQW,L 5 IRC 5 ITR 5 S D 2 kmin VTR 2 ¹V 2 VT , 2 2 (C2) where DV 5 VDDy2. Appendix D: Estimation of Optical Time-of-Flight Delay We assume that two points on a plane separated by Lint can be interconnected optically by an optical element located at a distance Lint from the plane in the third dimension ~see Fig. 16!. Thus the length of the optical path between the two points is calculated as F Lopt 5 2 Lint2 1 S DG Lint 2 k~0! Km , Pi,MQW 11 AMQWIS~VH! k~0! , Pi,MQW 11 AMQWIS~0! VMQW 5 VH, VMQW 5 VL 5 0, (E2) where VH and VL are the high- and the low-level voltages across the modulator, k~0! is the absorption slope at zero voltage, Km is the absorption slope ratio, Pi,MQW is the input optical power to the modulator, AMQW is the modulator area, and IS~V! is the saturation intensity as a function of the modulator voltage. To make the efficiency independent of the input op- 2 1y2 5 2.2Lint , (D1) which results in an optical time-of-flight delay of tfopt 5 Lopt , c (D2) where c is the speed of light propagation in the medium. Appendix E: Multiple-Quantum-Well Modulator Driver Design In this appendix we design an inverter stage to drive the MQW light modulator, as illustrated in Fig. 17. 224 APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 (E1) Fig. 17. CMOS MQW modulator driver circuit. tical power or modulator area, we make the following practical assumption33: Pi,MQW 5 0.2. AMQWIS~VH! (E3) Note that reducing ratio ~E3! further results in a larger modulator area for a given power with only a small gain in efficiency. Therefore when the output power requirement from the modulator increases the area of the modulator also increases, according to Eq. ~E3!, to keep the modulator efficiency constant. Using Eq. ~E3! in Eqs. ~E1! and ~E2! yields hMQW,H 5 0.83k~0! Km, (E4) hMQW,L 5 0.9k~0!. (E5) On the other hand, the modulator high-level absorption current can be estimated as IMQW,H 5 rMQWPi,MQWhMQW,H, (E6) where rMQW is the modulator responsivity. The NMOS transistor of the driver inverter should be large enough to sink this current to satisfy the dc condition VMQW $ VH. Equating Eq. ~E6! to the NMOS current in the linear region and solving for the NMOS transconductance yields kndc 5 IMQW,H , ~VTR 2 VT!Vdslow 2 0.5Vdslow2 (E7) where VTR is the transmitter power supply voltage, VT is the transistor threshold voltage, and Vdslow 5 VTR 2 VH. In addition to the dc requirement the driver should also satisfy the switching speed or ac requirement. If the NMOS transistor is in the linear region of operation during switching, its resistance can be estimated as RNMOS 5 2 kn ~VTR 2 VT!2 ac , (E8) where we included a factor of 2 to better approximate the transistor resistance because, during switching, the average value of the input voltage is less than VTR and the PMOS current confronts the NMOS current. The rise time of the signal at the driver output can then be estimated as tr,dr 5 2.3RNMOSCTR, (E9) where CTR is the total transmitter capacitance composed of CTR 5 Cbond 1 AMQWCMQW 1 knac Cmin,o, kmin (E10) where Cbond is the flip-chip bond capacitance, CMQW is the modulator capacitance per unit area, Cmin,o is the minimum inverter output capacitance, and kmin is the minimum transistor transconductance parameter. It is not meaningful to operate the driver Fig. 18. CMOS VCSEL driver circuit. faster than a minimum inverter stage. Thus equating Eq. ~E9! to the rise time of the signals at the output of a minimum inverter ~driving another minimum inverter!, using Eqs. ~E8! and ~E10! in Eq. ~E9!, and solving for the transistor transconductance give knac 5 Cbond 1 AMQWCMQW . Cmin,o 0.5RCmin~VTR 2 VT! 2 kmin (E11) The NMOS transistor is therefore sized on the basis of the stricter of the dc and the ac conditions given by Eqs. ~E7! and ~E11!. For completing the driver design the PMOS transistor is sized twice as large as the NMOS transistor to compensate for its lower mobility. The design of the driver gives the transmitter driver input capacitance as CTR,i 5 kdr Cmin,i, kmin (E12) where kdr is the larger of the dc and the ac transconductances and Cmin,i is the minimum-size inverter input capacitance. Appendix F: Vertical-Cavity Self-Emitting Laser Driver Design The laser driver circuitry considered is shown in Fig. 18, where a single NMOS transistor is used to switch the laser diode. The driver should satisfy the dc and the minimum switching speed requirements: The NMOS transistor should be large enough to conduct the high-level laser current and switch the laser in no more than one minimum inverter propagation delay. In this analysis we assume that the laser is operated at the maximum output power for a certain laser diameter. Therefore the laser diameter is increased to achieve higher optical output power from the laser as needed to accommodate larger fan-out or faster operation. We assume that the output power versus the laser diameter and the threshold current versus the laser diameter characteristics are linear: POH 5 gD, (F1) Ith 5 fD, (F2) 10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS 225 where POH is the optical output high power level, Ith is the laser threshold current, D is the laser diameter, and g and f are the slopes of the characteristic curves. The laser current-to-power characteristic is assumed to be POH 5 ~ITR 2 Ith!hLI, ITR $ Ith, (F3) POH 5 0, ITR , Ith, (F4) where hLI is the slope of the laser L–I curve. For a given laser output power requirement, the laser diameter, threshold current, and operating dc current are found by use of Eqs. ~F1!–~F4!, as follows: POH , g (F5) Ith 5 f POH, g (F6) S D 1 f 1 . hLI g (F7) The NMOS transistor should satisfy the dc condition IDS 5 ITR. Equating Eq. ~F7! to the linear region of the transistor current, neglecting the small quadratic term, and solving for the transistor transconductance yield kn 5 ITR , ~VTR LAS 2 VT!VL (F8) where VL is the NMOS drain voltage when it is on. Because of the low efficiency of the optical link the resulting laser current is high. For this reason we assume that the design of the switch transistor based on Eq. ~F8! also satisfies the minimum switching speed requirement. If the transient capacitive current is approximately one fourth the high-level laser current ITR, then the rise time of the signal at the driver output can be estimated as tr 5 4CTR LAS VH 2 VL , ITR (F9) where VH is the high-level driver output voltage, VH 5 VTR 2 Vth; and CTR is the laser driver output capacitance composed of the flip-chip bond capacitance, the NMOS transistor junction capacitance, and the laser device capacitance: CTR LAS 5 Cbond 1 kn CminN,i 1 ALASCLAS, (F10) kmin where CminN,i and kmin are the minimum NMOS transistor capacitance and transconductance, respectively; CLAS is the laser device capacitance per unit area; and ALAS is the laser area: ALAS 5 pD. 226 CTR,i 5 kn CminN,i. kmin (F12) The authors thank Volkan Ozguz, Chi Fan, Barmak Mansoorian, Ashok Krishnamoorthy, Gary Marsden, Osman Kibar, Daniel Van Blerkom, David Coudert, and Mark Hansen for many fruitful and enlightening discussions. The useful comments and suggestions of the referees for this paper are also acknowledged. References D5 ITR 5 POH Finally, the laser driver input capacitance can be calculated as (F11) APPLIED OPTICS y Vol. 37, No. 2 y 10 January 1998 1. J. W. Goodman, F. I. Leonberger, S. Y. Kung, and R. A. Athale, “Optical interconnections for VLSI systems,” Proc. IEEE 72, 850 – 866 ~1984!. 2. L. A. Bergman, W. H. Wu, A. R. Johnston, and R. Nixon, “Holographic optical interconnects in VLSI,” Opt. Eng. 25, 1109 –1118 ~1986!. 3. W. H. Wu, L. A. Bergman, A. R. Johnston, and C. C. Guest, “Implementation of optical interconnections for VLSI,” IEEE Trans. Electron. Devices 34, 706 –714 ~1987!. 4. R. K. Kostuk, J. W. Goodman, and L. Hesselink, “Optical imaging applied to microelectric chip-to-chip interconnections,” Appl. Opt. 24, 2851–2858 ~1985!. 5. F. B. McCormick, “Free-space interconnection techniques,” in Photonics in Switching, J. E. Midwinter, ed. ~Academic, New York, 1993!, Vol. II, pp. 169 –250. 6. F. Kiamilev, P. Marchand, A. Krishnamoorthy, S. Esener, and S. H. Lee, “Performance comparison between optoelectronic and VLSI multistage interconnection networks,” Lightwave Technol. 9, 1674 –1692 ~1991!. 7. A. Krishnamoorthy, P. Marchand, F. Kiamilev, K. S. Urquhart, S. Esener, and S. H. Lee, “Grain-size study for a 2-D shuffleexchange optoelectronic multistage interconnection network,” Appl. Opt. 31, 5480 –5507 ~1992!. 8. K. Urquhart, P. Marchand, Y. Fainman, and S. H. Lee, “Diffractive optics applied to free-space optical interconnects,” Appl. Opt. 33, 3670 –3682 ~1994!. 9. M. R. Feldman, S. C. Esener, C. C. Guest, and S. H. Lee, “Comparison between optical and electrical interconnects based on power and speed considerations,” Appl. Opt. 27, 1742–1751 ~1988!. 10. G. Yayla, P. Marchand, and S. Esener, “Energy requirements and speed analysis of digital electrical and free-space optical interconnections,” in Optical Interconnections and Parallel Processing: The Interface, P. Berthome and A. Ferreira, eds. ~Kluwer, Dordrecht, The Netherlands, 1997!, Chap. 1. 11. K. Ayadi, M. Kuijk, P. Heremans, and G. Bickel, “A monolithic optoelectronic receiver in standard 0.7-mm CMOS operating at 180 MHz and 176-fJ light input energy,” IEEE Photon. Technol. Lett. 9, 88 –90 ~1997!. 12. H. J. Veendrick, “Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits,” IEEE J. Solid-State Circuits SC-19, 468 – 474 ~1984!. 13. N. C. Li, G. L. Haviland, and A. A. Tuszynski, “CMOS tapered buffer,” IEEE J. Solid-State Circuits 25, 1005–1008 ~1990!. 14. R. Geiger, P. Allen, and N. Stroder, VLSI Design Techniques for Analog and Digital Circuits ~McGraw-Hill, New York, 1990!, pp. 590 –593. 15. A. L. DeCegama, Parallel Processing Architectures and VLSI Hardware ~Prentice-Hall, Englewood Cliffs, N.J., 1989!. 16. T. C. Lee and J. Cong, “The new line in IC design,” IEEE Spectrum 34~3!, 52–58 ~1997!. 17. H. B. Bakoglu, Circuits, Interconnections and Packaging for VLSI ~Addison-Wesley, Reading, Mass., 1990!. 18. B. Wadell, Transmission Line Design Handbook ~Artech House, Boston, 1991!. 19. S. Wolf, Silicon Processing for the VLSI Era: Process Integration, ~Lattice, Sunset Beach, Calif., 1990!. 20. S. Wolf, Silicon Processing for the VLSI Era: The Submicron MOSFET ~Lattice, Sunset Beach, Calif., 1995!. 21. S. Rosenstark, Transmission Lines in Computer Engineering ~McGraw-Hill, New York, 1994!. 22. D. A. B. Miller, D. S. Chemla, T. C. Damen, and A. C. Gossard, “Electric field dependence of optical absorption near the band gap of quantum-well structures,” Phys. Rev. B 32, 1043–1060 ~1985!. 23. C. Fan, D. W. Shih, M. W. Hansen, S. C. Esener, and H. H. Wieder, “Quantum-confined Stark effect modulators at 1.06 mm on GaAs,” IEEE Photon. Technol. Lett. 5, 1383–1385 ~1993!. 24. A. V. Krishnamoorthy, A. Krishnamoorthy, T. K. Woodward, K. W. Goosen, and J. A. Walker, “Operation of a single-ended 550 Mbitsysec, 41 fJ, hybrid CMOSyMQW receivertransmitter,” Electron. Lett. 32, 764 –765 ~1996!. 25. B. Pezeshki, D. Thomas, and J. S. Harris, Jr., “Optimization of modulation ratio and insertion loss in reflective electroabsorption modulators,” Appl. Phys. Lett. 57, 1491–1492 ~1990!. 26. D. S. Chemla, D. A. B. Miller, P. W. Smith, A. C. Gossard, and W. Wiegmann, “Room temperature excitonic nonlinear absorption and refraction in GaAsyAlGaAs multiple quantum well structures,” IEEE J. Quantum Electron. QE-20, 265–275 ~1984!. 27. P. J. Stevens and G. Parry, “Limits to normal incidence electroabsorption modulation in GaAsy~GaAl! as multiple quantum well diodes,” J. Lightwave Technol. 7, 1101–1108 ~1989!. 28. T. H. Wood, J. Z. Pastalan, C. A. Burrus Jr., B. C. Johnson, B. I. 29. 30. 31. 32. 33. 34. 35. 36. Miller, J. L. deMiguel, U. Koren, and M. G. Young, “Electric field screening by photogenerated holes in multiple quantum wells: a new mechanism for absorption saturation,” Appl. Phys. Lett. 57, 1081–1083 ~1990!. L. Coldren, S. Corzine, R. Feels, A. C. Fonard, K. K. Law, J. Merz, J. Scott, R. Simes, and R. H. Yan, “High efficiency vertical cavity lasers and modulators,” in Physical Concepts of Materials for Novel Optoelectronic Device Applications II: Device Physics and Applications, M. Razeghi, ed., Proc. SPIE 1362, 79 –92 ~1990!. J. Jewell, G. Olbright, “Vertical cavity surface emitting lasers,” IEEE J. Quantum Electron. 27, 1332–1346 ~1991!. D. B. Young, J. W. Scott, F. H. Peters, M. G. Peters, M. L. Majewski, B. J. Thibeault, S. W. Corzine, and L. A. Coldren, “Enhanced performance of offset-gain high-barrier vertical cavity surface-emitting lasers,” IEEE J. Quantum Electron. 29~6!, 2013–2022 ~1993!. R. Geels and L. Coldren, “Submilliamp threshold vertical cavity laser diodes,” Appl. Phys. Lett. 57, 1605–1607 ~1990!. C. Fan, B. Mansoorian, D. A. Van Blerkom, M. W. Hansen, V. H. Ozguz, S. C. Esener, and G. C. Marsden, “A comparison of transmitter technologies for digital free-space optical interconnections,” Appl. Opt. 34, 3103–3115 ~1995!. D. A. Van Blerkom, O. Kibar, C. Fan, P. J. Marchand, and S. C. Esener, “Power optimization of digital free-space optoelectronic interconnections,” J. Lightwave Technol. ~to be published!. D. Van Blerkom, C. Fan, M. Blume, and S. C. Esener, “Optimization of smart pixel receivers,” J. Lightwave Technol. ~to be published!. A. V. Krishnamoorthy and D. A. B. Miller, “Scaling optoelectronic-VLSI circuits into the 21st century: a technology roadmap,” IEEE J. Sel. Top. Quantum Electron. 2~4!, 55–76 ~1996!. 10 January 1998 y Vol. 37, No. 2 y APPLIED OPTICS 227