Download A Complementary GaAs (CGaAs TM) 32

A Complementary Michael GaAs (CGaAsT”) J. Kelley, Matthew A. Postiff, Timothy Department of Electrical 32-bit Multiply Accumulate Unit D. Strong, Richard B. Brown, and Trevor N. Mudge Engineering University and Computer Science of Michigan 1301 Beal Ave. Ann Arbor MI, 48109 The University Abstract seeks to exploit A high speed accumulate unit (CGaAs~~) to collect one-cycle, technology. is utilized on paths. A the partial description radiation including logic and their effect of styles, of circuit select issues, a study design is is architecture. underway. accumulate size. PUMA research project characteristics design of the PowerPC Several technology-proving We chose to design unit (MAC) the importance designs are also the high reported in a high- microprocessor speed multiply in this paper because of of multiplication in microprocessor 2. Technology as a of some of presented, methodology, on performance. 1. Introduction CGaAs is a complementary provides enhancement erostructure GaAs mode n-channel insulated-gate technology field effect transistors on the same substrate. A fully complementary lower logic style, having to other mented in CGaAs. much GaAs logic power families, The rail-to-rail which and p-channel Iike) compared (CMOSdissipation can be imple- signal swings better noise margins than those of direct-coupled logic, the most common GaAs logic family. these markets have need for micro- 2.1. and digital-signal are capable of surviving environment. In addition, pro- Motorola’s CGaAs offers the satellite high electron mobility technology inherent communications provides FET Device Structure CGaAs devices are created ited by molecular satisfies radiation provide a hostile radiation market also requires low power operation. ments. CGaAs het- (HFETs) The emerging demand for satellite communications is moving the market for radiation hardened devices out of military applications and into the mainstream. Both of cessors which and DSP systems. is used technology, is provided CGaAs in adder encoding tree CGaAs Finally, is used (D CVSL) hardness, designing GaAs Booth product logic speed, low-power 64-bit compressors Radix-4 switch to the discussion. implications multiply, Complementary and a cany result. voltage its inherent background the reduce cascode cn”tical including products the jinal to in A tree of 4:2 the partial used to determine Di#erential 32-bit is presented of Michigan’s these CGaAs these require- hardness, and its good device transconductance and enables high speed operation at low voltages. The newest generation of Motorola’s CGaAs has beam epitaxy in epitaxial (MBE). layers depos- A semi-insulating GaAs substrate is used for starting material. A cross section of a pair devices appears in Figure the carriers to a conducting of nHFET 1. Heterojunctions and pHFET help confine channel formed in the InGaAs drawn gate lengths of 0.5pm and thresholds of 0.35V. The process uses refractory metal gates and self aligned layer, where they have high nobilities because there is no impurity scattering. The large bandgap AlGaAs layer provides gate isolation, and the top GaAs layer is used as a source/drain cap to prevent surface oxidation implants. Ohmic contacts to the source/drain areas are formed using a refractory high temperature stability metal, which provides and the opportunity to use standard aluminum metallization for interconnect. Three levels of AICU metallization are currently available, and a fourth level of metal is currently under development [ 1, 2]. In most MESFET a Schottky of the AlGaAs. technologies, diode junction. the gate is formed In CGaAs, by the large bandgap of the AlGaAs layer between the TNN gate metal and the channel increases the n-gate diode turn-on voltage (where Ig=l@~m2) to 1.75V, and the p-gate diode turn on voltage to -2V, with Vds = OV [1, 3]. Gate leakage 0-8186-8316-3/98 $10.00 (c) 1998 IEEE Tw4 ate at 1.3V, which is slightly gate / \ lower than the technology’s power supply rating of 1.5V. This moderates performance somewhat but provides a significant 25d . . . . . . . .. —- .-- . . . . . ..— — .. .. . . . The gate layer has a lower resistance than its CMOS 8MlGa&s polysilicon d. I-GUS ?4 DMa m$m.g %Cm4 ,4*3 ,.AF3A9 Lulrlw semiconductor surface, and there is a leakage current even in oxygen implanted field areas. Therefore, one must Radiation immune 1. Cross section CGaAs devices increases with higher gate voltages, larger gate areas, and drain-source lowering). voltage (due to drain-induced barrier This leakage must be addressed in the design phase. The threshold duction voltage of the devices is set by the con- and valence band discontinuities InGaAs, by a silicon delta doping versions of CGaAs thresholds (without implant the channel at 0.55V. The new version for a 0.35V threshold of the AIGaAs/ [1]. Earlier implant) set the is being optimized which will provide improved speed voltage drop across an n-transistor passgate when passing a logic one. Ohmic contact source/drain allows between subsequent provides the first metal areas is provided high temperature a conventional three-level with stacked vias. A fourth layer by a refractory In a CMOS charge due to radiation, process, which causes rate was improved by the addition of a Low Temperature GaAs (LTG) layer to reduce charge collection from ionizing radiation (see Figure 1). The LTG layer also improves between characteristics devices. and provides Optimization better isolation of the new structure is ongoing. With the immunity of CGaAs to latchup, a semi-insu- GaAs substrate, and the added LTG layer, CGaAs devices are suitable [2]. 3. Multiplier for radiation intensive environments Architecture and the metal which processing. AICU accumulates It is effects because it does not leakage between transistors and a shift in the threshold voltages. Recently, the single event upset (SEU) soft error lating and the possibility of using unipolar transmission gates (npass gates). The earlier high-threshold devices resulted in an unacceptable hardness of the process is excellent. to total dose radiation subthreshold layer placed just below the channel area, and by a p-channel but it is in direct contact with the use Si02 as a gate or field insulator. Si02 higher counterpart, route as little in the Schottky gate metal layer as possible. ---- Ned to scale Figure in gate leak- f 534 ,4nG& LTG G* .. ... .. . ... ... ... .. reduction age. WA I-Gans CGaAs Although independent metallization CGaAs ment the MAC, accumulate chosen to imple- architecture of the technology. for the multiply metal level is in development is the technology the high-level The overall chip (MAC) is essentially design goals included high but was not used in this design. The metal feature size and spacing are coarser than those of a typical CMOS technol- speed, low area, and low power. The main focus was to explore the design space and to determine techniques and ogy of comparable circuit topologies for digital CGaAs designs. With these objectives, the MAC was designed as a one- Isolation implant of feature size [4]. the devices that extends This technique implant minimum into by an oxygen the semi-insulating is provided substrate. separates the n and p active regions. is used to avoid the mesa etch step common GaAs technologies, ity. Unlike An in and it results in better surface planar- CMOS, the substrate is semi-insulating, so wells and well contacts are not necessary, and latchup is unit. The is composed of two subunits, a 32-bit multiplier two’s and a 64-bit complement adder (see Figure puts and internal registers multiply-accumulate 2). Latches on the inputs, outare included designed in CGaAs have several important tures. The low threshold voltages, full power rail featransi- tions and high transconductance at low supply voltages make low voltage performance good. These lower supply reduce both dynamic in a global scan chain for testing purposes. The 32-bit, two’s complement multiplier tree is imple- A 4:2 compressor operates on four partial product bits and compresses these to two result bits. A tree layout based of 4:2 compressors has more regular cell placement and simpler routing than a 2.2. Process Characteristics voltages MAC mented using 4:2 compressors. not possible. Circuits cycle, power dissipation and 3:2 (full-adder) implementation [5]. A full tree of 4:2 compressors takes five levels to pro- duce the two 64-bit sum and carry results presented to the final adder. The utilization of Radix-4 Booth encoding eliminates one level of compressors and reduces area [6]. power lost through the gate. The MAC is designed to oper- 0-8186-8316-3/98 $10.00 (c) 1998 IEEE 4. Circuit Ad .4mB CGaAs which full R--P 7- provides complementary, dynamic ! Carrwpas! I Figure I A,&rc I families in unit, including passgate, XOR source-coupled, Voltage Switch dual-rail domino “keeper” pHFET devices to maintain nodes. HSPICE simulations the design pseudo-static, charge sharing influencing 3. gate with voltages show device not only allows the clock making Logic gate appears in Figure dynamic “keeper” Fmlklder of logic the MAC Cascode An example The gate is a standard DCVSL I LnE3y@ r- variety dynamic, Differential (DCVSL). LwhTree l-, a wide we could have designed and unipolar circuits. Each of these represents different points in the performance, size, power and noise immunity space [7]. The MAC was designed using primarily I [ Implementation on the that this to be stopped, but also helps to prevent in the evaluation logic from incorrectly the output. 2. MAC Architecture In addition to performing a 32-bit multiply, the 64-bit intermediate result can be used as an accumulate value; this technique is often used in signal processing The accumulate is implemented by folding late (CRegister) value into appropriate partial tree of the multiplier. product because some inputs the multiplier. tial products circuits. the accumu- bit positions This to the compressors in the is possible are not used by The value is thereby combined with the parfrom the A and B registers, and is already added to the product mediate registers. when the result appears at the inter- The two 64-bit quantities output from the multiplier are added to form the final result, which is also the next accumulate value of the MAC. posed of seven 16-bit the outputs The 64-bit adder is com- adders so that multiplexors of adders that have the correct Figure 3. Dynamic “keeper” transistors DCVSL XOR gate with select carry results from lower segments. Each 16-bit adder is designed with The DCVSL than a fully style circuits complementary allow higher design performance because devices are 3 to 4 times cient adder that requires only a few standard cells. pHFET devices. The adder and multiplier make up most of the MAC. To facilitate testing, the MAC incorporates several test power. The XOR gate of Figure 3 has an evaluation 4-bit carry-lookahead. devices. Two registers are included When This architecture (CarryRegister between the multiplier these registers are selected, two-cycle pipelined registers is included ters are included provides an effi- and SumRegister) and the final the MAC adder. becomes a design. A full scan-chain of all the for testing. Since the pipeline regis- in the scan chain, the intermediate results The gates are efficient for speed and delay of 428ps at 189p.W for a power delay product of 8 lfJ. While the presence of dynamic nodes might increase the SEU sensitivity of the design, the result is still expected to be much less radiation Evaluation of the MAC sensitive than a CMOS for radiation Booth of the multiplier tree are available for debug, and inputs can be scanned directly into the final adder to aid in its testing. The inputs to the multiplier tree and the accumulate value are latched at the beginning and end of the unit, dynamic static circuits encoding is hidden circuits. circuit. hardness will our understanding of dynamic CGaAs circuits. The MAC uses full complementary circuits Radix-4 respective y (see Figure 2). nHFET faster than the corresponding add to to perform [9, 11, 12]. The delay of these within the pre-charge Once all dynamic time of the nodes in the circuit have been pre-charged, they evaluate in domino fashion. CGaAs also improves performance and reduces the size of the logic for Booth encoding over DCFL designs. Booth encoding provides small 0-8186-8316-3/98 $10.00 (c) 1998 IEEE is far more efficient fast multiplexors. in a technology DCFL designs that must implement the coding logic in NOR-NOR configurations which are larger [8]. 4.1. General Design Considerations Among the more obvious differences is that CGaAs has no wells. and CMOS between CGaAs This allows p- and n-channel transistor drains to be abutted, prevents Iatchup, and eliminates the need for well contacts. Another While difference the Schottky routing, in CGaAs is the Schottky layer. layer in CGaAs can be used for local it leaks to the substrate, so its use as an interconnect layer should dynamic be limited circuits. Dynamic charge on their internal transistors helps especially circuits when used with rely on the storage of nodes. The use of small “keeper” maintain the charge on the dynamic nodes. Figure 4. Preliminary In addition, the n-transistors in CGaAs have a transconductance about 4 times better than the p-transistors. Due to the better transconductance, circuits should be designed for primarily in the n-transistors. very well to dynamic circuits operated in pipelined the n-network. should Complementary be designed avoiding This design style lends itself where most of the logic is in logic designed primarily of NAND the through 73 MHz core critical path mode. thus 5. Conclusion large p-stacks. Although many using process of Verilog. the MAC proper functionality described of the characteristics and circuit techniques of CGaAs may be different than in CMOS, the methodologies behind designing chips are very similar. The MAC is models simulated in CGaAs structures, Physical Design modeled The the design is 13.7ns, yielding a clock speed of in single cycle mode or about 140MHz when The MAC 4.2. is 239mW. MAC Layout Both were behavioral simulated by applying and structural and verified in this paper demonstrates of CGaAs technology several in a high speed design. Careful attention to layout and circuit details is necessary to avoid difficulties that can be caused by the differences between CGaAs and CMOS. However, CGaAs technology power, radiation provides a high hard alternative performance, for digital low designs. for targeted and random test Acknowledgment vectors. In conjunction with the development els, a standard cell library were custom designed of the MAC mod- was created. The standard cells using Mentor Graphics ICstation. Each standard cell was simulated and verified using HSPICE. With a standard cell library in place, the MAC core was designed EPOCH using compiler. cell placement Cascade EPOCH and routing. and routed to the padframe. layout of the MAC ing and power dissipation cess once the design obtained. The final 60,904 [1] Automation’s The core was manually placed Figure 4 shows a preliminary analysis on clock buffer- needs to be completed. using Motorola’s is completed [2] The CGaAs pro- and a final layout is transistors. (without clock HSPICE-estimated buffering) power contains was supported by DARPA contract DAA [3] J. Abrokwah, J. Huang, W. Ooms, C. Shurboff, J. Hallmark, R. Lucero, J. Gilbert, B. Bemhardt, G. Hansell, “A Manufacturable Complementary GaAs Process: 1993 IEEE GaAs IC Symposium Technical Digest, pp. 127-130. M. LaMacchia, J. Abrokwah, B. Bemhardt, D. Foster, B. Crawforth, B. Mathes, T. McGuire, T. Weatherford, “Radiation Hardened Complementary GaAsV 1997 IEEE GaAs IC Symposium Technical Digest, pp. 59-61. A. Akinwande, P. Ruden, D. Gnder, J, Nohava, T. Nohava, P. design work References was used for both standard unit. Further MAC unit will be fabricated Design This H04-94-G-0327. Joslyn, Heterostructure dissipation 0-8186-8316-3/98 $10.00 (c) 1998 IEEE J. Breezley, PETs for Low “Complementary Power Integrated HI-v Circuits:’ 1990 IEEE Technical [4] P. O’Neil, G. International B. Bemhardt, Hansen, “GaAs Motorola; Digest, [5] 1993 Meeting F. Nikpourian, IEEE Heterojunction C. Delta, Circuit GaAs IC Y. Abad, Fabrication Symposium at Technical [18] P. Song, Issues for a High Processor, 1995 Ph.D. Solid-State Clock Dissertation, of Michigan. Computer Arithmetic Algorithms, 1993 Prentice Hall. [7] R. Brown, Par&h, B. Bemhardt, T. Basso, Foster, B. Crawforth, T. McQuire, of Complementary “Overview for High-Speed VLSI Circuits,” J. Abrokwah, P. C. Gauthier, D. K. %rkallah, R. Lomax, GaAs Technology 1997 IEEE Transactions on Circuits. M. Riepe, T. Huff, M. Upton, T. Mudge, 53x53-bit Array Parrdlel GaAs DCFL;’ Engineering [9] S. Stetson, T. Mudge, VLSI [8] M. LaMacchia, S. Gold, G. Multiplier University and Computer Fast Bewick, Implementation, R. Brown, in 0.6 pm Michigan, Electrical of Science, pp. 1-9. Multiplication: 1994 “A 7-nS Implemented Algorithms Ph.D. Dissertation, M. Sherwin, P. Robertson, and Stanford University. [10] A. Baca, J. Zolper, Howard, D. Rieger, Junction-Gated [11 ] N. Klein, 1994 IEEE R. Shul, A. “Complementary Heterostructure Technology;’ Digest, J. Field GaAs IC GaAs Effect Transistor Symposium Technical pp. 59-62. Burgess, Booth’s “Removal Algorithm Letters, August [12] M. Roorda, of Sign-Extension Circuitry Multiplier-Accumulators~ From Electronics 1990, Vol. 26, No. 17, pp. 1413-1415. “Method Multiplier That Electronics Letters, to Reduce the Sign Bit Extension Uses the Modified September Booth 1986. Vol. in a Algorithm;’ 22, No. 20, pp. 1061-1062. [13] M. Mehta, Multiplier V, Compressor Computer [14] J. E. C. Shurboff, J. Huang, B. “0.9 “High-Speed M. on Flym, University R. J. 15 ns 4K Mukiply/Accumulator~ 1994 Adder; Blocks: Lucero, A Technical Binary Research and Development, Stanford and Symposium Ooms, V DSP and a 45 ns 16-bit [16] N. Quach, IEEE Counter pp. 43-50. IEEE GaAs IC Symposium [15] H. Ling, “High-Speed Multi-Input 1991 Circuits,” Hallmark, Swartzlander, Using Arithmetic, Abrokwah, SRAM Parmar, Design Digest, pp 55-58. 1981 IBM Journal of Vol. 25, No. 3, pp. 156-166. “High-Speed Technical Addition Report, in CMOS;’ CSL-TR-90-415, 1990. [17] J. Huang, Majerus, Mills, E. Glass, J. Abrokwah, B. E. Spears, J. Parsey, D. Scheitlin, K. Optimization Hawthorne, J. Blaugh, for a Low Voltage “Device Enhancement Bemhardt, R. Droopad, and Technical G. DeMicheli, Offs for High-Speed Circuits, [19] J. Rabaey, Digital and Circuit Floating-Point 1. Koren, FET for Portable GaAs IC Symposium Integrated Architectural University [6] Devices pp. 123-126. T. Huff, Rate Electron Digest, pp. 983-986. M. L. Process Mode Power 0-8186-8316-3/98 $10.00 (c) 1998 IEEE Applications: 1997 IEEE Digest, pp. 55-58. “Circuit and Architecture Multiplication: Trade- 1991 IEEE Journal Vol. 26, No. 9, pp. 1184-1198. Integrated Circuits, 1996 Prentice Hall. of

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download A Complementary GaAs (CGaAs TM) 32