
DESIGN OF A “7490-LIKE” DECADE-COUNTER INTEGRATED CIRCUIT, USING GaAs MESFET DCFL FAMILY, FOR FREQUENCIES UP TO 1 GHz.

Luiz Carlos Kretly and Daniel Cardoso de Souza
Centro de Componentes Semicondutores - CCS / UNICAMP and Faculdade de Engenharia Eletrica e de Computacao - FEEC / UNICAMP
Universidade Estadual de Campinas, Cidade Universitaria, Barao Geraldo
Cx. Postal 6061, CEP: 13081-970, Campinas, Sao Paulo, SP - Brasil
Fax: (019) 239-4717
E-mail: [email protected], [email protected]

This work is being sponsored by PICDT/CAPES and PMU/FAPESP.

Abstract

This work describes the full-custom design procedure for a decade-counter IC in the GaAs DCFL (Direct-Coupled FET Logic) family, from proof of concept to final layout, including the gate-sizing stage and the choice of a layout strategy. The DCFL counter topology is entirely based on the classical TTL 7490 structure, chosen for its versatility, and all of its functionality is retained: the IC operates as a ripple counter, as a divide-by-N counter (N from 2 to 10), or as a symmetric divide-by-10 counter. The most important design trade-offs are weighed against each other. All circuit configurations were simulated in HSpice, and the results show excellent counter operation at clock rates up to 1 GHz. The project was envisaged mainly for educational purposes. The IC will be fabricated in the HGaAs-III MESFET technology of Vitesse Semiconductor (USA), a GaAs ASIC foundry, through the Brazilian multi-project-circuit program (PMU/FAPESP) in cooperation with the French CMP.

Introduction.

The motivation for using GaAs gates in digital circuits is the very high operating speed these structures achieve: typical gate delays are on the order of tens of picoseconds, compared with hundreds of picoseconds for 0.5-µm CMOS.
Although GaAs gates dissipate more static power than equivalent CMOS gates, they dissipate far less dynamic power [1], and on the delay-power (D.P) figure of merit DCFL outperforms both HCMOS and ALSTTL. The most significant GaAs device is the MESFET (MEtal-Semiconductor Field-Effect Transistor).

The full-custom IC design described in this work followed a bottom-up methodology [2], which is efficient for low-complexity designs (fewer than 10k gates) such as this one; the design comprises three hierarchy levels.

The Vitesse foundry recommends DCFL logic, whose main features are:
* It trades off some of the high speed of GaAs for low power consumption;
* It operates from a single power supply;
* It uses enhancement- and depletion-mode MESFETs to build all logic gates;
* It requires the lowest component count per logic gate of all GaAs logic families, which yields the simplest circuit topologies and makes VLSI complexity feasible; this is perhaps its strongest advantage;
* Its logic gates provide a much smaller delay-power product than any other logic family;
* It has poor load-drive capability;
* It has low noise margins, due to its limited voltage swing;
* It requires strict process control and accurate threshold-voltage realization.

1. Logic Design.

TTL and CMOS logic families are based on NAND logic; that is, all their flip-flop (FF) types are built from inverters (NOT) and NAND gates [1, 2, 5]. This complicates the adaptation of sequential circuit topologies to DCFL because, in this family, only inverters and NOR gates are easy to implement. NAND gates are discouraged because transistor stacking degrades the noise margin (NM), which lowers process yield [1, 6]. Figure 1 depicts a JK master-slave FF built entirely from NOR gates.

Fig. 1 - NOR-based JK Master-Slave FF with Preset and Clear.
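Because only NOR and NOT gates are cheap in DCFL, every AND or NAND function the counter needs has to be rebuilt from them with De Morgan's theorems. The Python sketch below illustrates that rewrite; it assumes ideal Boolean gates (no electrical behavior at all), and the function names are hypothetical.

```python
# Ideal Boolean models of the two gates that are easy to build in DCFL.
def NOT(a: int) -> int:
    return 1 - a

def NOR(*inputs: int) -> int:
    # Parallel E-FET switches: the output is high only when every input is low.
    return 0 if any(inputs) else 1

def AND(*inputs: int) -> int:
    # De Morgan: a AND b = NOT(NOT a OR NOT b) = NOR(NOT a, NOT b)
    return NOR(*(NOT(x) for x in inputs))

def NAND(*inputs: int) -> int:
    # NAND is an AND followed by one more inverter.
    return NOT(AND(*inputs))

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, AND(a, b), NAND(a, b))
```

Each AND costs one NOR plus one inverter per input, which is the price paid for avoiding stacked E-FETs.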
The differences in operation between the NOR FF of figure 1 and the standard NAND FF (TTL 7476) are:

1) The truth table of the NOR flip-flop is:

J | K | Qn+1
0 | 0 | Qn
0 | 1 | 0
1 | 0 | 1
1 | 1 | Qn

2) Preset and clear are now active-high, and must not both be at logic ‘1’ at the same time.

3) The inverter that, in the TTL NAND version, precedes the inputs of the second latch was moved to the inputs of the first latch, to keep the NOR flip-flop negative-edge triggered (otherwise it would be positive-edge triggered).

Since the goal of this work is a DCFL counter that exhibits, at its output terminals, the same well-known behavior as the TTL 7490, the counter can be designed with NOR FFs and the logic differences between the two counters then compensated internally, so that from the user's perspective both operate identically. This was done, and the final counter circuit is shown in figure 2. Compared with the TTL 7490 logic circuit [4], this full-NOR 7490 version has some interconnection changes; they work around the truth-table mismatches between the NAND and NOR FFs, so that the external terminal behavior of the two counters is the same. The NAND and AND gates in this counter are actually built from NOR and NOT gates using De Morgan's theorems. All counter/divide-by-N configurations were simulated in HSpice, and their output waveforms were identical to those observed in benchtop experiments with an actual TTL 7490 IC. Thus, the DCFL 7490-like IC in figure 2 keeps all the operating modes of the TTL 7490.

Fig. 2 - The DCFL “7490-like” counter/frequency-divider IC, fully designed with NOR FFs.

2. Logic Gates Sizing.

The most basic DCFL gate is the inverter, shown with a load in figure 3:

Fig. 3 - DCFL inverter; the diode represents the gate junction of the next-stage E-FET.
The performance of a DCFL logic gate is determined mainly by the ratio between the enhancement- and depletion-mode transistor sizes (ratioed design). The aspect ratio of JD, WD/LD, is sized according to the current needed to charge the load capacitance CL in a given time interval:

$$ I = C_L \frac{dV}{dt} \qquad (1) $$

The aspect ratio of JE, WE/LE, mainly sets the logic-low voltage VOL: the larger WE/LE, the better JE can sink the current from its active load JD while still discharging CL quickly, so the smaller VOL becomes and the larger the pull-down strength and the noise margin (NM) of the gate. Since JE must pass more current than JD, WE/LE must exceed WD/LD by a certain factor, defined as the β parameter of the inverter:

$$ \beta = \frac{(W/L)_E}{(W/L)_D} = \frac{W_E L_D}{W_D L_E} \qquad (2) $$

Connecting E-FET switches in parallel yields a NOR gate, as shown in figure 4; β is again given by equation (2), provided all enhancement switches are identical.

Fig. 4 - A two-input NOR gate in DCFL logic family.

Transistor sizing for design optimization is an iterative process, so the number of iterations should be kept small. To gain insight into the sizing of the logic gates and buffers that make up the IC, many simulations were performed beforehand to assess how variations in W/L and β affect the DC and transient performance of the gates. The conclusions of these simulations are presented next.

2.1. β parameter influence: DC simulations of an inverter, with WD/LD held constant and β swept from 7 to 13, showed that increasing β decreases VOL without affecting VOH, thus increasing the logic swing Vpp = VOH - VOL and, with it, the NM and reliability of the gate. However, β values above 12 no longer reduce VOL appreciably, so they are not worth the extra die area.
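Equation (2) can be turned into a small sizing helper. The Python sketch below (function names are hypothetical) computes β from device dimensions, solves equation (2) for the E-FET width at a target β, and recomputes the β and area columns for the five inverters compared in Table 1 of section 2.2. It assumes that the "approximate used die area" is simply WD·LD + WE·LE; that assumption reproduces the tabulated areas but ignores wiring and contacts.

```python
# Equation (2): beta = (W/L)_E / (W/L)_D = (W_E * L_D) / (W_D * L_E)
# All dimensions in micrometres (drawn values).

def beta(wd: float, ld: float, we: float, le: float) -> float:
    """Beta of a DCFL inverter, or of a NOR gate with identical E-FET switches."""
    return (we / le) / (wd / ld)

def efet_width(wd: float, ld: float, le: float, target_beta: float) -> float:
    """Solve equation (2) for W_E, given the D-FET and the E-FET gate length."""
    return target_beta * (wd / ld) * le

def gate_area(wd: float, ld: float, we: float, le: float) -> float:
    # Assumption: active gate area only, W_D*L_D + W_E*L_E.
    return wd * ld + we * le

# (W_D, L_D, W_E, L_E) for inverters 1..5 of Table 1
TABLE1 = {1: (8, 1, 80, 1), 2: (8, 2, 40, 1), 3: (4, 1, 40, 1),
          4: (8, 1, 56, 1), 5: (8, 1.5, 64, 1)}

if __name__ == "__main__":
    print(efet_width(wd=8, ld=1, le=1, target_beta=10))  # 80.0, as in inverter 1
    for n, dims in TABLE1.items():
        print(n, round(beta(*dims), 1), gate_area(*dims))
```

In a NOR gate, each parallel E-FET would receive the same width returned by `efet_width`, which is the "equal switches" condition under which equation (2) still applies.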
The inverter power dissipation stays approximately constant over the β sweep, provided WD/LD is constant (JD acts as a constant current source). Transient simulations showed that, as β increases over the same range, the output transition times tHL and tLH decrease. The effect is more noticeable for tHL, because a larger β means a wider JE, which gives a larger swing ∆v = VOH - VOL and a stronger pull-down current, and hence a smaller ∆t in equation (1). Another observed effect was that the propagation delay tPD increases with β. A β of 10 is the optimum choice, since it provides the best NM-area trade-off [3, 7].

2.2. Influence of the WD/LD and WE/LE ratios on performance: Table 1 ranks five inverter sizing choices (1 = best) by peak-to-peak output voltage swing (Vpp), tPD, approximate die area, and static power consumption (PD,st). These inverters were simulated with the load and supply voltage shown in figure 3. PD,st was measured with the input at the high level, where it reaches its highest value in normal operation.

Inverter | WD/LD | WE/LE | β | Larger Vpp | Smaller tPD | Smaller area | Smaller power PD,st
1 | 8/1 | 80/1 | 10 | 1 (587 mV) | 2 (19.96 ps) | 5 (88 µm²) | 5 (2.59 mW)
2 | 8/2 | 40/1 | 10 | 3 (553 mV) | 5 (31.77 ps) | 2 (56 µm²) | 1 (1.21 mW)
3 | 4/1 | 40/1 | 10 | 4 (552 mV) | 4 (28.7 ps) | 1 (44 µm²) | 2 (1.27 mW)
4 | 8/1 | 56/1 | 7 | 5 (546 mV) | 1 (19.2 ps) | 3 (64 µm²) | 4 (2.56 mW)
5 | 8/1.5 | 64/1 | 12 | 2 (580 mV) | 3 (26.66 ps) | 4 (76 µm²) | 3 (1.66 mW)

Table 1 - Inverters’ performance comparison (rank, with the simulated value in parentheses).

Some important conclusions can be drawn from Table 1:
* A logic gate's speed increases with its WD/LD ratio, and for small WD/LD values tPD is roughly inversely proportional to this ratio [8]. However, the static power consumption of the gate is always directly proportional to the same ratio, which expresses the delay-power trade-off: WD/LD ↑ ⇒ tPD ↓, PD,st ↑.
* It is unwise to reduce β while keeping WD/LD constant in order to save die area without losing speed: even though tPD stays almost the same, much is lost in NM, which is directly proportional to β. Conversely, increasing WE/LE with β held constant (which also increases WD/LD) gives VOH ↑, Vpp ↑, and tPD ↓. Also, decreasing β does not appreciably reduce PD,st; only reducing WD/LD achieves that.
* A comparison between inverters 2 and 3, which have the same β and WD/LD values but different LD values, shows the effect of using minimum-gate-length D-FETs: inverter 3 (LD = 1 µm) is both faster and smaller than inverter 2 (LD = 2 µm), at a slightly higher static power. In DCFL design it is typical to use the minimum gate length for all D-FETs (LD = 1 µm) [6]. For the E-FET gate length, LE, the minimum value allowed by the design rules must be used: 0.6 µm for the Vitesse HGaAs-III process, which must be drawn and simulated as LE = 1 µm (the same holds for the D-FET's LD). The benefits of the minimum allowed LE in digital circuits are [6]: L ↓ ⇒ NM ↑, gm ↑, fT ↑. In addition, the gate input capacitance is directly proportional to the gate area W·L.

To check the effect of a lower power supply voltage VDD, the inverters were resimulated at VDD = 1.5 V. The supply current decreased as expected, and PD,st fell by hundreds of microwatts (for instance, to 708 µW in inverter 2), while tPD decreased only negligibly (about 1 ps).

Increasing the capacitive load CL at the output of a logic gate decreases Vpp, mostly because it lowers VOH: CL ↑ ⇒ NM ↓. This is best seen in the DC transfer curve. The results and conclusions obtained for inverters also hold for NOR gates, which can be regarded simply as variable-β inverters [7].

2.3. Buffers design: The buffer type considered in this work is the SDCFL (Source-follower DCFL) buffer.
SDCFL buffers are efficient only for driving large capacitive loads, above 100 fF, which makes them well suited for the counter outputs that drive the pads [7]. Properly placed SDCFL buffers give a large increase in output voltage swing Vpp, greatly improving the gate's NM, and they reduce the output signal's tHL, tLH, and tPD. SDCFL buffers must be sized with small β values to perform best. The trade-offs among WD/LD, power, and speed differ from the inverter case: increasing WD/LD usually gives more signal swing, and so a larger NM, but not necessarily a smaller tPD. The buffer noise margin is still proportional to β. The logic gate that drives a buffer, whether an inverter or a NOR gate, should have β around 5 to optimize buffer performance. Care must be taken not to oversize the buffer, which would only add delay instead of speed. Likewise, an SDCFL buffer driving a small capacitive load (CL < 100 fF) adds delay instead of removing it, worsening the transient response. Therefore these buffers are not always beneficial at a circuit or gate output.

3. Flip-Flops and Decade Counter Design.

Analysis of the counter operating as a BCD counter shows that the worst-case timing situations are the transitions from state 0011 to 0100 and from 0111 to 1000. For these transitions to occur properly, the sum of the propagation delays of FF1, FF2, and FF3 must be less than half the clock period (setup time). In the first transition, after the falling CLKA edge, a time tPD,FF1 passes until QA falls, a further tPD,FF2 until QB falls, and finally a further tPD,FF3 until QC rises.
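The worst-case transitions can be cross-checked with a small behavioral model. The Python sketch below is only an illustration, with hypothetical function names: it assumes the usual 7490 BCD clocking (QA driving CLKB, so FF2 and FF4 are clocked by QA and FF3 by QB), ideal negative-edge-triggered FFs, and the BCD state sequence instead of the actual J/K gating. It counts how many FF delays lie in series for each count transition and applies the half-period rule stated above.

```python
# Hypothetical behavioural model, not the transistor-level circuit.
# Assumed clocking (7490-style counter in BCD mode, QA wired to CLKB):
#   FF1 (QA) <- falling edge of CLKA
#   FF2 (QB), FF4 (QD) <- falling edge of QA
#   FF3 (QC) <- falling edge of QB
NAMES = ("QA", "QB", "QC", "QD")

def bits(n: int) -> dict:
    """State n (0..9) as output bits, QA = least significant."""
    return {q: (n >> i) & 1 for i, q in enumerate(NAMES)}

def ripple_depth(n: int) -> int:
    """Flip-flop delays in series for the count transition n -> n+1 (mod 10)."""
    old, new = bits(n), bits((n + 1) % 10)

    def changed(q: str) -> bool:
        return old[q] != new[q]

    def fell(q: str) -> bool:
        return old[q] == 1 and new[q] == 0

    depth = {"QA": 1}                  # FF1 switches one tPD after CLKA
    if fell("QA"):
        for q in ("QB", "QD"):
            if changed(q):
                depth[q] = 2           # one more tPD after QA falls
    if "QB" in depth and fell("QB") and changed("QC"):
        depth["QC"] = 3                # one more tPD after QB falls
    return max(depth.values())

def max_ff_tpd_ps(f_clk_hz: float, depth: int) -> float:
    """Per-FF delay budget when `depth` delays must fit in half a clock period."""
    return 0.5 / f_clk_hz / depth * 1e12

if __name__ == "__main__":
    worst = [n for n in range(10) if ripple_depth(n) == 3]
    print([f"{n:04b} -> {(n + 1) % 10:04b}" for n in worst])  # ['0011 -> 0100', '0111 -> 1000']
    print(round(max_ff_tpd_ps(1e9, 3), 1))                    # 166.7
```

Only the two transitions singled out in the text ripple through three FFs; all others settle within one or two FF delays.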
For counter operation at a 1 GHz clock frequency, tPD,FF1 + tPD,FF2 + tPD,FF3 ≤ 500 ps, so the tPD of each FF must be below 500/3 ≈ 167 ps; a target of about 160 ps leaves some margin. First, one FF must be designed to meet this timing constraint while also showing an acceptable Vpp with a realistic load; once this compliant FF cell has been simulated and found adequate, it serves as the reference point for the decade-counter design. Die area and power dissipation are of little concern at this stage; what matters is achieving the desired transient response. The FFs and the counter are designed for a nominal 2-V supply, first using the “typical” MESFET model library.

All logic-gate sizing iterations for this first FF followed the foundry's design guidelines [3]. A β of 10 and a drawn LE of 1 µm were adopted for all FF gates. Parasitic capacitances, with the values suggested in the foundry design manual [3], were included at all internal FF nodes. They are essential to the validity of the simulation results, because they give realistic timing estimates before back-annotation from the layout is possible.

The sizing of an FF starts at the Q-output NOR gate. The current I required to charge a given load capacitance through a given logic swing in a given time is calculated with equation (1); I is then looked up in Vitesse's D-FET sizing table [3], which gives the D-FET dimensions, and the gate's E-FETs are sized for β = 10. For symmetry, the Q̄-output NOR gate must be identical to the Q one. The D-FETs of the preceding gates are sized for a “loading fanout” of 2, meaning each D-FET has half the W/L ratio of its successor toward the output [3]. An initial FF sizing was carried out this way; its simulation showed a large delay from the clock input to the (unbuffered) Q output loaded with a 100 fF capacitor.
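The backward sizing pass just described (output gate first, then a loading fanout of 2 toward the inputs, with E-FETs from β = 10) can be sketched as below. The starting (W/L)D of 8 and the four-stage depth are hypothetical numbers for illustration only; in the real flow the output D-FET comes from Vitesse's sizing table [3], which is not reproduced here, and `size_chain` is an illustrative name.

```python
def size_chain(output_wl_d: float, stages: int, fanout: float = 2.0,
               beta: float = 10.0) -> list:
    """Return [(stage, (W/L)_D, (W/L)_E)] from the output stage back toward the input.

    Each earlier D-FET gets 1/fanout of its successor's W/L ("loading fanout"),
    and each gate's E-FETs get beta times its D-FET W/L, per equation (2).
    """
    sizes = []
    wl_d = output_wl_d
    for stage in range(stages):
        sizes.append((stage, wl_d, beta * wl_d))
        wl_d /= fanout
    return sizes

if __name__ == "__main__":
    # Hypothetical example: output D-FET with W/L = 8, four gate levels.
    for stage, wl_d, wl_e in size_chain(output_wl_d=8.0, stages=4):
        print(f"stage {stage} from output: (W/L)_D = {wl_d:g}, (W/L)_E = {wl_e:g}")
```

Because every earlier stage scales with the output D-FET, rescaling that single device (as done later for speed) propagates through the whole chain by rerunning the same pass.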
The design target is now the minimization of the FF tPD; otherwise those count states would be “swallowed”. FF resizing for speed should again start at the outputs and progress toward the inputs. The HSpice output file shows that the designed FF has an output swing ∆v = 567 mV into a load CL = 100 fF, with a charging time tLH = 240 ps, so the load charging current I is about 236 µA. To approach the desired tLH, one need only scale the output D-FET's W/L ratio, since I is linearly proportional to it. However, the FF must be designed for a target capacitive load CL = 2 pF. So an SDCFL buffer must be added at the Q output and sized together with the FF itself, and the FF's Q-output NOR gate must have β = 5 to drive it better, while all other gates, including Q̄, keep β = 10.

With the buffer connected, the output voltage into a CL = 100 fF load showed an excellent swing, with a very low VOL and a high VOH. But when CL was increased to the final value of 2 pF, the output fall time tHL (the capacitor discharge time) increased sharply, although tLH changed little. Since in this kind of buffer it is the D-FET that discharges the load, WD/LD must be increased. Previous buffer simulations had also shown that β = 3 gives the best speed-swing trade-off for them. Equation (1) shows that, to keep the same dV/dt, WD/LD must be increased by a factor of 20, the same factor as the load. This approach will rarely give the final buffer sizing on the first trial, but it brings the designer close. After a few extra resizing steps, a final output-buffer sizing of WD/LD = 150 and β = 3 was reached. Meanwhile, the output D-FET ultimately had to be scaled by a factor of 10, and all preceding D-FETs were resized to keep the loading fanout of 2, as stated in the foundry's design manual [3].
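The current estimate and the factor of 20 follow directly from equation (1). The sketch below reproduces that arithmetic with the figures quoted above; like the text, it assumes the D-FET current scales linearly with its W/L ratio, so the current ratio is also the required WD/LD scaling factor. The function name is hypothetical.

```python
def charging_current(c_load_f: float, swing_v: float, t_s: float) -> float:
    """Average current (A) from equation (1): I = C_L * dV/dt."""
    return c_load_f * swing_v / t_s

if __name__ == "__main__":
    # 100 fF load, 567 mV swing, 240 ps rise time (HSpice figures quoted above)
    i_100ff = charging_current(100e-15, 0.567, 240e-12)
    print(f"{i_100ff * 1e6:.0f} uA")        # 236 uA
    # Same swing and edge rate into the 2 pF target load
    i_2pf = charging_current(2e-12, 0.567, 240e-12)
    print(round(i_2pf / i_100ff))           # 20, the required W_D/L_D scaling
```

This is only a starting point: as noted above, a few resizing iterations were still needed to reach the final buffer sizing.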
Another way to increase FF speed is to resize the D-FETs of every gate to minimum gate length while keeping the same (W/L)D ratio. This brings considerable area savings and some increase in logic-gate speed, at the cost of higher power consumption and smaller NMs. Because of the area reduction, this minimum-sizing approach is used throughout the design. If it causes initialization problems, an effective fix is to raise β above 10 in the preset/clear gates, increasing their NMs and compensating for the minimum-length effect.

After the design steps above, the flip-flop reached the following performance: PD,st = 124 mW (measured with the Q output high), Vpp = 778 mV, and tPD = 104.4 ps; the output fall and rise times, tHL and tLH, were 149 ps and 138 ps, respectively. The delay-power product reached 126.4 fJ. These values were obtained at 25 °C with the “typical” device library, which represents the process average.

Worst-case temperature and process variations were then accounted for: the FF was resimulated at 0 °C with the slow-corner MESFET models, to capture the worst-case speed, and at 85 °C with the fast-corner models, to capture the worst-case power dissipation. All MESFET models, typical and corner, were supplied in a library by Vitesse. In all simulations, tHL and tLH were very close to each other, and tPHL < tPLH. The slow corner gave PD,st = 99 mW, Vpp = 785 mV, and tPD = 142.5 ps, with tHL = 150 ps and tLH = 144 ps. The fast (high-power) corner gave PD,st = 141 mW, Vpp = 714.5 mV, and tPD = 121 ps, with tHL = tLH. The delay-power product stayed in the hundreds of femtojoules.
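As a simple additive check, the reported FF delays can be compared with the per-FF budget of 500 ps / 3 ≈ 166.7 ps derived for 1 GHz operation. This sketch ignores the output buffers and interconnect, which are not part of that budget; the dictionary keys and function name are illustrative.

```python
BUDGET_PS = 500.0 / 3          # three FF delays within half of a 1 GHz period

REPORTED_TPD_PS = {            # single-FF delays quoted in the text
    "typical, 25 C": 104.4,
    "slow corner, 0 C": 142.5,
    "fast corner, 85 C": 121.0,
}

def margin_ps(tpd_ps: float, budget_ps: float = BUDGET_PS) -> float:
    """Remaining per-FF timing margin (positive means within budget)."""
    return budget_ps - tpd_ps

if __name__ == "__main__":
    for case, tpd in REPORTED_TPD_PS.items():
        print(f"{case}: tPD = {tpd} ps, margin = {margin_ps(tpd):.1f} ps")
```

The slow corner leaves the smallest margin (about 24 ps per FF), which is why extra D-FET width is the natural knob for widening it.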
So the FF stayed within its timing specification even at the slow corner; to guarantee a larger timing margin there, the D-FET widths can be increased further. The effect of a 20% reduction in VDD was also simulated: the FF's Vpp decreased to 760 mV, not enough to harm NM; tPD was slightly smaller than in the typical case, and PD,st decreased to 103.4 mW.

Each FF in the counter was designed individually, instead of designing one FF and reusing it four times like a standard cell. For example, figure 2 shows that FF1 needs neither the J and K inputs nor the Q̄ output, and FF3 needs neither the J, K, and PR inputs nor the Q̄ output. This minimized the E-FET count per flip-flop, as well as circuit area and power dissipation.

The decade-counter netlist was written by interconnecting the four FF subcircuits and the associated logic gates, as shown in figure 2. SDCFL buffers, sized as explained above, were connected to each FF output (QA, QB, QC, and QD) to drive the internal fan-outs and the off-chip capacitive loads. Schottky diodes in parallel with 2 pF capacitors were used as simulation loads at the counter outputs. SDCFL buffers were also added at the outputs of the preset and clear gates to drive their large fan-outs, so these gates were sized with β = 5 for optimum performance.

In the last simulation run, the counter showed excellent transient behavior. All FFs initialized correctly to logic “zero” when a high pulse was applied to the clear inputs, and their Vpp was above 700 mV, except at the fast corner, where the average Vpp was 650 mV. For the typical case, PD,st = 477 mW, and the delay from CLKA to QC, tPD,FF1 + tPD,FF2 + tPD,FF3 (for the transition from “0011” to “0100”), was 368 ps. The slow corner shows the worst-case CLKA-to-QC delay, 404 ps, but also the smallest PD,st, 382 mW.
The fast corner is actually the worst case for Vpp, NM, and PD,st, which reached 613 mW, and, contrary to what one might expect, it is not the fastest case. With VDD reduced by 20%, the counter showed a small decrease in VOH, PD,st decreased to 404 mW, and the CLKA-to-QC delay was the smallest of all: 360 ps. So the design is also suitable for low-voltage operation. In all cases, tHL and tLH were very close. Figure 5 shows a simulation of the counter operating as a BCD counter, for the typical process.

Figure 5 - Decade Counter operation as a BCD counter, at 1 GHz (traces: CLKA, QA, QB, QC, QD).

Thus, by the last simulation run before the layout stage, typical and worst-case temperature and process variations had been successfully simulated. The need for further optimization can only be assessed once interconnect delays and parasitics are known from the layout.

4. Layout.

The decade-counter layout was designed in the Magic CAD software with the Vitesse technology file for the HGaAs-III process. In high-speed circuits, the layout strategy (topology) is crucial for performance, because it sets the parasitic capacitances, which are a dominant source of delay. This circuit's layout was drawn in ring notation, which minimizes coupling between the high-frequency signal lines and the VDD power buses [7]. In ring notation, the ground bus (GND) is routed between the VDD bus and the circuit's logic blocks, to decouple high-frequency switching noise from the VDD lines. The same strategy must be adopted for pad placement. An inverter layout example is shown in figure 6(c).

Fig. 6 - Inverter layout: (a) conventional style, (b) ring notation style, and (c) actual layout in ring notation.

Minimum geometries were used wherever possible.
The interconnection widths and interlayer via cuts were sized according to the maximum current-density limits, in mA/µm of width, for each layer type; these limits are defined by Vitesse's electrical design rules [2]. The HSpice simulations provided the DC and transient currents of each FET needed for this sizing. After the first layout was completed with the initial FET dimensions, the parasitic capacitances and the sheet and contact resistances were extracted and back-annotated into the HSpice netlist, and more realistic simulations were run for a more reliable performance check. These showed that some gates and buffers needed resizing to improve counter performance and keep it within specification. Operation at 1 GHz was achieved, and the circuit masks are ready for submission to CMP.

5. Conclusion.

This work has presented the most important trade-offs in the design of DCFL digital circuits. Logic gates and buffers were analyzed, and the insight gained was used to design a high-speed GaAs version of a well-known commercial part. The shift from NAND- to NOR-based logic, demanded by the technology, was also very instructive. The circuit was optimized for high-frequency operation, but its final sizing was known only after “layout / back-annotation / simulation / resizing” cycles. The influence of supply voltage, temperature, and process variations was accounted for in the design, and it was found that decreasing VDD below 2 V reduces both power consumption and circuit delay, so the IC is also well suited to lower-voltage operation. The design advice of ref. [2] also proved convenient and time-saving: first simulate and debug the building blocks, and only then simulate all blocks together; this minimized the number of full decade-counter simulations and saved computer time.
The package chosen to house the die was the 28-pin Leaded Ceramic Chip Carrier (LDCC), because of its good power dissipation and high-frequency capability (3.9 GHz).

6. References.

1 - Deyhimy, Ira: “Gallium Arsenide Joins the Giants”, IEEE Spectrum, February 1995.
2 - Huber, John P. & Rosneck, Mark W.: “Successful ASIC Design the First Time Through”, Van Nostrand Reinhold, 1991.
3 - Vitesse Semiconductor Corp.: “Foundry Design Manual”, Version 6.0, 1993.
4 - Texas Instruments Inc.: “The TTL Databook”, 1980.
5 - Texas Instruments Inc.: “Designing with TTL Integrated Circuits”, McGraw-Hill, 1971.
6 - Long, Stephen I. & Butner, Steven E.: “Gallium Arsenide Digital Integrated Circuit Design”, McGraw-Hill, 1990.
7 - Moussa, Imed: “Gallium Arsenide Digital Circuits Applications to High Rate Communication Systems and Computers”, PhD Thesis, TIMA/INPG, 1996.
8 - Van Tuyl, R., et al.: “GaAs MESFET Logic with 4-GHz Clock Rate”, IEEE Journal of Solid-State Circuits, October 1977.
9 - Meta-Software Inc.: “HSPICE User’s Manual Volume 3: Analysis and Methods”, Version H92, 1995.
