* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Design of a Restartable Clock Generator for Use in GALS SoCs
Stray voltage wikipedia , lookup
Immunity-aware programming wikipedia , lookup
Control system wikipedia , lookup
Current source wikipedia , lookup
Pulse-width modulation wikipedia , lookup
Electronic engineering wikipedia , lookup
Mains electricity wikipedia , lookup
Switched-mode power supply wikipedia , lookup
Oscilloscope history wikipedia , lookup
Integrated circuit wikipedia , lookup
Alternating current wikipedia , lookup
Two-port network wikipedia , lookup
Buck converter wikipedia , lookup
Analog-to-digital converter wikipedia , lookup
Regenerative circuit wikipedia , lookup
Schmitt trigger wikipedia , lookup
Resistive opto-isolator wikipedia , lookup
Rectiverter wikipedia , lookup
Network analysis (electrical circuits) wikipedia , lookup
Flip-flop (electronics) wikipedia , lookup
Design of a Restartable Clock Generator for Use in GALS SoCs by Hu Wang, Bachelor of Science A Thesis Submitted in Partial Fulfillment of the Requirements for the Master of Science Degree Department of Electrical and Computer Engineering in the Graduate School Southern Illinois University Edwardsville Edwardsville, Illinois August 6, 2008 ABSTRACT Design of a Restartable Clock Generator for Use in GALS SoCs by Hu Wang Advisor: Dr. George L. Engel This thesis presents the design of an instantaneously restartable crystal-controlled clock generator in a 90 nm CMOS (Complementary Metal-Oxide Semiconductor) twin-well process. The clock generator is a critical component in a newly proposed blended design methodology in which clocked subsystems are connected by using clockless sequencing networks. The novel design overcomes the problem of poor control of the clock period generally associated with pausible clocks. The generator described herein is fully compatible with any deep-submicron technology and insensitive to process variations. The design uses only ‘regular’ threshold voltage FETs (Field Effect Transistors). Simulations indicate that the design is capable of functioning correctly at frequencies up to 1 GHz irrespective of process corner while operating from a single 1 Volt supply and consuming 10 mW of power. The duty cycle of the clock is near 50%, and the delay in restarting the clock is small, less than 1.5 periods. ii ACKNOWLEDGEMENTS I would like to thank Dr. George Engel for his support and guidance as well as the opportunity to participate in this research project without which I could not have progressed so far. I would also like to thank Jerry Cox, Senior Professor of Computer Science, Washington University, Saint Louis, for his advice during the various stages of this project. I would like to thank my fellow researchers, Dinesh Dasari, Nagendra Sai Valluru for their help. I would like to acknowledge the contributions of Sasi K. Tallapragada. Mr. Tallapragada successfully designed, fabricated (in a TSMC 0.25 μm process), and tested an earlier version of the designed presented here. His work laid the groundwork for the design described in this thesis. I also wish to thank the National Science Foundation and Blendics, LLC. This work has been supported by a NSF-STTR grant from the Blendics company. I would like to thank Dr. Scott Umbaugh, Dr. Scott Smith, and Dr. Brad Noble, ECE faculty members at SIUE, who actively encouraged me. With their support, I was able to explore new areas. I would also like to thank Dr. Bob Leander and the rest of the ECE faculty who also have supported me throughout my education at SIUE. Finally, I wish to thank my family for being patient with me and encouraging me throughout the entire of my college experience. In particular, I would like to thank my wife Yue Cheng, without whom I may not have finished. iii TABLE OF CONTENS ABSTRACT.............................................................................................................................. ii ACKNOWLEDGEMENTS..................................................................................................... iii LIST OF FIGURES ................................................................................................................. vi LIST OF TABLES.................................................................................................................. vii Chapter 1. INTRODUCTION .....................................................................................................1 Background ................................................................................................................1 Blended Design Methodology ...................................................................................2 Objective and Scope of Thesis...................................................................................6 2. CLOCK GENERATOR DESIGN .............................................................................7 Overview of the Clock Generator ..............................................................................7 Principle of Operation................................................................................................8 Initial Analog Multiplier Design................................................................................9 Improved Analog Multiplier Design........................................................................15 High-Speed Comparator Design ..............................................................................18 SR Latch and OR Gate Design ................................................................................19 Track and Hold Circuit Design ................................................................................20 3. NON-IDEAL EFFECTS..........................................................................................22 Channel Length Modulation ....................................................................................22 Sensitivity Analysis .................................................................................................23 Mismatch and Offset Analysis.................................................................................26 Device Sizing ...........................................................................................................28 4. SIMULATION RESULTS ......................................................................................32 Analog Multiplier Simulation ..................................................................................32 Frequency Response of Symmetric Miller OTA .....................................................32 Hold Signal Verification ..........................................................................................34 Track and Hold Circuit Simulation..........................................................................35 Restartable Clock Generator Simulation .................................................................36 v 5. CONCLUSIONS AND FUTURE WORK ..............................................................39 Conclusions..............................................................................................................39 Future Work .............................................................................................................40 REFERENCES ............................................................................................................41 v LIST OF FIGURES Figure Page 1.1 Data processing subsystem with a control wrapper and clock generator ................4 1.2 Pair of processing elements communicating using clockless sequencing network .5 2.1 Architecture of restartable clock generator..............................................................7 2.2 Initial analog multiplier..........................................................................................10 2.3 Original combiner circuit [Hsi:98].........................................................................14 2.4 Improved combiner circuit.....................................................................................15 2.5 Automatic gain control circuit for resistive PFET.................................................16 2.6 Symmetric Miller OTA..........................................................................................17 2.7 High-speed analog comparator ..............................................................................18 2.8 SR latch circuit.......................................................................................................19 2.9 Track and hold circuit ............................................................................................20 4.1 Analog multiplier simulation .................................................................................33 4.2 Frequency response of symmetric Miller OTA .....................................................33 4.3 Hold signal verification..........................................................................................34 4.4 Transient response of T/H circuit ..........................................................................35 4.5 Frequency response of T/H circuit.........................................................................36 4.6 Clock generator simulation ....................................................................................37 4.7 Delay associated with restarting the clock.............................................................38 vi LIST OF TABLES Table Page 3.1 Comparison of simulation results ..........................................................................23 3.2 Simulation results of parameter values across process corners .............................25 3.3 Estimated and simulated results in initial and improved multipliers.....................25 3.4 Combiner FET sizes...............................................................................................28 3.5 Bias circuit and OTA FET sizes ............................................................................30 3.6 NAND gate and T/H circuit FET sizes ..................................................................30 3.7 Comparator FET sizes............................................................................................31 vii Chapter 1 INTRODUCTION Background Currently, verification occupies 60% to 80% of the engineering hours expended on the design of complex integrated circuits (ICs). Unless there is a significant paradigm shift, the engineering effort directed toward verification is sure to continue to grow as chip density grows. Module reuse along with elimination of the global verification component of chip design has the potential to cut the design time of future chips by up to an order of magnitude. This can be accomplished by reusing previously verified modules and by connecting these modules with provably-correct, delay-insensitive pathways. As a result, all global verification will be process independent, and only local verification will be process specific. These technology advantages can curtail the explosive time-to-market growth of future ICs as Moore's law drives chip transistor density ever upward. We are developing such a methodology, one that blends clockless and clocked systems and eliminates the need for global verification. The methodology utilizes a clockless sequencing and communication network interconnecting locally verifiable clocked modules so that all global signaling exhibits correct behavior, independent of the specific IC process used for implementation. Delay-insensitive control and data paths are composed from a set of six control elements implemented as standard cells and designed to be completely free of both combinational and metastability hazards. This guarantee is valid, provided the composition of elements follows a few simple rules. Fan-in and fan-out are possible for both control and 2 data paths, and there is no requirement for global control mechanisms. Furthermore, sending data in packets of arbitrary length (including a single byte or word) is supported. If the local domain is clocked, the use of a synchronizer between the local and global domains is avoided by means of a novel clock generator [Cox:88], [Cox:07]; the design of which is the focus of this thesis. The clock generator can function with a remote crystal source or with a local, MEMS–based, all-silicon source and can pause and then resume asynchronously, in either case. Hazards due to poor synchronizer design are avoided. Adding the clock generator to a clocked subsystem makes it appear to the rest of the system as if it were clockless, i.e., operating completely asynchronously. These clocked subsystems may then be composed with the rest of the system’s components, utilizing asynchronous control and data paths. As a consequence, even for clocked subsystems, the need for verification of global timing is avoided by use of global paths that function with complete insensitivity to variations in delay. This methodology for blending of clockless and clocked systems is a special case of the Globally Asynchronous, Locally Synchronous (GALS) [Cha:73] design approach. The special features of our approach are the guaranteed correctness of the compositions of the control elements and freedom from all combinational and metastability hazards. Blended Design Methodology GALS systems suffer from either poor control of the local clock period or the potential for synchronizer failure. Synchronizer failures occur when setup and hold time constraints are violated in a system consisting of flip-flops or registers. Chapiro [Cha:73] proposed a taxonomy for GALS circuits based on the method used for synchronizing data with the local clocks. The methodology presented within this thesis falls into his 3 “escapement” category [Tra:91]. It avoids the problem of synchronizer failures by stopping the clock and then restarting it when data is valid. In the majority of similar schemes presented in the literature, the pausible clock is implemented using a delay-based circuit or ring oscillator. Pausible clocks are often discussed in the relevant literature; for example, in [Tra:91], [Dob:99], [Ami:07], [Yun:99], [Gür:06], and [Bei:08]. In the scheme proposed in this research, the clock generator is based on a stable crystal oscillator, yet functions like a delay-based clock. We call it a “restartable” clock to distinguish it from existing pausible clocks. Unlike a conventional crystal oscillator, it can be started at any instant in time, taking on a phase reference determined by that starting instant. Two (or more) processing elements with such clocks can carry out the transfer of data without the requirement for synchronization and can thus be free of any synchronizer failure. This freedom from synchronizer failure is accomplished by an asynchronous layer between the two subsystems such that the two systems are no longer completely independent. The purpose of this asynchronous layer is to initiate the operation of the data processing subsystem’s local clock, and to signal an acknowledgment of the completion of that action. Because none of the subsystem clocks are fixed in their phase, the blended design methodology will never produce a system that is capable of a synchronizer failure. This is true even though their periods are controlled by a crystal oscillator. When arbitration is required in the asynchronous layer, an arbiter control element waits until circuit resolution is complete before proceeding and thereby avoids metastability errors. Fig. 1 illustrates a data processing subsystem embedded in a control wrapper along with the clock generator. The clock generator serves as a local clock to the data processing subsystem. The heavy solid lines indicate data paths between different data processing subsystems and lighter lines indicate sequencing paths between control elements indicated by circles. The purpose of 4 the control wrapper is to initiate the data processing subsystem’s local clock and to signal an acknowledgement indicating that the action is completed. Control ControlWrapper Wrapper Data Data Processing Processing Subsystem Subsystem Clock Clock Generator Generator Fig. 1.1 Data processing subsystem with a control wrapper and clock generator The control wrapper in Fig. 1.1 is designed by using control elements similar to those that were defined by Clark et. al. in 1967 [Cla:67]. The control elements are the Fork, Merge, Join, Dual-Join, and Arbiter. A sixth control element, the Transfer, has also been recently defined (although it was implicit in the earlier work). This element transfers control signals between the global clockless domain and a local domain with timing constraints. All six control modules have been simulated and behave as predicted. Many complex systems were built utilizing an equivalent set of control elements [Cla:74], an indication of their great expressive power. These elements were all externally delay-insensitive (DI) allowing the composition of endlessly scalable systems. 5 An example application of our blended methodology is illustrated in Fig. 1.2. Two processing elements (A and B) communicate using a clockless sequencing network. The Any size clocked domain PB Clk ClkBB AABB B1i B1i B1c B1c FIFO A1i A1i A1c <=AA A1c BB<= B2c B2c B2i B2i A2c A2c A2i A2i FIFO FIFO =>AA BB=> AAAA Clk ClkA PPAA Clockless sequencing network Fig. 1.2 Pair of processing elements communicating using clockless sequencing network aforementioned initiation and completion signals are explicitly depicted in the figure. The block indicated as ‘A’ is an Arbiter and the block indicated as ‘Clk’ is the “restartable” clock. These control elements can provide a high-level abstraction for designing complex asynchronous SoCs (Systems on Chips), but they must be brought up to date with today’s technology by avoiding any dependence on IC process characteristics and by becoming internally, as well as externally delay insensitive. Toward this end, Petri net models of the six elements and their environments have been defined [Cox:05] and their reachability graphs calculated. The ternary test [Ung:69] for instability and metastability hazards can be applied 6 to verify the absence of the former and provide the necessary information for control of the latter. Systems composed from the resulting element set are delay-insensitive and hazardfree, independent of the IC process in which the elements are implemented. Objective and Scope of Thesis The objective of this thesis work was to design an instantaneously restartable crystal controlled clock generator in a IBM 90nm CMOS twin-well process. As a critical component in a newly proposed blended design methodology, it can be used in GALS SoCs. There are five chapters in this thesis. In Chapter 2, the design of the restartable clock generator is described in detail. Non-ideal effects including channel length modulation, offset due to mismatches and an analysis of the sensitivity of the design to variations in IC process parameters are discussed in Chapter 3. Transistor sizes for the circuits used in the various sub-systems described in Chapter 2 are also given in Chapter 3. Chapter 4 presents the simulation results for the clock generator. Finally, conclusions and future work are presented in Chapter 5. 7 CHAPTER 2 CLOCK GENERATOR DESIGN Overview of the Clock Generator The clock generator is based on a stable crystal oscillator, yet functions like a delaybased clock generator. Unlike a crystal oscillator, the proposed clock generator can be started or stopped at any instant of time by taking on a phase reference determined by the starting instant. The architecture of the clock generator is illustrated in Fig. 2.1. Four Quadrant cos u Multiplier Oscillator Input Quadrature Signal Generator sin t cos t Quad Track & Hold sin t u) Compare Clock Four sin u Quadrant Multiplier hold A1i A1c A2c A2i 22 4 22 4 22 4 22 4 Init1! SRFF1 SRFF2 Done! Init2! Fig. 2.1 Architecture of restartable clock generator 8 The ‘!’ suffix on signal names indicates four-phase signaling which means that the signal must return to its previous level (i.e. a pulse). Information is transmitted via transitions for two-phase signaling. The act of “starting” the clock is accomplished when either one of the ‘Init!’ signals is asserted. After the data processing completes, the subsystem activates the ‘Done!’ signal, which stops the clock. Data communication between the subsystems can then commence. Principle of Operation The clock generator, depicted in Fig. 2.1, is constructed from the following: a pair of fully-differential analog multipliers, a current-mode comparator, a quad track-and-hold (T/H) circuit, a pair of SR latches, and an OR gate. The quadrature signals are generated from a crystal or MEMS-based oscillator [Mob]. The design of the circuit used to generate these quadrature signals is outside the scope of this thesis and will not be discussed further. The operation of the clock generator is based on the simple trigonometric identity [Cox:88] and [Cox:07] given by sin ω (t − u ) = sin ω t ⋅ cos ω u − cos ω t ⋅ sin ω u (2.1) where, ‘ω’ is the angular frequency in rad/sec, ‘t’ is time in seconds and ‘u’ is the delay in seconds introduced with respect to the start of the clock generator i.e. the assertion of one of the ‘Init!’ signals. The clock is generated when either one of two ‘Init!’ signals is asserted. Then, the output of one of the SR latches which initially are both reset is set. The high level on the hold input of the T/H circuits forces them into the “hold” phase. If the ‘Done!’ signal is asserted, it resets both SR latches. This places the T/H circuits into the “track” mode. 9 The outputs of the track and hold circuits shown in Fig. 2.1 correspond to the state of the clock generator during the hold phase i.e. when the clock is running. In reality all eight inputs to the analog multipliers pass through T/H circuits. The additional T/H circuits (on the +/- sinωt and +/- cosωt inputs) are always held in the “track” mode and were inserted to equalize the delays so that all signals arrive at their respective inputs to the analog multipliers at the same time. The T/H circuit was implemented using a simple CMOS transmission gate with the input capacitance of the multiplier serving as the sampling capacitor. The analog comparator compares the differential input signal, producing a clock signal which is high when this signal is positive and low when it is negative. When T/H circuits in Fig. 2.1 are placed into track mode, the clock is to be stopped. During this circumstance, the analog inputs to the upper multiplier and the lower multiplier in Fig. 2.1 are identical, thereby, presenting a zero differential signal to the comparator. A small systematic offset must exist or must be inserted into the comparator to ensure that the comparator output is driven low (and remains low) when the differential signal is zero. In the clock generator design, the analog multiplier is the most prominent component. The analog multiplier is used to implement the product of two sinusoidal inputs. As shown in the equation (2.1), two multipliers are required. The following section will explore the design of the analog multiplier in more detail. Initial Analog Multiplier Design In this section we discuss our initial [Tal:05] four quadrant analog multiplier design which is based on a circuit first presented by Hsiao et. al. [Hsi:98]. In following section, we demonstrate how the design may be modified to reduce the sensitivity of the circuit to 10 process corner. Moreover, the modifications result in a circuit free of resistors which may be especially important when implementing the circuit in technologies which do not easily support resistors. The analog multiplier in Fig. 2.2 uses six symmetrical structures called combiners because they combine the input signals to form the output [Hsi:98], [Deb:02]. VB is the DC VDD VDD VDD R R i ia va VDD R R ic b iop i d vb 1 2 M10 3 4 vc M11 5 vd M1 iom M9 M2 M3 M4 M5 M6 M7 M8 6 VB + v1 VB - v1 VB – v2 VB + v2 Fig. 2.2 Initial analog multiplier pedestal on which the input signals rest while v1 and v2 are the two signals to be multiplied. Multiplication is achieved through the use of the quarter-square principle shown below x⋅ y = 1 [( x + y ) 2 − ( x − y ) 2 ] 4 (2.2) 11 The differential nature of the circuit is the central feature which allows high performance, even when using this very simple structure to implement the multiplier. A single combiner, shown in Fig. 2.3, consists of two NMOS transistors operating in the saturation region and a resistor, R. +VDD R Vout Vin1 M1 M2 Vin2 Fig. 2.3 Original combiner circuit from [Hsi:98] The combiners in the multiplier are divided into two stages. We refer to the combiners labeled as 1, 2, 3, and 4 in Fig. 2.2 as “first-stage” combiners. The “second-stage” combiners (labeled 5 and 6) are a pair of NMOS transistors working in the saturation region with the resistor R omitted. The purpose of the resistor in the second-stage combiner of [His:98] was to transform the output current into an output voltage. As indicated in the equation (2.1), the differential outputs from the analog multipliers must be subtracted in order to produce the delayed sinusoid. The taking of this difference is more easily performed in the current domain, and so we allow our multiplier to have a current rather than a voltage output. 12 The transistors in the analog multiplier from [Hsi:98] are operated in saturation, and the current in each combiner can be represented by the square law characteristic of a MOS transistor given by iDS = 1 ⋅ K pn ⋅ S n ⋅ (vGS − VTN ) 2 2n (2.3) where iDS is the drain-to-source current of the NMOS transistor, vGS is the gate-to-source voltage, VTN is threshold voltage, Kpn is transconductance parameter, n is the subthreshold slope factor, and Sn is the shape factor ( Wn / Ln where Wn is the width and Ln is the length of the NFET). We assume, for simplicity, that all of the NFETs in the multiplier are matched and we ignore second-order effects such as channel length modulation. The output node voltage va of the first combiner shown in Fig. 2.3 is given by va = VDD − ia ⋅ R (2.4) where R is resistance, VDD is the supply voltage, and ia is the sum of the drain-to-source currents for transistors M1 and M2 of the first combiner. The current flowing in each transistor is given by the expression in equation (2.3). Substituting the input voltages VB + v1 and VB + v2 which are applied as gate voltages to the transistors M1 and M2 respectively and using equation (2.3), then the expression of va in equation (2.4) can be written as va = −R −R ⋅ K pn ⋅ S n (VB + v1 − VTN ) 2 + ⋅ K pn ⋅ S n (VB + v2 − VTN ) 2 + VDD 2n 2n (2.5) 13 Expanding the above expression and collecting like terms, one finds va = VDD − − Sn ⋅ K pn ⋅ R 2n ⋅ v12 − Sn ⋅ K pn ⋅ R Sn ⋅ K pn ⋅ R ⋅ (VB − VTN ) n 2n ⋅ v2 − ⋅ v22 − Sn ⋅ K pn ⋅ R ⋅ (VB − VTN ) n Sn ⋅ K pn ⋅ R ⋅ (VB − VTN )2 ⋅ v1 (2.6) n Similarly, the output node voltages vb through vd for combiners 2 through 4 also have the same expression shown in equation (2.6) except that the sign of the terms containing v1 and v2 change based on the phase of the signal that is applied to the corresponding combiner as shown in Fig. 2.2. The expressions for the multiplier output currents iop and iom are then iop = iom = Sn ⋅ K pn 2n S n ⋅ K pn 2n [va2 + vb2 − 2VTN ⋅ (va + vb ) + 2VTN 2 ] (2.7a) [vc2 + vd2 − 2VTN ⋅ (vc + vd ) + 2VTN 2 ] (2.7b) The differential output current of the analog multiplier iout is thus obtained by subtracting (2.7b) from (2.7a), and after some algebraic manipulation one obtains the following expression iout = K mult ⋅ v1 ⋅ v2 (2.8) 14 where K mult ⎡⎛ S ⋅ K n pn = 4 ⎢⎜⎜ n ⎢⎝ ⎣ 3 ⎤ ⎞ 2 2⎥ ( ) R V V ⋅ ⋅ − ⎟⎟ B TN ⎥ ⎠ ⎦ The multiplier gain constant, Kmult, is strongly dependent on process parameters. The value of Kpn changes by tens of percent from one process corner to another as does the sheet resistance of the material used to implement the resistance, R. Moreover, both of these parameters display significant temperature dependence. The expression in equation (2.8) for Kmult may be re-written as follows K mult ⎡ 2VR ⎤ =⎢ ⎥ ⎣ (VB − VTN ) ⎦ 2 ⎛ S n ⋅ K pn ⎜⎜ n ⎝ ⎞ ⎟⎟ ⎠ (2.9) where VR is the DC voltage drop across the resistor, R,. Equation (2.3) was used to compute the quiescent current through the resistor. We observe in equation (2.9) that the first term is associated with the “first-stage” combiners shown in Fig. 2.2, while the second term is due to the “second-stage” combiners. Little can be done to remove the variation in the multiplier gain constant due to parameters associated with the second-stage combiners, but it is possible to remove much of the process dependence associated with the “first-stage” combiners. In the following section, we propose a modification to the combiner circuit in Fig. 2.3 which addresses this issue. 15 Improved Analog Multiplier Design As demonstrated in the previous section, the gain of the multiplier depends strongly on the DC voltage across the resistor. Unfortunately, this voltage is highly sensitive to process variations. If this voltage could be made independent of process, the gain of the multiplier, Kmult would be less dependent on process corner. In order to reduce the sensitivity of Kmult to variations in processes and to yield a design that is free of resistors, the resistor used in the “first-stage” combiner was replaced with a PMOS transistor, M3, operating in the resistive region. Also, use of a PFET to realize the resistor, R, saves area. The modified “first-stage” combiner is shown in Fig. 2.4. +VDD Vctrl M3 Vout Vin1 M1 M2 Vin2 Fig. 2.4 Improved combiner circuit For a PFET biased in the resistive region [All:03], Req can be calculated as follows Req = n K pp ⋅ S p ⋅ (VDD − Vctrl − | VTP |) (2.10) 16 where Vctrl is the gate control voltage, VTP is the threshold voltage of the PFET, Kpp is the transconductance parameter of the PFET, and Sp is the shape factor. While in general there are linearity concerns when FETS are used to simulate resistances, those concerns are mitigated here by the fact that the analog multiplier design of Fig. 2.2 is fully differential and because intermediate levels of distortion do little to interfere with the ability of the comparator in Fig. 2.1 to discriminate zero crossings. By adjusting the control voltage, Vctrl, the resistance can be altered and in turn the DC voltage across device M3 tuned to the desired value (i.e. the value which produces the desired Kmult). In order to make this “tuning” automatic, an additional combiner was added to the design as illustrated in Fig. 2.5. +VDD M21 M20 IBias Vctrl IB - OTA + M19 VB M17 M18 VB Fig.2.5 Automatic gain control circuit for resistive PFET 17 Transistors M17 and M18 are the same size as the NFETS (M1 and M2) used in the combiner circuit from Fig. 2.4. M19 matches PFET M3. The OTA (Operational Transconductance Amplifier) is of the symmetric Miller type [Lak:94]. Devices M20 and M21 are matched. They implement a voltage divider. The voltage on the inverting input is half of the supply voltage. As a result of negative feedback, the OTA adjusts its output until the voltage at the non-inverting input is also VDD/2. The voltage, Vctrl, on the gate of M19 is then distributed to the gates of the resistive PMOS devices in both analog multipliers depicted in Fig. 2.1. For simplicity and considering of the limitations resulting from the use of a 1 Volt power supply, we chose a symmetric Miller OTA design. The circuit schematic is given in Fig.2.6. Fig.2.6 Symmetric Miller OTA 18 A differential pair is formed by the matched input transistors, M1 and M2. M3 and M4 are load devices, matched as well, each of which constructs a current mirror with their counterpart devices M5 and M6. The output current of transistor M5 is then mirrored once again by the current mirror formed using devices M7 and M8. Finally, the output voltage, Vout is sum of the small-signal currents in M6 and M8 multiplied by the parallel combination of the small-signal output resistances ro6 and ro8. All current levels are determined by the bias current IBIAS, which is 10μA. High-Speed Comparator Design The high-speed comparator (adapted from [All:03]) illustrated in Fig. 2.7 is used to convert the differential sinusoidal current output from the multipliers into a square-wave clock output. Transistor pair M5-M6 form a current mirror as do transistors M7-M8. The crosscoupled transistors M2-M3 implement a high-speed NMOS latch. Diode-connected transistors M1 and M4 provide hysteresis and help to establish the common-mode input voltage for the Fig. 2.7 High-speed analog comparator 19 self-biased differential amplifier formed by devices M9 through M14. The output of the selfbiased differential amplifier drives a pair of cascaded push-pull output drivers comprised of FETS M15-M18 to guarantee a rail-to-rail clock output. SR Latch and OR Gate Design The SR latch in Fig. 2.1 is constructed by two cross-coupled NAND gates. The NAND gate is implemented using a fully complementary logic structure. Since the inputs of the OR gate can be set to active low by means of connecting them to the inverting outputs (Q_bar) of the SR latches the OR gate is simply replaced by a NAND gate. Using only NAND gates simplifies the layout of the clock generator. Fig. 2.8 illustrates the SR latch and the NAND gate used in the design of the latch. VDD InA S M1 M2 Out Q Q_bar InB InA M3 InB M4 R SR_Latch 2-Input NAND Gate Fig. 2.8 SR latch circuit 20 The function of the SR latches is to latch the ‘Init!’ or ‘Done!’ signals. Whenever the active low ‘Done!’ signal is applied to the ‘R’ input nodes of the SR latch it resets the latch. The ‘hold’ signal (See Fig. 2.1) will stop the clock output. In the same way, the clock output will restart if either of the ‘Init!’ signals that are tied to the ‘S’ input node of the SR latch pulses low. Track and Hold Circuit Design The track and hold circuit (T/H) is composed of a simple CMOS transmission gate with the input capacitance of the analog multiplier input serving as the tracking capacitor. Two cascaded inverters are used to supply the switching signals for the transmission gate. Hold_Delay The T/H circuit schematic is illustrated in Fig. 2.9. Hold M2 M3 Hold_ Delay In M4 M5 Fig. 2.9 Track and hold circuit Out M6 Hold_Bar M1 Hold_Bar VDD 350pF 21 The T/H circuit will be set into the “Track” mode as long as the ‘Hold’ signal is low. During this time the input sinusoidal signal can pass through the transmission gate (delayed by a small amount in time). Whenever the ‘hold’ signal transitions from low to high, the T/H circuit goes to the “Hold” mode that will result in the output of the transmission gate retaining the analog value present at the input to of the T/H a short time before the ‘Hold’ signal went high . 22 CHAPTER 3 NON-IDEAL EFFECTS In this chapter, we consider effects which lead to the multiplier displaying non-ideal behavior. First, we look at the effect of channel length modulation upon multiplier performance. We then investigate the effect of process parameter variations on expected performance by performing a sensitivity analysis. We finally consider how mismatch and offset affect the overall performance of the clock generator. Since the results of the offset analysis played a significant role in determining the final transistor sizes, we conclude the chapter with information regarding the size of the devices eventually selected. Channel Length Modulation The I-V characteristic of a FET given in equation (2.3) is ideal and does not account for the effects of channel length modulation. As drain voltage increases (in excess of the saturation voltage), the effective channel length of a FET decreases resulting in an increase in current [All:03]. This effect can be accounted for in equation (2.3) with the addition of the factor (1 + λvDS) where vDS is the drain-to-source voltage and λ is the channel length modulation factor (inversely proportional to the length of the device, L). When the effects of channel length modulation are included, the expression for the multiplier gain, Kmult, in equation (2.9) changes slightly. The expression becomes K mult ⎡ ⎤ 2VR =⎢ ⎥ ⎣ (VB − VTN )[1 + λ ⋅ (VDD − VR )] ⎦ 2 ⎛ Sn ⋅ K pn ⎜⎜ n ⎝ ⎞ ⎟⎟ ⎠ (3.1) 23 In Table 3.1 we compare results for the analog multiplier obtained using the equations derived in this thesis and computed using MathCAD® with those obtained through electrical simulation using Cadence’s Spectre® simulator. One observes that if λ is included, then the analytical predictions agree closely (within 5%) with the results obtained from electrical simulations. IReq is the DC drain-to-source current of PFET M3 in the multiplier’s first-stage combiners, VBO1 is the DC output voltage for the first-stage combiners, and Iout is the peak-topeak differential current transferred to the NMOS latch in the comparator. For these simulations VB was 570 mV, and the peak-peak amplitude of the quadrature inputs was 400 mV. The value for VR was VDD/2 i.e. 0.5 Volts. TABLE 3.1 Comparison of simulation results Mathcad With λ Without λ Electrical Simulation Output of the multiplier’s firststage combiner IReq 680 μA 604 μA 706 μA VBO1 0.5 V 0.5 V 0.49 V Peak-to-peak output Iout 1.49 mA 1.59 mA 1.53 mA Sensitivity Analysis Since the multiplier gain constant, Kmult, is strongly dependent on process parameters (See equation 2.8 and 2.9), the current output of the multiplier varied from the nominal value as simulations were run at different process corners. In Chapter 2 an improved analog multiplier design was introduced which made the design (we claimed) less sensitive to process corner. In this section of the thesis we explore this claim in a quantitative way and 24 compare the performance of the “improved” multiplier with the “initial” design by using the method of sensitivity analysis. In equation (2.8) the threshold voltage, VTN, the transconductance parameter, Kpn, and the resistance, R, are subject to process variations. They change across process corners and result in the multiplier’s output current deviating from the nominal value. Using a sensitivity analysis we can easily obtain the fractional or relative gain variation, ΔKmult / Kmult given by ΔK pn ΔI out ΔK mult VTN ΔV ΔR = = 2⋅ + 3⋅ − 2⋅ ⋅ TN I out K mult R K pn VB − VTN VTN (3.2) where the expression of each Δ term represents the parameter’s variation. For example, ΔR, represents the difference between the value, Rsim simulated in a specific process corner and the typical value, Rtyp. Resistor, R is implemented using N+ diffusion. For the improved analog multiplier, similarly the relative gain variation is shown below but there is no ΔR/R term since the real resistor is replaced by a resistive PFET, and VR is treated as constant due to the existence of the bias circuit. ΔI out ΔK mult ΔK pn 2 ⋅ VTN ΔVTN = = + ⋅ I out K mult K pn VB − VTN VTN (3.3) Table 3.2 indicates the absolute value and relative variation of each process parameter across different corners. The expected values of these variations were obtained through a series of simulations. The measurement of the threshold voltage VTN and transconductance Kpn was found using simulation and a linear regression analysis. 25 TABLE 3.2 Simulation results of parameter values across process corners Parameters and Relative Variation Process Corners R (Ω) ΔR/R Kpn (μA/V2) ΔKpn/Kpn VTN (V) ΔVTN/VTN Typical 741 0 430 0 0.164 0 Best 823 11% 457 6.3% 0.138 -15.9% Worst 659 -11% 421 -2.1% 0.204 24.4% In Table 3.3 we compare results predicted using sensitivity analysis with the simulated results for three corners for the “initial” and “improved” analog multiplier. The “Typical” corner is when nominal FET and resistor parameter values in the models are used. We defined the “Best” corner as the process corner that yielded the largest differential current (and consequently the highest operating speed) when the “initial” multiplier design was used. The “Worst” case corner is when parameters are chosen that yield a significant decrease in differential current (and consequently the lowest operating speed One can see in Table 3.3 that as predicted, the use of the “improved” multiplier design resulted in much less variation in the differential current. The agreement between the results predicted by the analysis and by the simulator is good but not perfect. This is not surprising and merely confirms the fact that parameters other than those explicit in the sensitivity analysis contribute to variations in the differential current. It is interesting to note that the variations in the differential current are actually smaller than what was predicted by the analysis. Clearly, the variations observed when the “improved” design is used are about one order of magnitude smaller than when the “initial” design is employed. 26 TABLE 3.3 Estimated and simulated results in initial and improved multipliers Initial analog multiplier Process Corners ΔI out _ est Improved analog multiplier ΔI out _ sim ΔI out _ est I out _ sim I out _ est Peak-Peat Iout_sim (mA) ΔI out _ sim I out _ est Peak-Peak Iout_sim (μA) Typical 0 391 0 0 720 0 Best 54% 491 26% -6.5% 724 0.5% Worst -48% 167 -57% 17.6% 698 -3.1% I out _ sim Mismatch and Offset Analysis It is important that random offsets due to mismatch in transistor parameters t be considered. Clearly, a differential offset current, resulting from mismatches, delivered to the NMOS latch in the comparator of Fig. 2.7 will result in the clock’s duty cycle differing from the ideal fifty percent. In fact, if the offset current becomes larger than the peak differential output current, the clock becomes stuck at one logic level. If the duty cycle is to be in the acceptable range of 40 – 60 percent, then the differential offset current must be less than 25% of the differential output current’s peak amplitude. In other words, based on the value in Table I, the differential offset current should not exceed 190 μA. In equation (2.3) the transistor parameters VTN, Kpn, Wn, and Ln should be treated as independent Gaussian random variables. It is easy to show that when sensitivity techniques are applied to (2.3), the current predicted by (2.3) will be Gaussian distributed, with mean value, IDS, and the variance given by 27 2 σ I DS 1 = g m12 2 ⋅ σ VTN + I DS 12 ⎛σK 2 σ 2 σ 2 Wn Ln pn ⋅⎜ + + ⎜ K pn 2 Wn 2 Ln 2 ⎝ ⎞ ⎟ ⎟ ⎠ (3.4) where the σ2 are variances associated with the respective variables, gm is the small-signal transconductance of the NFET, and IDS is the mean DC drain-to-source current. The subscript 1 highlights the association of the variables with the first-stage combiner. For the first-stage combiner in Fig. 2.4, the variance of the output voltage may then be written as σ VO1 2 = 2 Req 2σ I DS 1 2 + ( 2 I DS1 ) σ Req 2 2 (3.5) where σReq2 is the variance associated with the resistance, Req, implemented by the resistive PFET, M3. Using the equation (2.10) and a similar sensitivity technique, one may write σ Req 2 ⎛ σ 2 σ K 2 σW 2 σ L 2 V pp p p = R ⋅ ⎜ TP 2 + + + 2 2 ⎜ VSAT K pp Wp Lp 2 ⎝ 2 ⎞ ⎟ ⎟ ⎠ (3.6) 2 where VSAT is the saturation voltage of the PFET M3, σ VTP is the variance of the threshold voltage of M3, and σKpp2 is the variance of the transconductance parameter for the PFET. Applying an equation similar to equation (3.4) to the second stage combiners, the variance of the differential offset current can be written as σ Iout 2 ⎛σK 2 σ 2 σ 2 W L pn = 8 ⋅ g m 2 2 ⋅ (σ VTN 2 + σ VO1 2 ) + 8 ⋅ I DS 2 2 ⋅ ⎜ + n2 + n2 2 ⎜ K pn Wn Ln ⎝ ⎞ ⎟ ⎟ ⎠ (3.7) 28 As before the subscript 2 is used to make explicit the variable’s association with the second-stage combiner. The factor of 8 is present in equation (3.7) because there are a total of 8 FETS in the second-stage combiners that comprise the two multipliers illustrated in Fig. 2.1. The standard deviation in threshold voltage is computed using the standard threshold matching equation below σ VT = 1 2 ⋅ M VTA (3.8) AFET where M VTA is an empirically determined (foundry supplied) matching coefficient, and AFET is the gate area of the device under consideration. Similarly, the standard deviation in transconductance parameter is computed using the equation as follow σ Kp = 1 2 ⋅ M K pA AFET ⋅Kp (3.9) where M K PA is the matching coefficient for the transconductance parameter. In both instances, matching is improved as gate area is increased. Device Sizing Based on considerations associated with the aforementioned effects, appropriate device dimensions were selected for the three transistors comprising the combiner. The use of very short transistor lengths was avoided because (through simulation) it was observed 29 that device threshold voltages increased significantly as the channel length decreased. We attribute this to “reverse short channel effects”. Halo implants (most likely used in the 90 nm process under consideration) increase the average doping concentration in the channel so that the threshold voltage increases significantly if minimal lengths are used. Increasing the length of a FET also reduces λ. Moreover, increased gate area helps reduce the differential offset current. For simplicity, the NMOS transistors in both first- and second-stage combiners were sized identically. The size of the devices used in the combiner (refer to Fig. 2.4) is presented in Table 3.4 where ‘M’ is the multiplicity value (number of FETs in parallel). TABLE 3.4 Combiner Combiner FET sizes Ref W (μm) L (μm) M M1 6 1 2 M2 6 1 2 M3 32 1 2 Recall that device M3 is the PFET operating in the resistive region which simulates the resistance, R, in the original combiner circuit of Fig. 2.3. The control voltage on the gate of M3 was chosen to be approximately 300 mV resulting in a saturation voltage in excess of 500 mV. A large width-to-length ratio was needed to realize a resistance of around 700 Ω. A 30 length of 1 μm was chosen in order to improve matching of the pseudo-resistors in the firststage combiners of the multiplier. In the original combiner design, the dimensions of the n+ diffusion resistor, R, were 10 μm by 78 μm. Clearly, a significant area savings was achieved by adopting the pseudo-resistor in the multiplier design. The results of the offset analysis from the previous section were used to calculate the expected differential offset current from the two multipliers. We possessed the necessary foundry data needed to compute the variances defined in the previous section. The standard deviation of the offset current, σ IOUT , was computed as 15 μA. The 6σ value (90 μA) is well below the upper limit of 190 μA which was needed to ensure a reasonable duty cycle for the output clock. The widths and lengths of the transistor of the bias circuit (Fig. 2.5) and associated OTA (Fig.2.6) in the “improved” analog multiplier are given in Table 3.5. In order to ensure good matching in the layout, the multiplicity (M) for all FETs is 2. This allows the person performing the layout to inter-digitate the devices to achieve better matching. The device sizes of the NAND gate included in SR latch (Fig. 2.8), and T/H (Fig 2.9) circuit are listed in Table 3.6. The ratio of the width of the PFET to the width of the NFET in the NAND gate is chosen to be 1.5 help equalize the rise- and fall-times. In addition, the shape factors of the FETS in the transmission gate in the T/H circuit were chosen to be large (lower ON resistance) so that the overall bandwidth of the clock generator will not be limited. 31 TABLE 3.5 OTA Bias Circuit Bias circuit and OTA FET sizes Ref W (μm) L (μm) M M1-M2 12 4 2 M3, M6 4.5 4 2 M7-M8 1.3 4 2 M9-M10 1.4 4 2 M10, M11 3.3 0.25 2 M17-M18 6 1 2 M19 32 1 2 M20-M21 1 2 2 TABLE 3.6 T/H NAND NAND gate and T/H circuit FET sizes Ref W (μm) L (μm) M M1-M2 6 0.2 2 M3-M4 4 0.2 2 M1, M3 6 0.2 2 M2, M4 2 0.2 2 M5 28.8 0.2 2 M6 9.6 0.2 2 32 The widths and lengths of the transistors in the analog comparator, illustrated in Fig. 2.7, are listed in Table 3.7 TABLE 3.7 Comparator Comparator FET sizes Ref W (μm) L (μm) M M1, M4 1.5 0.25 1 M2, M3 1.5 0.25 2 M5-M8 19.8 0.25 2 M9 15.3 0.25 2 M10, M11 3.3 0.25 2 M12, M13 0.95 0.25 2 M14 4.2 0.25 2 M15 3.00 0.50 1 M16 1.00 0.50 1 33 CHAPTER 4 SIMULATION RESULTS In this chapter, we report the results of simulations performed on the restartable clock generator and its sub-circuits. Electrical simulations were performed using Cadence’s Spectre® simulator. The BSIM4 model was used for the FETs. The power supply voltage was 1 Volt. The frequency of the quadrature inputs was 1 GHz and the DC level of the inputs (VB voltage) was 570 mV. The peak-to-peak amplitude of the input signal was 400 mV. The bias current for the OTA was 10 μA. Analog Multiplier Simulation The differential current output and one of the four quadrature sinusoidal inputs are illustrated in Fig. 4.1. The peak-to-peak amplitude of the current is approximately 700 μA (See Table 3.3). No obvious distortion is present. The output frequency is 2 GHz, twice the input frequency, which can be easily proved by using equation (2.8). Frequency Response of Symmetrical Miller OTA The gain and phase plots for the symmetric Miller OTA are presented in Fig. 4.2. The low frequency open-loop gain is close to 32 dB, the Gain Bandwidth product (GBW) is about 5.7 MHz and the phase margin is 60° approximately (180° - 120°). Since the OTA is used to stabilize the DC level only a small GBW is required. The low gain (32 dB) explains why the simulated value for VBO1 in Table 3.1 deviates slightly from its expected value. In order to increase the open loop gain a cascode structure could be employed in the OTA design, but this is difficult because of the small supply voltage (1 Volt). 34 Iout Asinwt Time (ns) Fig. 4.1 Analog multiplier simulation Fig. 4.2 Frequency response of symmetric Miller OTA 35 Hold Signal Verification The correctness of the hold signal is verified in simulation, illustrated in Fig. 4.3. When either one of ‘Init!’ signals is asserted, the output of the OR gate (‘Hold’ signal) goes high and forces the T/H circuit into “hold” mode. If the ‘Done!’ signal is applied, then both SR latches are reset. The ‘Hold’ signal changes to a low level and forces the T/H circuit into “track” mode. Fig. 4.3 Hold signal verification 36 Track and Hold Circuit Simulation Fig. 4.4 depicts one of the quadrature sinusoidal inputs passing through the T/H circuit. Clearly, the output tracks the input when the ‘Hold’ signal is low. When ‘Hold’ transitions high the analog voltage at the input is held at the output of the T/H. The voltage is changed by a small amount due to “clock feedthrough” and “charge injection” effects [All:03]. The bandwidth of the T/H circuit also has been simulated and is shown in Fig. 4.5. The -3dB bandwidth is 3.2 GHz. This value is large enough so as not to degrade the overall bandwidth of the clock generator. Input Hold Output Time (ns) Fig. 4.4 Transient response of T/H circuit 37 Magnitude (3.2GHz, -3dB) Freq (Hz) Fig. 4.5 Frequency response of T/H circuit Restartable Clock Generator Simulation Correct operation of the clock generator is illustrated in Fig. 4.6. The sine wave input in the figure is one of the four quadrature inputs applied to the multiplier. The square wave output, which serves as the clock to a data processing subsystem in our blended design methodology, is started whenever the ‘Init!’ signal is pulsed low and is stopped when the ‘Done!’ signal is pulsed low. It is restarted with the next ‘Init!’ signal. The duty cycle of the clock is near 50% which is expected since no random offsets which would alter the duty cycle were modeled in the simulation. As described in Chapter 3 since the multiplier is fully differential, its linearity is inherently quite good. In Fig. 4.6, Iout is the differential current delivered to the NMOS latch 38 in the comparator. Note the absence of any significant distortion in this differential current waveform. Moreover, the waveforms in the figure change very little as the process corner is changed because of the technique which we used to stabilize the gain of the multipliers. Asinwt Init ! Done ! Differential Current Clock Time (ns) Fig 4.6 Clock generator simulation In Fig. 4.7 the delay in starting the clock, measured from the falling edge of the ‘Init!’ signal to the rising edge of the output clock, is plotted. The graph is generated by uniformly distributing the arrival of the ‘Init!’ signal over one period of the reference input sinusoid. One can see from the figure that the average delay in restarting the clock is approximately 1.5 ns. This value includes the delay inserted by the SR latch, the OR gate, track and hold circuits, analog multipliers, and the comparator. The peak-to-peak variation in the time required to restart the clock is 120 psec. 39 Fig. 4.7 Delay associated with restarting the clock 40 CHAPTER 5 CONCLUSIONS AND FUTURE WORK Conclusions Synchronizers are the accepted method for communication between clockless and clocked domains, but they present metastability hazards [Cha:73]. These hazards can be effectively mitigated, but not eliminated by proper synchronizer design [Stu:79]. However, each such design is unique to a specific IC process. A new design presented in this thesis, the “restartable” clock, is based on a fixed-period source eliminating the variation in periodicity, characteristic of pausible clocks presented in the literature. However, like a delay-based clock the restartable clock can be stopped and then restarted at an arbitrary phase of the source. This restartable clock completely eliminates metastability hazards and can be scaled with feature size across IC processes without introducing metastability. The proposed restartable clock can be based on an external crystal oscillator or on a local all-silicon, MEMS-based oscillator. The later approach has the advantage of eliminating global clock trees and their dissipation, suppressing clock noise, and reducing power consumption during subsystem idle time. Furthermore, the thesis demonstrates that it is possible to design a plausible clock based on an external crystal oscillator or MEMS-based oscillator that is fully compatible with modern deep sub-micron technologies. The circuit described here, implemented in a state-ofthe-art 90 nm process, only makes use of regular threshold voltage FETS, operates from a single 1 Volt power supply, and is insensitive to process variations, In addition the clock generator circuit exhibits low power dissipation while able to operate at frequencies as high 41 as 1 GHz and with a latency of 1.5 clock periods. Higher frequency operation is possible but at the expense of both power dissipation and area. Future Work Monte Carlo simulations to confirm the results presented in Chapter 4 predicting the likely offset current will be performed in the future. Also, the clock generator requires quadrature signals, the generation of which was not addressed in this thesis. Future work will tackle the task of efficiently generating these signals from an external crystal oscillator or MEMS-based clock. 42 REFERENCES [All:03] Phillip E. Allen, Douglas R. Holberg, CMOS Analog Circuit Design (2nd Ed.), Oxford University Press, pp. 477-487, 2003. [Ami:07] Esmail Amini, Mehrdad Najibi and Hossein Pedram, “FPGA Implementation of Gated Clock Based Globally Asynchronous Locally Synchronous Wrapper Circuits” 2007 [Bai:03] W. J. Bainbridge, W. B. Toms, D. A. Edwards, S. B. Furber, “DelayInsensitive, Point-to-Point Interconnect using m-of-n Codes,” Proceedings of the Ninth International Symposium on Asynchronous Circuits and Systems (ASYNC.03), pp 132-140, May 2003. [Bei:08] E.Beigné, F. Clemidy, S. Miermont, and P. Vivet, Dynamic Voltage and Frequency Scaling Architecture for Unit Integration within a GALS NoC, In proc Second ACM/IEEE International Symposium on Networks-on-Chip, page 129-138, Apr.2008 [Cha:73] T. J. Chaney, C. E. Molnar, “Anomalous behavior of synchronizer and arbiter circuits,” IEEE Trans. on Computers, C-22, pp.421-422, 1973. [Cha:84] D.M. Chapiro, “Globally-Asynchronous Locally-Synchronous Systems”, Ph.D. thesis, Stanford University, Oct. 1984. [Cla:67] W. A. Clark “Macromodular computer systems,” Spring Joint Computer Conf., AFIPS Proceedings, 30, Thompson Books, Washington D.C., pp. 335336, 1967. [Cla:74] W. A. Clark, C. E. Molnar, “Macromodular computer systems,” Computers in Biomedical Research, 4, R. Stacy and B. Waxman, Eds. Academic Press, New York, pp 45- 85, 1974. [Cox:88] J.R. Cox, “Can a crystal clock be started and stopped?” Applied Mathematics Letters, Vol. 1, Pergamon, pp 37-40, 1988. [Cox:05] J. R. Cox and D. M. Zar, “Blending clockless and clocked subsystems using macromodular control elements,” WU CSE Tech. Report, 2005-43. [Cox:07] J.R. Cox, G.L. Engel, D.M. Zar “Design of Instantaneously Restartable Clocks And Their Use Such As In Connecting Clocked Subsystems Using Clockless Sequencing Networks”, United States Patent #7,243,255; July 10, 2007. 43 [Deb:02] Carl James Debono, Franco Maloberti, and Joseph Micallef, “On the Design of Low-Voltage, Low-Power CMOS Analog Multipliers for RF Applications”, IEEE Trans. Very Large Scale Integr., Vol. 10, No. 2, pp 168-174, April 2002. [Dob:99] Rostislaav Dobkin, Ran Ginosar, and Christos P. Sotiriou, “High Rate Data Synchronization in GALS SoCs”, IEEE Trans. Very Large Scale Integr., Vol. 14, No. 101, pp. 1063-1074, December 1999. [Gür:06] Frank K. Gürkaynak, Stephan Oetiker, Hunbert Kaeslin, Norbert Felber, and Wolfgang Fichtner ,“GALS at ETH Zurich: Success or Failure?” In Proc. International Symposium on Advanced Research in Asynchronous Circuit and Systems, page150-159 Mar.2006 [Hsi:98] S.-Y. Hsiao and C.-Y. Wu, “A parallel structure for CMOS four-quadrant analog multipliers and its application to a 2-GHz RF downconversion mixer,” IEEE J. Solid-State Circuits, vol. 33, pp. 859-869, June 1998. [Lak:94] Kenneth R. Laker and Willy M. C. Sansen, Design of Analog Integrated Circuits and Systems, McGraw-Hill Press, pp219-221, 1994. [Mob] Mobius Microsystems: web site: http://www.mobiusmicro.com [Sei:80] C. L. Seitz, “System timing,” In: C. A. Mead, L. A. Conway, Eds, Introduction to VLSI systems, Chapter 7,. Addison-Wesley, 1980. [Stu:79] M. J. Stucki and J. R. Cox, “Synchronization strategies,” Proceedings of Caltech Conference on VLSI, pp. 375-393, Jan. 1979. [Tal:05] S.K. Tallapragada, “Design of a Restartable Crystal Controlled Clock for Use in a Globally Asynchronous, Locally Synchronous Design Methodology” Master Project Report, Southern Illinois University Edwardsville, December 2005. [Tra:91] Cherrice A. Traver, “A Testable Model for Stoppable Clock ASICS”, ASIC Conference and Exhibit, 1991. Proceedings., Fourth Annual IEEE International, pages 6-3/1-5, Sept. 1991 [Ung:69] Stephen Unger, Asynchronous sequential switching circuits, WileyInterscience, pp.177-183, 1969. [Yun:99] Kenneth Y. Yun and Ayoob E. Dooply, “Pausible Clocking-Based Heterogeneous Systems”, IEEE Trans. Very Large Scale Integr., Vol. 7, No. 4, pp 482-488, December 1999.