High-Speed Digital Architectures
Chris Allen
Course website: people.eecs.ku.edu/~callen/713/EECS713.htm

Overview
Topics include
• Pipelining
• Latency
• Demultiplexing
• Multiplexing
• Clock fanout and distribution
• Clock skew and fine timing adjustments
• Clock signal sources

Logic devices and high-speed designs: pipelining & latency
Consider the multi-bit adder, A + B.
We want to add two 18-bit unsigned binary numbers:
  A17 A16 … A1 A0 + B17 B16 … B1 B0
A0 and B0 are the least significant bits (LSBs); A17 and B17 are the most significant bits (MSBs).
How fast can we add two 18-bit numbers?
A 6-bit ECL adder will be the building block for this design.
  Inputs: A5:A0, B5:B0, Cn (carry input)
  Outputs: F5:F0, G (carry output)
  Ranges: A: 0 to 63, B: 0 to 63, Cn: 0 or 1, F: 0 to 63, G: 0 or 1
  Maximum case: Cn + A + B = 1 + 63 + 63 = 127 → F = 63, G = 1

Pipelining & latency: 18-bit adder
Consider the typical propagation delays:
  An, Bn, Cn → Fn: 3 ns
  An, Bn, Cn → G: 2.5 ns
Note: the adder is combinational logic, not sequential; there is no clock signal.
How to find T: identify the critical path (longest delay).

  Bits    A, B → F   Cn → F   A, B → G
  5:0     3 ns       -        2.5 ns
  11:6    3 ns       5.5 ns   5 ns
  17:12   3 ns       8 ns     7.5 ns

For this configuration, the output is stable after 8 ns → 125 MHz is the maximum rate for this 18-bit adder.
A greater number of bits (e.g., a 36-bit adder) would further increase the delay, reducing the add rate.

Pipelining & latency: multiplexed 18-bit adder
Use twice as much hardware to produce results twice as fast.
A clock signal has been added to synchronize the registers.
In this configuration, the adding rate is 250 MHz and the latency is 8 ns.
The propagation delay for each channel is still 8 ns.
  2:1 multiplexing cuts the time per result in half → 4 ns (250 MHz)
  4:1 multiplexing cuts the time per result by four → 2 ns
But the latency is still 8 ns.

Pipelining & latency: pipelined 18-bit adder
While a single 6-bit add takes only 3 ns (333 MHz), the propagation of the carry bit slows the
18-bit addition to 8 ns (125 MHz).
A pipelined architecture allows the adder to operate with a 3-ns add time plus 0.5 ns for setup and propagation through the register → 3.5-ns cycle time (286 MHz).
This scheme is expandable to N-bit adds at the same rate.
In this configuration, the adding rate is 286 MHz and the latency is 6 clock cycles, or 21 ns.

Pipelining & latency
The price to be paid for achieving this speed is
• Circuit complexity
• Data latency
Some applications can tolerate large latency. Examples include one-way data transfers such as TV broadcast signals.
Other applications cannot tolerate much latency. Examples include two-way data exchange such as voice communications (calls via satellite have a latency of ~0.5 s).
Techniques to further speed up the add process
If a 6-bit add takes 3 ns,
  a 2-bit add should take ~1 ns
  a 1-bit add should take ~0.8 ns
Theoretically, adds could be done as fast as 1.5 ns (667 MHz) with 18 add stages and 36 clock cycles of latency.
Note also that this approach requires a large number of clock signals (not shown).

High-speed digital design examples
Consider a data acquisition (DAQ) system
Analog signals are digitized and recorded.
• Example applications include oscilloscopes and radar receivers
The maximum bandwidth of the acquired signal is limited by the ADC clock frequency.
The precision of the digitized signal is limited by the number of bits in the ADC.
The length of the data record is limited by the memory size (2^N words).
WE: write enable

Consider an arbitrary waveform generator (AWG) system
Analog waveforms are produced from stored digital records.
• Example application: radar waveform generation
The maximum bandwidth of the output signal is limited by the DAC clock frequency.
The precision of the output signal is limited by the number of bits in the DAC.
The length of the waveform is limited by the memory size (2^N words).
In both cases, the maximum data vector size is X × 2^N, i.e., X-bit wide
word, 2^N-word vector length.

Data acquisition system
Consider the case where X = 8 bits and N = 16 → 64k-word vector.
At a 1-GHz clock rate, the maximum record length is 65.5 µs.

DAQ high-level timing
Within the 1-ns clock period
• The acquired data must stabilize
• The memory must be addressed
• The Write Enable line must toggle
All in compliance with the memory's timing requirements
• Setup and hold times for Data and Address relative to Write Enable
Key to the DAQ operation are the address generator and the memory.
This design requires a 16-bit synchronous counter with preset inputs (not a ripple counter).
The Addr_CLK must be the system clock (1 GHz).
The memory write cycle time must be < 1 ns.
It is difficult to achieve the required timing with available technology.

The timing requirements can be relaxed with demultiplexing.
The SPT7760 ADC has an integrated 1:2 demux that reduces the effective output data rate (per channel) to 500 MSa/s.

The ADC's 1:2 demux doubles the time available for each memory write to 2 ns.
Consider a design using the following devices:
• 8-bit, 1-GSa/s ADC with 1:2 demux
• 700-MHz, 8-bit synchronous counter
• 1k x 4 RAM with 5-ns write cycle time
Since the RAM's 5-ns write cycle time exceeds the ADC's 2-ns demultiplexed update time, further demultiplexing is required.
A 4:1 demux will stretch the data update period to 8 ns, a rate the RAM can accommodate.
One cost of this approach is the added complexity, both in terms of added hardware and in terms of signal formatting for output.
[Figure: 4:1 DEMUX]

[Figure: High-level timing for system with 2:1 ADC demux and 4:1 demux on PCB]

Just as demultiplexing relaxed the DAQ timing requirements, multiplexing eases the arbitrary waveform generator's timing challenges.
A 4:1 mux will reduce the data rate from each memory device by a factor of 4.
[Figure: 4:1 MUX]

Integrating a multiplexer in the
digital-to-analog converter allows the converter to operate at higher rates.
[Figure: DAC with integrated 1:2 mux]

General design rules for these high-speed applications
• Keep uniform line lengths within a data bus to ensure constant signal latency
• Keep analog signal lines away from digital lines; digital lines contain significant broadband 'noise' that can degrade the analog signal through crosstalk
• Clock signal distribution design is critical to achieve maximum operating speed
• Jitter in the clock signal (due to the clock generator circuit) will result in phase noise in the data

Clock signal issues
Clock signals provide a time reference for the entire system.
Issues to consider regarding clock signals
• Clock fanout and distribution
• Clock skew and fine timing adjustments
• Clock division: fCLK/2, fCLK/4, …
• Clock signal generation

Clock fanout
Consider the case where multiple registers must be clocked simultaneously.
However, the fanout limit of the technology is ~5 (3 to 10).
Clock fanout buffers are intended to provide multiple copies of the clock signal with equal latencies.

Clock skew
Clock skew describes timing signals arriving at different components at different times.
Possible causes include
• Clock buffer skew
• Mismatched trace propagation delay
• Capacitive loading or coupling
Clock buffer skew: gate-to-gate skew of 20 ps (typ), 50 ps (max)

Even with low-skew clock buffers, some clock skew will remain.
Timing variations can compound as devices are cascaded, leading to increased uncertainty.
Impact? System timing variations → reduced timing margin
How to compensate for clock skew?
For critical timing applications, we can employ delay adjustments:
• Delay line (passive): delay depends on length
• Gate delay (active): delay depends on gate characteristics
Example
Consider two clock (or data) lines we wish to synchronize using delay line variations.
By changing jumper connections we can make tB < tA, tB = tA, or tB > tA.
Similar schemes exist for varying signal delay.

Problem: jumpers may cause impedance mismatch → reflections
  Using surface-mount strips to close the gaps helps reduce mismatches.
Problem: occupies significant board area
  Hard to implement at chip level or in an MCM.
Implement jumper selection electronically
Consider implementing the variable delay with a simple gate (OR, XOR, AND, …; 300 to 1500 ps) and a multiplexer.
The delay is controlled electronically by bits S0 and S1.

  S1  S0  F  Delay
  0   0   A  0
  0   1   B  Tp
  1   0   C  2Tp
  1   1   D  3Tp

Single-chip programmable delay lines are available.

Clock division
Subharmonics of the clock signal (fCLK/2, fCLK/4, …) can be produced using simple flip-flops configured as clock frequency dividers.
The output signals have a 50% duty cycle regardless of the input signal's duty cycle.
Shift registers can be used to divide the signal frequency by other integer values (known as ring counters or Johnson counters).
Various duty cycles can be produced from these configurations.
[Figures: Ring counter; Johnson counter]

Clock sources
Clock signals are used to provide a timing reference.
Typically only one clock oscillator is used per system.
In computers, higher frequency signals may be derived from a single oscillator through frequency multiplication (e.g., PLL).
In radar systems, the radar frequency, the A/D sample clock, and other timing and frequency signals are derived from a master clock oscillator (an exception would be the clock that drives the DSP, which operates asynchronously from the rest of the system).
When specifying the clock oscillator for digital applications, consider several parameters
• Output voltage
level (TTL or ECL, not sinusoidal with zero mean)
• Frequency (MHz, GHz): nominal operating frequency at nominal temperature and voltage
• Stability (ppm): long-term frequency drift driven by temperature, aging, voltage
• Rise/fall time (ps)
• Waveform symmetry (%): may want to use CLK and its complement for split-phase timing
• Environmental factors: temperature range, shock/vibration
• Package: DIP vs. SMT; metal vs. plastic or ceramic

Stability factors
Temperature – quartz crystals used as resonant elements
• Piezoelectric effect
Resonance frequency is determined by physical dimensions; temperature induces expansion/contraction → frequency changes.
Several varieties
  Non-compensated – large Δf/ΔT
  Temperature compensated – less Δf/ΔT
  Oven-controlled – T is constant
Short-term frequency variations
Characterized in terms of phase noise or timing jitter.
Phase noise refers to a random, uncorrelated clock-period variation.
• Introduces timing variations that reduce timing margin
• Frequency multiplication amplifies the phase noise

Various methods are available to characterize clock jitter (phase noise).
Spectral analysis
An ideal clock signal has spectral energy at the fundamental and harmonic frequencies only.
Jitter (phase noise) causes a broadening of the spectral lines.
Specified as the power level below the fundamental at an offset Δf
• For example, -50 dBc @ 100 kHz

Converting jitter from measured phase noise
W. Kester, "Converting Oscillator Phase Noise to Time Jitter," MT-008 Tutorial, Rev. A, Oct. 2008, Analog Devices, Inc.
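As a rough illustration of this kind of conversion, the sketch below integrates a single-sideband phase-noise profile over offset frequency and divides the resulting RMS phase by the angular clock frequency. The function name, the flat -120 dBc/Hz profile, and the 100-MHz clock are hypothetical values chosen for illustration; the simple trapezoidal integration on a linear frequency axis is an assumption and is coarser than the piecewise approach in the tutorial.

```python
import math

def rms_jitter_from_phase_noise(offsets_hz, noise_dbc_hz, f0_hz):
    """Estimate RMS jitter (seconds) from SSB phase noise samples.

    offsets_hz   -- offset frequencies from the carrier, ascending (Hz)
    noise_dbc_hz -- phase noise at each offset (dBc/Hz)
    f0_hz        -- clock (carrier) frequency (Hz)
    """
    # Convert dBc/Hz to a linear power ratio per Hz.
    linear = [10 ** (level / 10.0) for level in noise_dbc_hz]

    # Trapezoidal integration of the noise power over the offset band.
    area = 0.0
    for i in range(len(offsets_hz) - 1):
        df = offsets_hz[i + 1] - offsets_hz[i]
        area += 0.5 * (linear[i] + linear[i + 1]) * df

    # The factor of 2 accounts for both sidebands.
    phase_rms_rad = math.sqrt(2.0 * area)

    # Time jitter = RMS phase / angular frequency.
    return phase_rms_rad / (2.0 * math.pi * f0_hz)

# Hypothetical profile: flat -120 dBc/Hz from 1 kHz to 1 MHz, 100-MHz clock.
offsets = [1e3, 1e4, 1e5, 1e6]
noise = [-120.0, -120.0, -120.0, -120.0]
jitter = rms_jitter_from_phase_noise(offsets, noise, 100e6)
print(f"RMS jitter: {jitter * 1e12:.2f} ps")
```

For a real oscillator the profile falls with offset, so the integration limits and the number of sample points strongly affect the result.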
$\mathrm{jitter}_{RMS} = \phi_{RMS} / \omega_o$

Delay line method of characterizing clock jitter
Beat a sample of the clock signal with a delayed version of itself:
  $v_1 = \cos(\omega t + \phi_1)$
  $v_2 = \cos(\omega t - \omega\tau + \phi_2)$
The mixer produces sum and difference terms; the LPF rejects the sum term, leaving $v_0$:
  $v_0 = \cos(\omega\tau + \phi_1 - \phi_2) = \cos(\omega\tau + \Delta\phi)$
For a fixed delay value $\tau$ and a stable $\omega$, $v_0$ varies as $\Delta\phi$ changes.
To relate $\Delta\phi$ to time jitter: $\Delta t = \Delta\phi / \omega$
Example: for fo = 300 MHz and $\Delta\phi$ = 2° (35 mrad), jitter = 18.5 ps; this value will vary with delay line length.

[Figure: Example data sheet]

For testing purposes, it is useful to vary the clock frequency, e.g., to find the maximum operating clock frequency.
In laboratory testing we can use a variable clock generator (if you have one); older versions have a maximum clock output frequency of 250 MHz.
However, we can use a standard (sinusoidal) laboratory oscillator if we set the amplitude to ΔV (the logic swing) and apply a DC bias equal to the threshold voltage.
Example, with ECL and GaAs devices
  C1: AC couples the oscillator to the circuit
  RT: provides impedance matching and level shifting to VBB
  L: DC couples VBB to CLK while blocking AC
  C2: AC path for return current
  RT = Zo (50 Ω)
  C1, C2: (2π f C)^-1 << RT (< 1 Ω)
  L: 2π f L >> RT (> 1 kΩ)
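The component guidelines for this bias network (capacitor reactance well below RT, inductor reactance well above RT) can be checked numerically. The sketch below assumes ideal components and uses hypothetical values (a 100-MHz test frequency, 10-nF capacitors, a 10-µH inductor) that do not come from the slides; real parts have parasitics and self-resonance that this ignores.

```python
import math

def capacitor_reactance(f_hz, c_farads):
    """Magnitude of an ideal capacitor's reactance in ohms: 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farads)

def inductor_reactance(f_hz, l_henries):
    """Magnitude of an ideal inductor's reactance in ohms: 2*pi*f*L."""
    return 2.0 * math.pi * f_hz * l_henries

# Hypothetical test conditions and component values (assumptions).
f_clk = 100e6      # lowest clock frequency to be applied, Hz
c_value = 10e-9    # C1 and C2, farads
l_value = 10e-6    # L, henries

xc = capacitor_reactance(f_clk, c_value)
xl = inductor_reactance(f_clk, l_value)

# Guidelines from the slide: capacitors below 1 ohm, inductor above 1 kilohm.
print(f"Capacitor reactance: {xc:.3f} ohm, meets < 1 ohm: {xc < 1.0}")
print(f"Inductor reactance: {xl:.0f} ohm, meets > 1 kohm: {xl > 1000.0}")
```

Because capacitor reactance rises and inductor reactance falls as frequency drops, the check should be run at the lowest clock frequency planned for the test.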
