Advanced Analysis, Design, and Measurement Techniques for Multi-Gb/s Data Links
Frank O’Mahony, Bryan Casper
Circuit Research Lab, Intel, Hillsboro, OR

Outline
• Overview of I/O trends
• System-level link modeling
  – Worst-case data eye
  – Statistical data eye
  – Design example: 20Gb/s link
• On-die measurement techniques

Chip-to-Chip Signaling Trends
Decade | Speeds | Transceiver features
1980’s | >10Mb/s | Inverter out, inverter in
1990’s | >100Mb/s | Termination; source-synchronous clock
2000’s | >1Gb/s | Point-to-point serial streams; pre-emphasis equalization
Future | >10Gb/s | Adaptive equalization; advanced low-power clocking; alternate channel materials
Channel labels on the slide: lumped capacitance; transmission line; lossy transmission line; noise.
Figure: link block diagram: transmit filter, channel h(t), linear equalizer, sampler/slicer, and CDR.

CMOS Transceiver Data Rates
Figure: link data rate [Gb/s] (0.01 to 100, log scale) vs. technology node [um] (1 to 0.1), with a region marked "Power/channel limitations". Courtesy of Prof. Ken Yang, UCLA.

Power Density Increases Exponentially!
Figure: power density [W/cm²] (1 to 1000, log scale) vs. process technology node [μm] (1.5 to 0.07) for the i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, and Pentium® 4, with a max power density envelope and reference levels for a hot plate, a nuclear reactor, and a rocket nozzle.

Teraflops Research Chip
100 million transistors ● 80 tiles ● 275mm²
• First tera-scale programmable silicon:
  – Teraflops performance
  – Tile design approach
  – On-die mesh network
  – Power-aware capability
• Tera-scale many-core μP’s will drive aggregate I/O rates aggressively

Power efficiency and process technology
• Process scaling enables lower-power data links
• Channel characteristics can limit achievable power efficiency
Figure: energy per bit [pJ] of 2-PAM links vs. process technology [um] (log-log), with trend line $y = 399.17\,x^{1.157}$. Courtesy of Prof. Ken Yang, UCLA.

I/O Data Rate and Power Efficiency
Figure: power efficiency [mW/Gb/s, equivalently pJ/bit] vs. data rate [Gb/s] for published links: J. Wong VLSI’03, R.
Palmer ISSCC’07, Prete ISSCC’07, and BNV ISSCC’06 (point labels: 7.5, 2.2, 11.7, and 9.6 mW/Gb/s).

Designing power-efficient multi-Gb/s links
• Accurate system-level link modeling
  – Careful statistical accounting of all noise sources: ISI, crosstalk, voltage noise, and timing noise
• Power-efficient I/O system implementation
  – Design within the bandwidth of the process technology
  – Better channel characteristics enable lower power
  – Immunity to variation and to deterministic and random noise comes at a power cost
• On-die calibration and measurement
  – Calibration can significantly reduce power
  – Measurement is necessary to close the modeling loop

System-level link modeling
1. Empirical calculation
   – Use random data
2. Peak distortion analysis
   – Analytical calculation of the worst-case eye
3. Statistical ISI analysis
   – Analytical calculation of the BER eye

Traditional method of signaling analysis and validation
• Most chip-to-chip signaling links in the past used simple binary NRZ modulation
• These links had a low symbol rate and little channel memory
• Transient simulation using a few random data vectors was sufficient to characterize the eye accurately

Motivation for behavioral link analysis
• A simulated eye can be optimistic
  – It won’t capture worst-case ISI, especially for channels with long memory
• Behavioral analysis characterizes the impact of deterministic and random noise sources
  – For low bit error rates (BER), very unlikely noise conditions must be considered
• Nearly exact statistical analysis reduces the need for excess design margin
• Link architectures can be evaluated quickly without designing complete circuits
  – e.g., various equalizers can be traded off easily

Properties of a Linear Time-Invariant System
Figure: the frequency response (e.g.
S-parameters, S21) and the impulse response are related by the FFT; the two LTI properties used here are convolution and superposition.

LTI property: Convolution
Figure: convolving the (mirrored) Tx symbol with the impulse response gives the pulse response.

LTI property: Superposition
Figure: in: Tx symbol …000010000000…; out: pulse response.

LTI property: Superposition of symbols
Figure: in: Tx symbol …000010011100…; out: the response to pattern 100111, the sum of shifted pulse responses.

LTI property: Superposition of coupled symbols
Figure sequence: a Tx symbol …000010000000… on the coupled line produces the FEXT pulse response, and the pattern …000011111100… produces the FEXT response; adding it to the insertion-loss response of the through channel to the pattern …000010011100… gives the composite response.

Worst-case eye calculation
• Eye diagrams are generally calculated empirically
  – Convolve random data with the pulse response of the channel
  – The pulse response is derived by convolving the impulse response with the transmitted symbol
• For eye diagrams to represent the worst case, a large set of random data must be used
  – Low probability of hitting worst-case data transitions
  – Computationally inefficient
• An analytical method of producing the worst-case eye diagram exists
  – Computationally efficient algorithm

Differential S-parameters (plot)

Eye diagrams at 5Gb/s
Figure: random-data eyes from 100 bits and from 1000 bits at 5Gb/s, overlaid.

Sample pulse response
Figure: sampled pulse response showing the precursor, cursor, and postcursor samples, with positive (ISI+) and negative (ISI-) ISI marked.

Step response
Figure: response to the step pattern 000011111111111, with the cursor and ISI voltages marked.

Worst-case 0
Figure: the worst-case 0 data pattern 0110100100000 and the resulting worst-case 0 level V_WC0 set by ISI.

Worst-case 1
Figure: the worst-case 1 data pattern 0101100000 and the resulting worst-case 1 level V_WC1 (cursor minus ISI).

How to find worst-case patterns
Example pulse response samples: precursor 2, cursor 16, postcursors -3, 4, -1, 2, 2.
• Worst-case 0 (in transmit order): 1 1 0 1 0 0 1; every bit whose ISI contribution at the sampling instant is positive is a 1
• Worst-case 1: 0 0 1 0 1 1 0; every bit whose ISI contribution is negative is a 1, and the cursor bit is 1

Worst-case Received Voltage Difference (RVD)
$$\mathrm{RVD}_{WC} = \text{cursor} - \sum |\mathrm{ISI}| = 16 - (2 + 3 + 4 + 1 + 2 + 2) = 2$$
The worst-case 1 level is 16 - 3 - 1 = 12 and the worst-case 0 level is 2 + 4 + 2 + 2 = 10.

5Gb/s Pulse Response
Figure: pulse response at 5Gb/s.
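The peak distortion steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: it assumes a UI-sampled pulse response and unipolar 0/1 symbols (as in the worked example), and every function name is hypothetical. It builds the worst-case patterns tap by tap, checks them by superposition (convolving the pattern with the pulse response), and compares the result with the closed-form worst-case RVD. The last lines reuse the same RVD to compare a hypothetical lossy pulse before and after a 3-tap Tx FIR, the kind of quick equalizer trade-off this analysis allows.

```python
"""Peak distortion analysis sketch for a UI-sampled pulse response (illustrative)."""


def convolve(a, b):
    """Discrete convolution: superposition of shifted, scaled copies of b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def worst_case_patterns(pulse, c):
    """Return (wc1, wc0) bit patterns in transmit order, len(pulse) bits long.

    pulse[i] is the response i UIs after a bit is launched; pulse[c] is the cursor.
    Symbols are unipolar 0/1, so a bit contributes its tap only when it is a 1.
    """
    wc1, wc0 = [], []
    for d in range(c - (len(pulse) - 1), c + 1):  # bit offset from the cursor bit
        tap = pulse[c - d]  # this bit's contribution at the cursor sample
        if d == 0:
            wc1.append(1)
            wc0.append(0)
        else:
            wc1.append(1 if tap < 0 else 0)  # keep only ISI that lowers a 1
            wc0.append(1 if tap > 0 else 0)  # keep only ISI that raises a 0
    return wc1, wc0


def cursor_sample(pulse, pattern):
    """Received cursor-bit sample for a pattern from worst_case_patterns (superposition)."""
    # The cursor bit sits at index len(pulse) - 1 - c; its cursor sample lands
    # c samples later, i.e. at index len(pulse) - 1 of the convolution.
    return convolve(pattern, pulse)[len(pulse) - 1]


def worst_case_rvd(pulse, c):
    """Closed form: cursor minus the sum of |ISI| over every other sample."""
    return pulse[c] - sum(abs(v) for i, v in enumerate(pulse) if i != c)


if __name__ == "__main__":
    pulse = [2, 16, -3, 4, -1, 2, 2]  # precursor, cursor, five postcursors
    wc1, wc0 = worst_case_patterns(pulse, 1)
    print(wc1, wc0)  # [0, 0, 1, 0, 1, 1, 0] [1, 1, 0, 1, 0, 0, 1]
    v1, v0 = cursor_sample(pulse, wc1), cursor_sample(pulse, wc0)
    print(v1, v0, v1 - v0, worst_case_rvd(pulse, 1))  # 12.0 10.0 2.0 2

    # Hypothetical lossy pulse and 3-tap Tx FIR (C-1, C0, C1); values are illustrative.
    lossy = [0.0, 10.0, 6.0, 3.0, 1.0]
    eq = convolve([0.0, 1.0, -0.4], lossy)  # main tap at index 1 moves the cursor to 2
    print(worst_case_rvd(lossy, 1), round(worst_case_rvd(eq, 2), 3))  # 0.0 6.8
```

Running it reproduces the worst-case patterns and the 12/10 levels of the example. The FIR comparison ignores the transmitter's peak-swing limit; a real pre-emphasis driver typically scales its taps so the output swing stays fixed, which lowers the equalized cursor, so the numbers only illustrate the mechanics.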
5Gb/s Response due to worst-case data pattern
Figure: received waveforms for the worst-case 1 and worst-case 0 patterns.

Worst-case data response
Figure: the worst-case 1 response compared with the response to a lone 1.

5Gb/s WC eye shape
Figure: the worst-case eye traced from the precursor, cursor, and postcursor samples; its opening at each sample time is $\mathrm{RVD}_{WC} = \text{cursor} - \sum |\mathrm{ISI}|$.

WC eye vs random data eye
Figure: the worst-case eye shape overlaid on random-data eyes from 100 and 1000 symbols.

What is a BER distribution eye?
Figure sequence: the BER distribution eye plots sample voltage (V) vs. sample time (s), with the legend giving BER as 10^X; successive builds mark a sample reference on the BER = 10^-10, 10^-5, and 10^-1 contours.

BER distribution vs worst-case eye
Figure: BER distribution eye (legend: BER as 10^X) with the worst-case eye edges overlaid.

BER distribution eye calculation
• The calculation method is based on the pulse response shape
• Assumption: equal probability of a 1 or a 0
• Determine the probability density function (PDF) of the ISI
  – In contrast to determining the peak value of the ISI
• More computationally intensive than peak distortion analysis

BER eye calculation example (no ISI)
Pulse response samples: cursor 9, all other samples 0.

PDF of the cursor (when sending a 1)
• An impulse of probability 1 at 9

PDF of a 1
• Convolve the PDF of the cursor for a 1 (impulse at 9) with the PDF of the ISI (impulse at 0): the PDF of a 1 is an impulse at 9

Cumulative Distribution Function (CDF) of a 1
$$\mathrm{BER}_1(X) = \int_{-\infty}^{X} \mathrm{PDF}_1(x)\,dx$$

BER distribution eye (when sampling a 1)
• The CDF of a 1 gives BER = 0 for sample references below 9 and BER = 1 above 9 (legend: BER from 0 to 1)

PDF of a 0
• Convolve the PDF of the cursor for a 0 (impulse at 0) with the PDF of the ISI: the PDF of a 0 is an impulse at 0

Cumulative Distribution Function (CDF) of a 0
$$\mathrm{BER}_0(X) = \int_{X}^{\infty} \mathrm{PDF}_0(x)\,dx$$
BER distribution eye (when sampling a 0)
• The CDF of a 0 gives BER = 1 for sample references below 0 and BER = 0 above 0 (legend: BER from 0 to 1)

BER distribution eye
• Weight the CDF of a 1 and the CDF of a 0 by p = 0.5 each and sum them to get the BER eye for a 1 or 0:
$$\mathrm{BER}(X) = 0.5\,\mathrm{BER}_1(X) + 0.5\,\mathrm{BER}_0(X)$$
• BER = 0.5 for references above 9 or below 0, and BER = 0 for references between 0 and 9 (legend: BER from 0 to 0.5)

BER eye calculation example (w/ ISI)
Pulse response samples: precursor 2, cursor 16, postcursors -3, 4, -1, 2, 2 (worst-case RVD = 16 - 14 = 2).

1st precursor ISI PDF
• 50% chance of a 1 and 50% chance of a 0: probability 0.5 at 0 and 0.5 at 2

1st postcursor ISI PDF
• Probability 0.5 at -3 and 0.5 at 0

2nd postcursor ISI PDF
• Probability 0.5 at 0 and 0.5 at 4
• And so on for the remaining postcursors

PDF of all ISI
• Convolve the individual PDFs:
  – 1st precursor with 1st postcursor: -3, -1, 0, 2, each with p = 0.25
  – That result with the 2nd postcursor: -3, -1, 0, 1, 2, 3, 4, 6, each with p = 0.125
  – And so on; with all six ISI terms the PDF spans -4 to 10 in steps of 1, with probabilities in multiples of 1/64

PDF of the cursor (when sending a 1)
• An impulse of probability 1 at 16

PDF of a 1
• Convolve the PDF of the cursor for a 1 with the PDF of all ISI (-4 to 10): the PDF of a 1 spans 12 to 26

Cumulative Distribution Function (CDF) of a 1
$$\mathrm{BER}_1(X) = \int_{-\infty}^{X} \mathrm{PDF}_1(x)\,dx$$
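A minimal sketch of this statistical (BER eye) calculation, assuming equiprobable, independent 0/1 bits and the same example pulse response; the helper names are hypothetical and this is not presented as the authors' implementation. Each ISI term contributes 0 or its sample value with probability 1/2, the per-term PMFs are convolved into the PDF of all ISI, and the BER at a decision reference X weights BER_1 (a 1 landing below X) and BER_0 (a 0 landing above X) by 0.5 each. Exact fractions keep the 1/64 granularity visible.

```python
from fractions import Fraction


def two_point_pmf(tap):
    """One ISI term: its bit is 0 or 1 with probability 1/2, contributing 0 or tap."""
    pmf = {}
    for v in (0, tap):
        pmf[v] = pmf.get(v, 0) + Fraction(1, 2)
    return pmf


def convolve_pmf(a, b):
    """PMF of the sum of two independent discrete variables (convolution of PMFs)."""
    out = {}
    for va, pa in a.items():
        for vb, pb in b.items():
            out[va + vb] = out.get(va + vb, 0) + pa * pb
    return out


def isi_pmf(pulse, c):
    """Convolve the per-term PMFs of every non-cursor sample into the PDF of all ISI."""
    pmf = {0: Fraction(1)}
    for i, tap in enumerate(pulse):
        if i != c:
            pmf = convolve_pmf(pmf, two_point_pmf(tap))
    return pmf


def ber(pulse, c, x):
    """BER at decision reference x for equiprobable bits.

    BER_1: a sent 1 (cursor + ISI) lands below x.  BER_0: a sent 0 (ISI only)
    lands above x.  Samples exactly at x count as correct (an assumption).
    """
    isi = isi_pmf(pulse, c)
    ber1 = sum(p for v, p in isi.items() if pulse[c] + v < x)
    ber0 = sum(p for v, p in isi.items() if v > x)
    return Fraction(1, 2) * ber1 + Fraction(1, 2) * ber0


if __name__ == "__main__":
    pulse = [2, 16, -3, 4, -1, 2, 2]  # precursor, cursor, five postcursors
    pmf = isi_pmf(pulse, 1)
    print({v: str(p) for v, p in sorted(pmf.items())})
    # BER is 0 for references 10 to 12 and 1/128 at 9 and 13.
    for x in range(8, 15):
        print(x, ber(pulse, 1, x))
```

For this pulse response the zero-BER region spans references 10 through 12, consistent with the worst-case RVD of 2 from peak distortion analysis; the statistical eye additionally shows how quickly the BER grows outside that opening.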
BER distribution eye (when sampling a 1)
Figure: the CDF of a 1 mapped onto the eye, with reference levels marked at BER = 1, 0.9, 0.3, and 0 (legend: BER from 0 to 1).

BER distribution eye (when sampling a 0)
Figure: the CDF of a 0 mapped onto the eye, with reference levels marked at BER = 0, 0.3, 0.9, and 1.

BER distribution eye
Figure: combining the CDF of a 1 and the CDF of a 0 gives the BER eye for a 0 or 1, with reference levels marked at BER = 0.5, 0.25, 0, 0.25, and 0.5 (legend: BER from 0 to 0.5).

Handling Tx jitter in link analysis
• Jitter is amplified over lossy channels
  – Byproduct of frequency-dependent delay and loss
  – Must be accounted for in the analytical model
• Discussion of these methods is beyond the scope of this presentation
• The following primary authors have published techniques to analyze Tx jitter: Balamurugan, Hanumolu, Sanders, Stojanovic, Casper
Figure: Tx and Rx connected by a lossy channel; bit-error rate eye with transmit jitter, sample voltage (0 to 0.16V) vs. sample time (5 to 50ps), legend BER as 10^X (10^0 to 10^-20).

Signaling analysis summary
• Link analysis accuracy enables a balanced link design
  – Low power
  – High performance
• Three types of link analysis
  – Empirical: inexact, optimistic, time-consuming
  – Peak distortion: uses LTI properties to find the worst-case eye; can be pessimistic
  – Behavioral/statistical: exact channel modeling using LTI properties and behavioral models of circuit blocks
• Tx jitter is a special case that must be handled for better behavioral accuracy

Design example: 20Gb/s data link “Bonneville”
• Goals
  – Achieve the highest-performance link using 90nm CMOS
    • 20Gb/s target across a desktop
      channel
    • 10Gb/s target across a server channel
  – Power < 20mW/Gb/s
  – Small area (300um by 300um for Rx and Tx)
  – Forwarded and embedded clock architectures

Channel Insertion Loss
Figure: insertion loss (0 to -80dB) vs. frequency (0 to 15GHz) for the microprocessor/chipset (μP/CS) channel and a clean backplane (BP); the μP/CS channel runs from the CPU socket over 7” of FR4 microstrip to the chipset.
Microprocessor/Chipset channel:
• Non-interleaved routing
• FEXT only
• Tx pad cap = 0.4pF
• Rx pad cap = 0.1pF
• Microstrip
• Sockets on Tx

Channel loss and Equalization
• Channel loss distorts and attenuates the signal
• Develop low-loss materials
• Compensate for channel distortion with equalization
  – Transmitter pre-emphasis
  – Receiver linear equalizer
  – Decision feedback equalizer
Figure: channel response vs. frequency, received magnitude (-50 to 50dB) vs. frequency (0 to 9GHz), showing the channel response, the targeted filter (equalizer) response, and the equalized response (legend: equalized, non-equalized).

Equalization overview
• Non-linear
  – DFE
• Linear
  – Continuous-time
    • Transversal filter
    • High-pass
      – Passive
      – Active
        » Capacitive degeneration
        » L peaking
  – Discrete-time
    • Rx ADC & FIR
    • Rx analog FIR
    • Tx pre-emphasis
Figure (Rx DFE): previous decisions are weighted by taps c1 to c4 and fed back to be subtracted from the input.
Figure (Tx pre-emphasis): the data passes through two delay stages (Δ), and the three taps, weighted by 6-bit coefficients C-1[5:0], C0[5:0], and C1[5:0], drive the DAC.
Figure (CTLE): 1st-order CTLE with a bias input.

Figure: achievable data rate (10 to 30Gb/s) on the μP/CS channel with no CTLE and with a 1st-order CTLE, for combinations of Tx FIR taps, DFE taps, and DFE tap start, alongside measured data using similar assumptions.

Bonneville architecture
Figure: TX with a 4-tap linear equalizer (LE, pre-emphasis), 20Gb/s channel, RX with a 2nd-order CTLE, phase generator,
a second TX, and a 5GHz clock.
• Measurement results:
  – 20Gb/s across the μP channel
  – 15Gb/s across the server channel
  – 12mW/Gb/s power efficiency
  – Measured data rate matched link modeling results within 10%

Data link measurements
• Some data link blocks are straightforward to characterize with external measurement equipment
  – Examples:
    • Data Tx (50Ω)
    • DC currents and voltages (averaging)
    • Recovered data (after sampling), e.g., with a bit error rate tester (BERT)
• Other measurements are extremely difficult to perform with external measurement equipment
  – Examples:
    • Clock jitter (>5GHz), especially high-frequency jitter
    • Sampled data eye
    • Data Rx sensitivity
• Built-in self-test (BIST) and self-calibration are required for high-volume testing of data links
  – Examples:
    • Automatic clock-data deskew
    • Adaptive equalization
• On-die measurement capability is nearly essential in multi-Gb/s data links
  – Closes the loop for link design
  – Enables BIST and calibration

Bonneville on-die measurement
Figure: the Bonneville link (4-tap LE (pre-emphasis) TX, 20Gb/s channel, 2nd-order CTLE RX, phase generator, TX, 5GHz clock); a second build adds an offset control,
an error counter, and test logic.

On-die scope test capabilities
• 3 modes:
  – Characterize circuits
  – Waveform capture
  – BER eye diagrams

RX Input-referred Noise
Figure: a DC test voltage (Vtest) is applied to the RX input; test control sets the RX offset (offset ctrl), and a PDF counter accumulates the RX decisions.
• Measurement:
  – Sweep the calibrated digital offset to generate the CDF, counting 1’s and 0’s
  – Generate the noise CDF/PDF for the Rx
Figure sequence: with Vtest (DC) plus noise at the input, sweeping the offset [V] moves the RX output from all 0’s to all 1’s; Prob{‘1’} vs. offset is the CDF, and its derivative is the input-referred (IR) noise PDF.

Rx input noise (no PSN, no offset)
Figure: noise PDF (log scale, 10^0 to 10^-10) vs. offset voltage (-15 to 25mV); σ_noise = 1.3mV, deterministic noise = 0mVp-p.

Rx input noise (200MHz PSN, no offset)
Figure: the same measurement with 200MHz power-supply noise (PSN); σ_noise = 1.1mV, deterministic noise ≈ 1mVp-p.

Rx input noise (200MHz PSN, 85mV offset)
Figure: noise PDF vs. offset voltage (65 to 110mV); σ_noise = 1mV, deterministic
noise = 16mVp-p.

Rx PSRR (200MHz)
Figure: Rx power-supply rejection ratio at 200MHz, with the noise floor marked.

On-die waveform capture
• Sample a periodic signal:
  – Voltage: equivalent-time A/D using the comparator offset
  – Time: equivalent-time A/D using the interpolator offset

Wave capture, Rx eq
Figure: on-die captured waveform, voltage (-0.25 to 0.25V) vs. time (0.2 to 1.8ns), with Rx equalization enabled (Tx, Rx Eq, Rx).

Wave capture, Tx eq
Figure: the same capture with Tx equalization enabled (Tx Eq, Tx, Rx).

Wave capture, Tx+Rx eq
Figure: the same capture with both Tx and Rx equalization enabled (Tx Eq, Tx, Rx Eq, Rx).

BER eye diagram
• Characterize BER at various sampling points:
  – Voltage: vary the comparator offset
  – Time: vary the interpolator offset
Figure sequence: each sampling point is marked pass or fail, and the final plot shows the number of errors at each point.

Rx Equalization (CTLE)
Figure: measured BER eye, voltage (-0.06 to 0.06V) vs. time (-0.5 to 0.5UI), color scale -1 to -10 (log10); 17.5Gb/s, 7” desktop channel, Rx equalization only.

Tx Equalization (Pre-emphasis)
Figure: measured BER eye, voltage vs. time [UI]; 17.5Gb/s, 7” desktop channel, Tx equalization only.
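The first two on-die scope modes can be sketched with a toy comparator model. Everything here is an assumption for illustration rather than the Bonneville circuitry: the comparator decides '1' when its input plus the digital offset plus Gaussian noise exceeds zero, offsets are in volts, the interpolator phase is a fraction of one signal period, and all function names are hypothetical. Mode 1 sweeps the offset with a DC Vtest, turns the fraction of 1's into a CDF, and differentiates it into the input-referred noise PDF. Mode 2 repeats the sweep at each interpolator phase of a periodic signal and records the offset where Prob{'1'} reaches 0.5, giving an equivalent-time waveform.

```python
import math
import random


def count_ones(v_in, offset, sigma, trials, rng):
    """Toy comparator: decides '1' when input + digital offset + Gaussian noise > 0."""
    return sum(1 for _ in range(trials) if v_in + offset + rng.gauss(0.0, sigma) > 0)


def sweep_cdf(v_in, offsets, sigma, trials, rng):
    """Sweep the offset and record Prob{'1'}; it rises from all 0's to all 1's (a CDF)."""
    return [count_ones(v_in, off, sigma, trials, rng) / trials for off in offsets]


def pdf_stats(offsets, cdf):
    """Mode 1: differentiate the CDF into a noise PDF and return its (mean, sigma)."""
    mids = [(a + b) / 2 for a, b in zip(offsets, offsets[1:])]
    pdf = [b - a for a, b in zip(cdf, cdf[1:])]
    total = sum(pdf)
    mean = sum(m * p for m, p in zip(mids, pdf)) / total
    var = sum((m - mean) ** 2 * p for m, p in zip(mids, pdf)) / total
    return mean, math.sqrt(var)


def capture_waveform(signal, phases, offsets, sigma, trials, rng):
    """Mode 2: equivalent-time capture of a periodic signal.

    At each interpolator phase, the first offset where Prob{'1'} reaches 0.5
    is (to within about one offset step) minus the signal voltage at that phase.
    """
    wave = []
    for ph in phases:
        cdf = sweep_cdf(signal(ph), offsets, sigma, trials, rng)
        k = next(i for i, p in enumerate(cdf) if p >= 0.5)
        wave.append(-offsets[k])
    return wave


if __name__ == "__main__":
    rng = random.Random(1)
    # Mode 1: DC Vtest = 2 mV with 1.3 mV rms noise (illustrative values).
    offs = [round(-0.008 + 0.0001 * i, 4) for i in range(121)]
    mean, sigma = pdf_stats(offs, sweep_cdf(0.002, offs, 0.0013, 4000, rng))
    # The PDF is centered at offset = -Vtest; its width is the input-referred noise.
    print(f"PDF center {mean * 1e3:.2f} mV, sigma {sigma * 1e3:.2f} mV")

    # Mode 2: a 200 mV sine sampled at 16 interpolator phases per period.
    def sine(ph):
        return 0.2 * math.sin(2 * math.pi * ph)

    grid = [round(-0.25 + 0.005 * i, 3) for i in range(101)]
    phases = [i / 16 for i in range(16)]
    wave = capture_waveform(sine, phases, grid, 0.001, 200, random.Random(2))
    print([round(w, 3) for w in wave])
```

The BER eye mode follows the same structure, with the ones counter replaced by an error counter that compares each decision against the known transmitted data at every (phase, offset) point.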
Tx + Rx Equalization
Figure: measured BER eye, voltage vs. time [UI]; 17.5Gb/s, 7” desktop channel, both Tx and Rx equalization.

Tx + Rx Equalization, no Rx offset trim
Figure: the same measurement without Rx offset trim.

Tx + Rx Eq, 10% Tx PSN @ 200MHz
Figure: the same measurement with 10% Tx power-supply noise at 200MHz.

Tx + Rx Eq, 10% Rx PSN @ 200MHz
Figure: the same measurement with 10% Rx power-supply noise at 200MHz.

Measurement summary
• On-die link measurements close the design loop and enable link self-test and adaptation
  – Example: BER eye
• On-die measurements can add significantly less noise than off-die measurements
  – Example: clock-data jitter measurement
• However, calibration of the on-die circuits is still required for absolute accuracy
  – Examples: voltage offsets, phase interpolators
• In some cases, such as where averaging is possible, off-die measurements are still very useful

Overall summary
• Tera-scale many-core μP’s will drive aggregate I/O rates aggressively
  – Power budget will constrain the link design space
• Power efficiency depends strongly on process technology and channel
• High-performance and low-power link design requires accurate system-level tools
  – Tools are in place, with areas for improvement
• On-die link measurement capabilities close the design loop and enable link self-test and adaptation

Acknowledgements: Ganesh Balamurugan, James Jaussi, Joe Kennedy, Mozhgan Mansuri, Randy Mooney, Shekhar Borkar