DTTC 2006 Presentation Template
Advanced Analysis, Design, and Measurement Techniques for Multi-Gb/s Data Links
Frank O’Mahony
Bryan Casper
Circuit Research Lab, Intel
Hillsboro, OR
Outline
• Overview of I/O trends
• System-level link modeling
– Worst-case data eye
– Statistical data eye
– Design example: 20Gb/s link
• On-die measurement techniques
Chip-to-Chip Signaling Trends
Decade (speed): transceiver features
• 1980’s (>10Mb/s): inverter out, inverter in
• 1990’s (>100Mb/s): termination; source-synchronous clocking
• 2000’s (>1 Gb/s): point-to-point serial streams; pre-emphasis equalization
• Future (>10 Gb/s): adaptive equalization; advanced low-power clocking; alternate channel materials
[Figure: channel models (lumped capacitance, transmission line, lossy transmission line, noise) and a link block diagram (transmit filter, channel h(t), linear equalizer, sampler, slicer, CDR).]
CMOS Transceiver Data Rates
• Plot showing link rate vs. year
[Figure: data rate (Gb/s, 0.01 to 100) vs. technology (µm), annotated “Power/channel limitations”. Courtesy of Prof. Ken Yang, UCLA.]
Power Density Increases Exponentially!
[Figure: power density (W/cm², 1 to 1000) vs. process technology node (1.5 µm to 0.07 µm) for the i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, and Pentium® 4, with hot plate, nuclear reactor, and rocket nozzle reference levels and a max power density envelope.]
Teraflops Research Chip
100 Million Transistors ● 80 Tiles ● 275mm²
• First tera-scale programmable silicon:
– Teraflops performance
– Tile design approach
– On-die mesh network
– Power-aware capability
• Tera-scale many-core μPs will drive aggregate I/O rates aggressively
Power efficiency and process technology
• Process scaling enables lower power data links
• Channel characteristics can limit achievable power efficiency
[Figure: energy per bit, labeled “Erg/bit” (pJ, 1 to 1000, 2-PAM), vs. process technology (µm), with trend fit $y = 399.17\,x^{1.157}$. Courtesy of Prof. Ken Yang, UCLA.]
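The trend line on this slide can be evaluated directly. A minimal sketch, assuming the fit takes x as the process node in µm and returns y in pJ/bit (as the axis labels suggest); `energy_per_bit_pj` is a hypothetical helper name. Because 1 pJ/bit × 1 Gb/s = 1 mW, the same number also reads as the mW/Gb/s power efficiency used elsewhere in this talk.

```python
def energy_per_bit_pj(node_um: float) -> float:
    """Evaluate the slide's 2-PAM trend fit y = 399.17 * x**1.157.

    Assumes x is the process node in um and y is energy per bit in pJ.
    Since 1 pJ/bit * 1 Gb/s = 1 mW, the result is also mW/(Gb/s).
    """
    return 399.17 * node_um ** 1.157


if __name__ == "__main__":
    for node in (0.25, 0.18, 0.13, 0.09):
        print(f"{node:.2f} um -> {energy_per_bit_pj(node):5.1f} pJ/bit")
```

The fit is only a trend; the next bullet's point is that the channel, not just the node, sets how close a real link gets to it.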
I/O Data Rate and Power Efficiency
[Figure: power efficiency (mW/Gb/s, 0 to 60) vs. data rate (Gb/s, 0 to 20) for J. Wong (VLSI’03), R. Palmer (ISSCC’07), Prete (ISSCC’07), and BNV (ISSCC’06).]
Designing power-efficient multi-Gb/s links
• Accurate system-level link modeling
– Careful statistical accounting of all noise sources
• ISI, crosstalk, voltage, and timing noise
• Power-efficient I/O system implementation
– Design within the bandwidth of the process technology
– Better channel characteristics enable lower power
– Immunity to variation and to deterministic and random noise comes at a power cost
• On-die calibration and measurement
– Calibration can significantly reduce power
– Measurement is necessary to close the modeling loop
System-level link modeling
1. Empirical calculation
– Use random data
2. Peak distortion analysis
– Analytical calculation of worst-case eye
3. Statistical ISI analysis
– Analytical calculation of BER eye
Traditional method of signaling analysis and validation
• Most chip-to-chip signaling links considered in the past used simple binary NRZ modulation
• These links had a low symbol rate and little channel memory
• Transient simulation using a few random data vectors was sufficient to accurately characterize the eye
Motivation for behavioral link analysis
• Simulated eye can be optimistic
– Won’t capture worst-case ISI, especially for channels with long memory
• Characterizes the impact of deterministic and random noise sources
– For low bit error rates (BER), very unlikely noise conditions must be considered
• Nearly exact statistical analysis reduces the need for excess design margins
• Fast evaluation of various link architectures without designing complete circuits
– e.g., various equalizers can be traded off easily
Properties of a Linear Time-Invariant System
[Figure: the frequency response (e.g., S-parameters, S21) and the impulse response are related by the FFT; the impulse response supports two LTI properties: convolution and superposition.]
LTI property: Convolution
[Figure: the Tx symbol (mirrored) convolved with the impulse response gives the pulse response.]
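The convolution step can be checked numerically. A minimal sketch in discrete time, assuming the impulse response is already sampled on a uniform grid and the Tx symbol is a rectangular pulse one UI (`samples_per_ui` samples) wide; `convolve` and `pulse_response` are hypothetical helper names. The “mirror” on the slide corresponds to the index n − k in the convolution sum.

```python
def convolve(x, h):
    """Direct-form discrete convolution: y[n] = sum_k x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm  # m plays the role of n - k
    return y


def pulse_response(impulse, samples_per_ui, dt=1.0):
    """Pulse response = (1-UI rectangular Tx symbol) convolved with the impulse response.

    dt scales the discrete sum so it approximates the continuous convolution integral.
    """
    symbol = [1.0] * samples_per_ui
    return [dt * v for v in convolve(symbol, impulse)]


# Hypothetical 4-sample impulse response, 2 samples per UI.
print(pulse_response([0.0, 0.5, 0.3, 0.1], samples_per_ui=2))
```

Each pulse-response sample is the sum of the impulse-response samples inside a one-UI window, which is why a lossy channel's long impulse tail turns into postcursor ISI.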
LTI property: Superposition
[Figure: the input Tx symbol …000010000000… produces the pulse response at the output.]
LTI property: Superposition of symbols
[Figure: the input Tx symbols … 000010011100 … produce the output response to pattern 100111, a superposition of time-shifted pulse responses.]
LTI property: Superposition of coupled symbols
[Figure: a lone aggressor symbol …000010000000… produces the FEXT pulse response at the victim output.]
[Figure: the aggressor pattern …000011111100… produces the FEXT response.]
[Figure: the victim pattern …000010011100… produces the insertion loss response.]
[Figure: the insertion loss response plus the FEXT response gives the composite response.]
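Superposition makes the response to any pattern a sum of shifted pulse responses, and crosstalk adds in the same way through the FEXT pulse response. A minimal sketch, assuming 0/1 signaling (a 0 contributes nothing, matching the …0001… patterns on these slides) and victim and FEXT pulse responses sampled on the same grid; all function names and sample values are hypothetical.

```python
def pattern_response(bits, pulse, samples_per_ui):
    """Superpose one pulse response, shifted by k UI, for every 1 at bit position k."""
    out = [0.0] * (len(bits) * samples_per_ui + len(pulse))
    for k, bit in enumerate(bits):
        if bit:
            for n, p in enumerate(pulse):
                out[k * samples_per_ui + n] += p
    return out


def composite_response(victim_bits, il_pulse, aggressor_bits, fext_pulse, samples_per_ui):
    """Victim response through insertion loss plus aggressor response through FEXT."""
    victim = pattern_response(victim_bits, il_pulse, samples_per_ui)
    fext = pattern_response(aggressor_bits, fext_pulse, samples_per_ui)
    length = max(len(victim), len(fext))
    victim += [0.0] * (length - len(victim))
    fext += [0.0] * (length - len(fext))
    return [v + f for v, f in zip(victim, fext)]


il_pulse = [0.1, 0.6, 0.2, 0.05]                  # hypothetical victim pulse response
fext_pulse = [0.02, -0.03, 0.01]                  # hypothetical FEXT pulse response
victim = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0]     # ...000010011100...
aggressor = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]  # ...000011111100...
total = composite_response(victim, il_pulse, aggressor, fext_pulse, samples_per_ui=1)
print([round(v, 3) for v in total])
```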
Worst-case eye calculation
• Eye diagrams are generally calculated empirically
– Convolve random data with the pulse response of the channel
– The pulse response is derived by convolving the impulse response with the transmitted symbol
• For eye diagrams to represent the worst case, a large set of random data must be used
– Low probability of hitting worst-case data transitions
– Computationally inefficient
• An analytical method of producing the worst-case eye diagram exists
– Computationally efficient algorithm
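The cost of the empirical approach shows up even in a toy model. A minimal sketch, assuming one sample per UI, 0/1 signaling, and the example pulse samples used later in this talk (precursor 2, cursor 16, postcursors −3, 4, −1, 2, 2); `empirical_eye_opening` is a hypothetical helper. The opening measured from random data can only shrink toward the analytical worst case, cursor − Σ|ISI| = 16 − 14 = 2, as more bits are simulated.

```python
import random

PULSE = [2, 16, -3, 4, -1, 2, 2]  # precursor, cursor, postcursors (1 sample/UI)
CURSOR = 1                         # index of the cursor sample in PULSE


def empirical_eye_opening(n_bits, seed=0):
    """Vertical eye opening (lowest sampled 1 minus highest sampled 0) for random data."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    ones, zeros = [], []
    # Skip edge bits so every sample sees its full set of neighbors.
    for k in range(len(PULSE), n_bits - len(PULSE)):
        # Pulse tap i carries the bit sent (i - CURSOR) UI before bit k.
        v = sum(PULSE[i] * bits[k - (i - CURSOR)] for i in range(len(PULSE)))
        (ones if bits[k] else zeros).append(v)
    return min(ones) - max(zeros)


worst_case = PULSE[CURSOR] - sum(abs(p) for i, p in enumerate(PULSE) if i != CURSOR)
for n in (100, 1000, 10000):
    print(n, empirical_eye_opening(n), "worst case:", worst_case)
```

With a fixed seed, a longer run contains the shorter run's samples, so its opening can only be equal or smaller; the gap to `worst_case` is the optimism the slide warns about.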
Differential S-Parameters
Eye diagram (100 bits @5Gb/s)
Eye diagram (1000 bits @5Gb/s)
[Figure: random data eye (100 bits) overlaid with random data eye (1000 bits).]
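Sample pulse response
[Figure: pulse response with precursor, cursor, and postcursor samples; positive (ISI+) and negative (ISI−) ISI are marked.]
Step response
[Figure: step response to pattern 000011111111111, settling from $V = 0$ to $V = \text{cursor} + \sum \text{ISI}$.]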
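Worst-case 0
[Figure: response to the worst-case 0 pattern 0110100100000.]
$V_{WC0} = \sum_{\text{ISI} > 0} \text{ISI}$
Worst-case 1
[Figure: response to the worst-case 1 pattern 0101100000.]
$V_{WC1} = \text{cursor} + \sum_{\text{ISI} < 0} \text{ISI}$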
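How to find worst-case patterns
Worst-case 0 = 1 1 0 1 0 0 1
Worst-case 1 = 0 0 1 0 1 1 0
Each bit is chosen so that its ISI pushes the sampled value toward the opposite level: the worst-case 0 sends a 1 wherever the ISI sample is positive, and the worst-case 1 sends a 1 wherever it is negative, so the two patterns are complements.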
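Worst-case Received Voltage Difference (RVD)
$\text{Worst-case RVD} = \frac{\text{cursor}}{2} - \sum \frac{|\text{ISI}|}{2} = \frac{V_{WC1} - V_{WC0}}{2}$
[Figure: example pulse response samples: precursor 2; cursor 16; postcursors −3, 4, −1, 2, 2.]
$\text{Worst-case RVD} = \frac{16}{2} - \frac{|2| + |-3| + |4| + |-1| + |2| + |2|}{2} = 1$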
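Peak distortion analysis reduces to sorting the ISI samples by sign. A minimal sketch, assuming one sample per UI and 0/1 signaling as in the worked example; `peak_distortion` is a hypothetical helper. Reading the pulse taps from last to first gives the bits in transmit order, because the bit that lands on postcursor k was sent k UI before the sampled bit.

```python
def peak_distortion(pulse, cursor):
    """Return V_WC1, V_WC0, worst-case RVD, and worst-case 0/1 patterns (transmit order)."""
    isi = [p for i, p in enumerate(pulse) if i != cursor]
    v_wc1 = pulse[cursor] + sum(p for p in isi if p < 0)  # a 1 dragged down by negative ISI
    v_wc0 = sum(p for p in isi if p > 0)                  # a 0 pushed up by positive ISI
    rvd = (v_wc1 - v_wc0) / 2                             # = cursor/2 - sum(|ISI|)/2

    wc0, wc1 = [], []
    for i in reversed(range(len(pulse))):  # last postcursor = earliest transmitted bit
        if i == cursor:
            wc0.append("0")
            wc1.append("1")
        else:
            wc0.append("1" if pulse[i] > 0 else "0")
            wc1.append("1" if pulse[i] < 0 else "0")
    return v_wc1, v_wc0, rvd, "".join(wc0), "".join(wc1)


print(peak_distortion([2, 16, -3, 4, -1, 2, 2], cursor=1))
# (12, 10, 1.0, '1101001', '0010110')
```

The returned patterns match the slide's worst-case 0 and worst-case 1 sequences, and the RVD matches the worked value of 1.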
5Gb/s Pulse Response
5Gb/s Response due to worst-case data pattern
[Figure: responses to the worst-case 1 and worst-case 0 patterns.]
Worst-case data response
[Figure: response to a lone 1 compared with the worst-case 1 response.]
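5Gb/s WC eye shape
[Figure: worst-case eye shape across the precursor, cursor, and postcursor sampling positions.]
$\text{RVD}_{WC} = \frac{\text{cursor}}{2} - \sum \frac{|\text{ISI}|}{2}$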
WC eye vs random data eye
[Figure: WC eye shape overlaid with random data eyes from 100 and 1000 symbols.]
What is a BER distribution eye?
[Figure: BER distribution eye, sample voltage (V) vs. sample time (sec), with the sample time and sample reference marked; the legend shows BER in 10^X. Successive builds place the sample point at BER = 10^-10, 10^-5, and 10^-1.]
BER distribution vs Worst-case eye
[Figure: BER distribution eye with the worst-case eye edges overlaid; the legend shows BER in 10^X.]
BER distribution eye calculation
• Calculation method is based on pulse response shape
• Assumption: equal probability of 1 or 0
• Determine the probability density function (pdf) of ISI
– In contrast to determining the peak value of ISI
• More computationally intensive than peak distortion analysis
BER eye calculation example (no ISI)
[Figure: pulse response with a cursor of 9 and no ISI.]
PDF of the cursor (when sending a 1)
[Figure: PDF of the cursor for a 1, a single impulse at 9.]
PDF of a 1
• Convolve PDFs: PDF of the cursor for a 1 (impulse at 9) ∗ PDF of ISI (impulse at 0) = PDF of a 1 (impulse at 9)
Cumulative Distribution Function (CDF) of a 1
$\text{BER}(X) = \int_{-\infty}^{X} \text{PDF}\,dx$
BER distribution eye (when sampling a 1)
[Figure: CDF of a 1 mapped onto the eye; legend (BER) from 0 to 1.]
PDF of a 0
• Convolve PDFs: PDF of the cursor for a 0 (impulse at 0) ∗ PDF of ISI (impulse at 0) = PDF of a 0 (impulse at 0)
Cumulative Distribution Function (CDF) of a 0
$\text{BER}(X) = \int_{X}^{\infty} \text{PDF}\,dx$
BER distribution eye (when sampling a 0)
[Figure: CDF of a 0 mapped onto the eye; legend (BER) from 0 to 1.]
BER distribution eye
• CDF of a 1 or 0 = 0.5 × CDF of a 1 + 0.5 × CDF of a 0 (p = 0.5 each)
[Figure: references between 0 and 9 give BER = 0; references outside that range give BER = 0.5; legend (BER) from 0 to 0.5.]
BER eye calculation example (w/ ISI)
[Figure: pulse response samples: precursor 2; cursor 16; postcursors −3, 4, −1, 2, 2.]
$\text{Worst-case RVD} = \frac{16}{2} - \frac{|2| + |-3| + |4| + |-1| + |2| + |2|}{2} = 1$
1st precursor ISI PDF
• 50% chance of a 1: ISI = 2
• 50% chance of a 0: ISI = 0
[Figure: PDF of 1st precursor ISI, p = 0.5 at 0 and at 2.]
1st postcursor ISI PDF
• 50% chance of a 1: ISI = −3
• 50% chance of a 0: ISI = 0
[Figure: PDF of 1st postcursor ISI, p = 0.5 at −3 and at 0.]
2nd postcursor ISI PDF
• 50% chance of a 1: ISI = 4
• 50% chance of a 0: ISI = 0
[Figure: PDF of 2nd postcursor ISI, p = 0.5 at 0 and at 4.]
And so on . . .
PDF all ISI
• Convolve individual PDFs:
– 1st precursor ∗ 1st postcursor: p = 0.25 at −3, −1, 0, and 2
– Result ∗ 2nd postcursor: p = 0.125 at −3, −1, 0, 1, 2, 3, 4, and 6
– And so on . . .
[Figure: PDF of all ISI, spanning −4 to 10 with probabilities in multiples of 1/64.]
PDF of the cursor (when sending a 1)
[Figure: PDF of the cursor for a 1, a single impulse at 16.]
PDF of a 1
• Convolve the PDF of the cursor for a 1 (impulse at 16) with the PDF of ISI (−4 to 10): the PDF of a 1 spans 12 to 26
Cumulative Distribution Function (CDF) of a 1
$\text{BER}(X) = \int_{-\infty}^{X} \text{PDF}\,dx$
BER distribution eye (when sampling a 1)
[Figure: CDF of a 1 mapped onto the eye; example references give BER = 1, 0.9, 0.3, and 0; legend (BER) from 0 to 1.]
BER distribution eye (when sampling a 0)
[Figure: CDF of a 0 mapped onto the eye; example references give BER = 0, 0.3, 0.9, and 1; legend (BER) from 0 to 1.]
BER distribution eye
[Figure: CDF of a 0 or 1, combining the CDF of 1 and the CDF of 0; example references give BER = 0.5, 0.25, 0, 0.25, and 0.5; legend (BER) from 0 to 0.5.]
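The statistical calculation above fits in a few lines. A minimal sketch, assuming one sample per UI, 0/1 signaling with equally likely bits, and a discrete ISI PDF stored as a value-to-probability map; exact fractions keep the 1/64 steps visible. The names `isi_pdf` and `ber` are hypothetical, and the tap values are the worked example's.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def convolve_pdfs(a, b):
    """PDF of the sum of two independent discrete variables."""
    out = {}
    for va, pa in a.items():
        for vb, pb in b.items():
            out[va + vb] = out.get(va + vb, 0) + pa * pb
    return out


def isi_pdf(isi_taps):
    """Convolve one two-point PDF per ISI tap: the tap adds 0 (bit 0) or its value (bit 1)."""
    pdf = {0: Fraction(1)}
    for tap in isi_taps:
        term = {0: HALF}
        term[tap] = term.get(tap, 0) + HALF  # also handles tap == 0
        pdf = convolve_pdfs(pdf, term)
    return pdf


def ber(x, isi, cursor):
    """BER at reference x: a 1 (cursor + ISI) falls below x, or a 0 (ISI alone) lies above x."""
    ber1 = sum((p for v, p in isi.items() if cursor + v < x), Fraction(0))
    ber0 = sum((p for v, p in isi.items() if v > x), Fraction(0))
    return HALF * ber1 + HALF * ber0


isi = isi_pdf([2, -3, 4, -1, 2, 2])  # 1st precursor, then postcursors
print(min(isi), max(isi))            # ISI spans -4 to 10
for ref in (11, 13, 9, 30):
    print(ref, ber(ref, isi, cursor=16))
```

Sweeping `ref` over voltage (and repeating for pulse samples taken at other phases) traces out the BER distribution eye; a reference of 11 sits between the highest 0 (10) and the lowest 1 (12), which is why it returns a BER of 0.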
Handling Tx jitter in link analysis
• Jitter is amplified over lossy channels
– Byproduct of frequency-dependent delay and loss
– Must be accounted for in analytical model
• Discussion of these methods is beyond the scope of this presentation
The following primary authors have published techniques to analyze Tx jitter: Balamurugan, Hanumolu, Sanders, Stojanovic, Casper
[Figure: Tx to Rx over a lossy channel; bit-error rate eye with transmit jitter (BER in 10^X), sample voltage (V, 0.02 to 0.16) vs. sample time (psec, 5 to 50), legend from −2 to −20.]
Signaling analysis summary
• Accurate link analysis enables a balanced link design
– Low power
– High performance
• Three types of link analysis
– Empirical: inexact, optimistic, time consuming
– Peak distortion: uses LTI to find the worst-case eye, can be pessimistic
– Behavioral/statistical: exact channel modeling using LTI and behavioral models of circuit blocks
• Tx jitter is a special case that must be handled for better behavioral accuracy
Design example: 20Gb/s data link “Bonneville”
• Goals
– Achieve highest performance link using 90nm CMOS
• 20Gb/s target across a desktop channel
• 10Gb/s target across a server channel
– Power < 20mW/Gb/s
– Small area (300µm × 300µm for Rx and Tx)
– Forwarded and embedded clock architectures
Channel Insertion Loss
[Figure: insertion loss (0 to −80 dB) vs. frequency (0 to 15 GHz) for the μP/CS and clean BP channels; the μP/CS channel runs from a CPU socket over 7” FR4 microstrip to the chipset.]
Microprocessor/Chipset:
• Non-interleaved routing
• FEXT only
• Tx pad cap = 0.4pF
• Rx pad cap = 0.1pF
• Microstrip
• Sockets on Tx
Channel loss and Equalization
• Channel loss distorts and attenuates signal
• Develop low loss materials
• Compensate for channel distortion: equalization
– Transmitter pre-emphasis
– Receiver linear equalizer
– Decision feedback equalizer
[Figure: channel response vs. frequency, received magnitude (dB, −50 to 50) over 0 to 9 GHz, showing the channel, the targeted filter (equalizer) response, and the equalized result; a second plot compares the equalized and non-equalized channel responses.]
Equalization overview – Rx DFE
• Non-linear
– DFE
• Linear
– Continuous-time
• Transversal filter
• High-pass
– passive
– active
» capacitive degeneration
» L peaking
– Discrete-time
• Rx ADC & FIR
• Rx analog FIR
• Tx pre-emphasis
[Figure: Rx DFE with feedback taps c1 to c4 weighting past decisions into the input summer ahead of the slicer.]
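The DFE's feedback loop can be sketched behaviorally. A minimal sketch, assuming one sample per UI, ±1 symbols, no precursor ISI, and ideal feedback taps equal to the postcursor samples (standing in for c1 to c4 in the diagram); the channel values and helper names are hypothetical. The feedback is built from past decisions rather than from the received signal, and the slicer inside that loop is what makes the DFE non-linear.

```python
def channel(symbols, h):
    """Cursor-aligned received samples: r[n] = sum_i h[i] * a[n - i] (h[0] = cursor)."""
    return [
        sum(h[i] * symbols[n - i] for i in range(len(h)) if n - i >= 0)
        for n in range(len(symbols))
    ]


def slicer(v):
    return 1 if v >= 0 else -1


def dfe(received, taps):
    """Subtract sum_k c_k * d[n - k] (past decisions) from each sample, then slice."""
    decisions = []
    for n, r in enumerate(received):
        feedback = sum(c * decisions[n - k] for k, c in enumerate(taps, start=1) if n - k >= 0)
        decisions.append(slicer(r - feedback))
    return decisions


h = [1.0, 0.6, 0.4, 0.3, 0.2]  # hypothetical cursor + 4 postcursors
data = [1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1]
rx = channel(data, h)
print("no DFE errors:", sum(slicer(r) != d for r, d in zip(rx, data)))
print("4-tap DFE errors:", sum(d != a for d, a in zip(dfe(rx, h[1:]), data)))
```

Here the postcursors sum to more than the cursor, so a plain slicer errs after a run of 1s; with correct past decisions the 4-tap feedback cancels that ISI exactly. A wrong decision would feed back wrong ISI, which this noise-free sketch does not exercise.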
Equalization overview – Tx Preemphasis
[Figure: Tx pre-emphasis: the data stream passes through delay stages (Δ), and 6-bit tap weights C-1[5:0], C0[5:0], and C1[5:0] set the DAC output.]
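A transmit FIR shapes the pulse before it enters the channel, so the equalized pulse is the tap vector convolved with the channel's pulse response. A minimal sketch, assuming a 3-tap FIR (C-1, C0, C1 as in the diagram) with real-valued weights instead of the 6-bit DAC codes, a peak-power constraint |C-1| + |C0| + |C1| = 1, and a hypothetical channel pulse; the helper names are hypothetical.

```python
def convolve(x, h):
    """Direct-form discrete convolution: y[n] = sum_k x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y


def isi_to_cursor(pulse, cursor):
    """Sum of |ISI| relative to the cursor (smaller means a more open eye)."""
    return sum(abs(p) for i, p in enumerate(pulse) if i != cursor) / abs(pulse[cursor])


taps = [-0.10, 0.75, -0.15]          # C-1, C0, C1 (hypothetical); |taps| sum to 1
channel = [0.10, 0.60, 0.25, 0.05]   # hypothetical pulse: precursor, cursor, 2 postcursors
equalized = convolve(taps, channel)  # cursor moves to index 1 + 1 = 2

print(f"ISI/cursor before: {isi_to_cursor(channel, 1):.3f}")
print(f"ISI/cursor after:  {isi_to_cursor(equalized, 2):.3f}")
```

Under the peak-power constraint the cursor itself shrinks (0.60 to 0.41 here); pre-emphasis trades signal amplitude for a lower ISI-to-cursor ratio.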
Equalization overview – CTLE
[Figure: CTLE circuit with bias input.]
[Figure: achievable data rate (10 to 30 Gb/s) on the μP/CS channel vs. equalizer configuration (numbers of Tx FIR taps and DFE taps, and DFE tap start), with no CTLE and with a 1st-order CTLE; measured data using similar assumptions is included.]
Bonneville architecture
[Figure: TX with a 4-tap LE (preemphasis) sending 20Gb/s to an RX with a 2nd-order CTLE; a 5GHz clock feeds the phase generators.]
• Measurement results:
– 20Gb/s across μP channel
– 15Gb/s across server channel
– 12mW/Gb/s power efficiency
– Measured data rate matched link modeling results within 10%
Data link measurements
• Some data link blocks are straightforward to characterize with external measurement equipment
– Examples:
• Data Tx (50Ω)
• DC currents and voltages (averaging)
• Recovered data (after sampling), e.g., bit error rate tester (BERT)
• Other measurements are extremely difficult to perform with external measurement equipment
– Examples:
• Clock jitter (>5GHz), especially high-frequency jitter
• Sampled data eye
• Data Rx sensitivity
• Built-in self test (BIST) and self-calibration are required for high-volume testing of data links
– Examples:
• Automatic clock-data deskew
• Adaptive equalization
• On-die measurement capability is nearly essential in multi-Gb/s data links
– Closes the loop for link design
– Enables BIST and calibration
Bonneville on-die measurement
[Figure: the Bonneville architecture (4-tap LE TX, 2nd-order CTLE RX, phase generators, 5GHz clock) with Rx offset control, an error counter, and test logic added for on-die measurement.]
On-die scope test capabilities
• 3 modes:
– Characterize circuits
– Waveform capture
– BER eye diagrams
RX Input-referred Noise
[Figure: a DC test voltage Vtest drives the RX; test control sets the RX offset and a counter tallies the decisions.]
• Measurement:
– Sweep calibrated digital offset to generate CDF, counting 1’s and 0’s
– Generate noise CDF/PDF for Rx
[Figure: Vtest (DC) + noise vs. offset (V): the RX outputs all 0’s on one side of the noise band and all 1’s on the other; Prob{‘1’} vs. offset gives the CDF, and its derivative gives the input-referred (IR) noise PDF.]
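Turning the offset sweep into a noise estimate is a CDF-to-PDF conversion. A minimal sketch, assuming Gaussian input-referred noise and an ideal, calibrated offset DAC; `ones_count` stands in for the on-die counter and, like the other names, is hypothetical. The counter's Prob{‘1’} versus offset is the noise CDF; differencing adjacent points gives the PDF, whose spread is the input-referred noise.

```python
import math


def ones_count(offset_mv, sigma_mv, n_trials, vtest_mv=0.0):
    """Stand-in for the on-die counter: expected number of 1 decisions when the
    comparator resolves (vtest + offset + Gaussian noise) > 0."""
    z = (vtest_mv + offset_mv) / (sigma_mv * math.sqrt(2.0))
    return round(n_trials * 0.5 * (1.0 + math.erf(z)))


def noise_from_sweep(offsets_mv, ones, n_trials):
    """Difference the Prob{'1'} CDF into a PDF; return its mean and rms width (mV)."""
    cdf = [k / n_trials for k in ones]
    mids = [(a + b) / 2 for a, b in zip(offsets_mv, offsets_mv[1:])]
    mass = [c1 - c0 for c0, c1 in zip(cdf, cdf[1:])]
    total = sum(mass)
    mean = sum(m * x for m, x in zip(mass, mids)) / total
    var = sum(m * (x - mean) ** 2 for m, x in zip(mass, mids)) / total
    return mean, math.sqrt(var)


step_mv = 0.25
offsets = [i * step_mv for i in range(-40, 41)]  # -10 mV to +10 mV
n = 100_000
ones = [ones_count(v, sigma_mv=1.3, n_trials=n) for v in offsets]
mean, rms = noise_from_sweep(offsets, ones, n)
print(f"mean {mean:+.3f} mV, rms noise {rms:.3f} mV")
```

The mean locates the combined test voltage and comparator offset, while the rms width recovers the assumed 1.3 mV noise to within the offset step; on silicon the counts come from the counter instead of `ones_count`.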
Rx input noise (no PSN, no offset)
[Figure: noise PDF (10^0 to 10^-10) vs. offset voltage (−15 to 25 mV); noise 1.3mV, deterministic noise 0mVp-p.]
Rx input noise (200MHz PSN, no offset)
[Figure: noise PDF (10^0 to 10^-10) vs. offset voltage (−15 to 25 mV); noise 1.1mV, deterministic noise ~1mVp-p.]
Rx input noise (200MHz PSN, 85mV offset)
[Figure: noise PDF (10^0 to 10^-10) vs. offset voltage (65 to 110 mV); noise 1mV, deterministic noise 16mVp-p.]
Rx PSRR (200MHz)
[Figure: Rx PSRR at 200MHz, with the noise floor marked.]
On-die waveform capture
• Sample periodic signal:
– Voltage: equivalent-time A/D using comparator offset
– Time: equivalent-time A/D using interpolator offset
Wave capture, Rx eq
[Figure: on-die captured waveform, voltage (V, −0.25 to 0.25) vs. time (nsec, 0.2 to 1.8); link diagram: Tx, Rx Eq, Rx.]
Wave capture, Tx eq
[Figure: on-die captured waveform, voltage (V, −0.25 to 0.25) vs. time (nsec, 0.2 to 1.8); link diagram: Tx Eq, Tx, Rx.]
Wave capture, Tx+Rx eq
[Figure: on-die captured waveform, voltage (V, −0.25 to 0.25) vs. time (nsec, 0.2 to 1.8); link diagram: Tx Eq, Tx, Rx Eq, Rx.]
BER eye diagram
• Characterize BER at various sampling points:
– Voltage: vary comparator offset
– Time: vary interpolator offset
[Figure: BER eye built up point by point, each sampling point marked Pass or Fail; the final build plots the number of errors.]
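The BER eye scan reuses the comparator and interpolator hardware plus an error counter, stepped over a two-dimensional grid. A minimal sketch, assuming a known transmitted pattern, 0/1 signaling, four interpolator steps per UI, and a hypothetical noise-free pulse response, so errors here come only from ISI and poorly placed sampling points; all names are hypothetical.

```python
import random

SPU = 4  # interpolator phases per UI (assumption)
PULSE = [0.0, 0.05, 0.3, 0.6, 0.8, 0.6, 0.3, 0.1, 0.0, -0.05, 0.0, 0.02]  # hypothetical
CURSOR = 4  # sample index of the pulse peak


def received(bits):
    """Noise-free received waveform: superposition of shifted pulses, SPU samples per UI."""
    out = [0.0] * (len(bits) * SPU + len(PULSE))
    for k, bit in enumerate(bits):
        if bit:
            for n, p in enumerate(PULSE):
                out[k * SPU + n] += p
    return out


def ber_eye(bits, phases, offsets):
    """Error count at every (phase, offset) point, checked against the known pattern."""
    rx = received(bits)
    eye = {}
    for ph in phases:
        for off in offsets:
            errors = 0
            for k in range(2, len(bits) - 2):  # skip edge bits with incomplete ISI
                decision = 1 if rx[k * SPU + CURSOR + ph] > off else 0
                errors += decision != bits[k]
            eye[(ph, off)] = errors
    return eye


rng = random.Random(1)
pattern = [rng.randint(0, 1) for _ in range(200)]
eye = ber_eye(pattern, phases=range(-2, 2), offsets=[-0.1, 0.2, 0.4, 0.6, 0.9])
for (ph, off), errs in sorted(eye.items()):
    print(f"phase {ph:+d}, offset {off:+.2f} V: {'Pass' if errs == 0 else 'Fail'} ({errs} errors)")
```

Dividing each count by the number of checked bits gives a BER estimate per point; plotting the log of that over phase and offset produces eye plots like the ones that follow.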
Rx Equalization (CTLE)
[Figure: measured BER eye at 17.5Gb/s over the 7” desktop channel, voltage (V, −0.06 to 0.06) vs. time (UI, −0.5 to 0.5), log-scale error legend from −1 to −10; link diagram: Tx, Rx Eq, Rx.]
Tx Equalization (Pre-emphasis)
[Figure: measured BER eye at 17.5Gb/s over the 7” desktop channel, voltage (V, −0.06 to 0.06) vs. time (UI, −0.5 to 0.5), log-scale error legend from −1 to −10; link diagram: Tx Eq, Tx, Rx.]
Tx + Rx Equalization
[Figure: measured BER eye at 17.5Gb/s over the 7” desktop channel, voltage (V, −0.06 to 0.06) vs. time (UI, −0.5 to 0.5), log-scale error legend from −1 to −10; link diagram: Tx Eq, Tx, Rx Eq, Rx.]
Tx + Rx Equalization, no Rx offset trim
[Figure: measured BER eye at 17.5Gb/s over the 7” desktop channel, voltage (V, −0.06 to 0.06) vs. time (UI, −0.5 to 0.5), log-scale error legend from −1 to −10; link diagram: Tx Eq, Tx, Rx Eq, Rx.]
Tx + Rx Eq, 10% Tx PSN @ 200MHz
[Figure: measured BER eye at 17.5Gb/s over the 7” desktop channel, voltage (V, −0.06 to 0.06) vs. time (UI, −0.5 to 0.5), log-scale error legend from −1 to −10; link diagram: Tx Eq, Tx, Rx Eq, Rx.]
Tx + Rx Eq, 10% Rx PSN @ 200MHz
[Figure: measured BER eye at 17.5Gb/s over the 7” desktop channel, voltage (V, −0.06 to 0.06) vs. time (UI, −0.5 to 0.5), log-scale error legend from −1 to −10; link diagram: Tx Eq, Tx, Rx Eq, Rx.]
Measurement summary
• On-die link measurements close the design loop and enable link self-test and adaptation
– Example: BER eye
• On-die measurements can add significantly less noise than off-die measurements
– Example: clock-data jitter measurement
• However, the on-die circuits still require calibration for absolute accuracy
– Examples: voltage offsets, phase interpolators
• In some cases, such as where averaging is possible, off-die measurements are still very useful
Overall summary
• Tera-scale many-core μPs will drive aggregate I/O rates aggressively
– Power budget will constrain link design space
• Power efficiency depends strongly on process technology and channel
• High-performance and low-power link design requires accurate system-level tools
– Tools are in place, with areas for improvement
• On-die link measurement capabilities close the design loop and enable link self-test and adaptation
Acknowledgements: Ganesh Balamurugan, James Jaussi, Joe Kennedy, Mozhgan Mansuri, Randy Mooney, Shekhar Borkar