Explicit Computation of Performance as a
Function of Process Parameters
Lou Scheffer
Tau 2002
What’s the problem?
• Chip manufacturing not perfect, so….
• Each chip is different
• Designers want as many chips as possible to work
• We consider 3 kinds of variation
- Inter-chip
- Intra-chip
‣ Deterministic
‣ Statistical
Intra-chip Deterministic Variation (not
considered further in this presentation)
• Optical Proximity Effects
• Metal Density Effects
• Center vs corner focus
[Figure: "You draw this" vs. "You get this" — drawn layout versus manufactured shape]
Inter-chip variation
• Many of the sources of variation affect all objects
on the same layer of the same chip.
• Examples:
- Metal or dielectric layers might be thicker/thinner
- Each exposure could be over/under exposed
- Each layer could be over/under etched
Interconnect variation
• Looking at chip cross section
• Pitch is well controlled, so spacing is not
independent
[Figure: chip cross section annotated with process parameters P0–P5. Pitch is well controlled, so width and spacing are not independent; the other dimensions can vary independently.]
Intra-chip statistical variation
• Even within a single chip, not all parameters track:
- Gradients
- Non-flat wafers
- Statistical variation
‣ Particularly apparent for transistors and therefore gates
‣ Small devices increase the role of variation in the number of dopant atoms and in ΔL
• Analog designers have coped with this for years
• Mismatch is statistical and a function of distance
between two figures.
Previous Approaches
• Worst case corners – all parameters set to 3σ
- Does not handle intra-chip variation at all
• 6 corner analysis
- Classify each signal/gate as clock or data
- Cases: both clock and data maximally slow, clock
maximally slow and data almost as slow, etc.
• Problems with these approaches
- Too pessimistic: very unlikely to get 3σ on all parameters
- Not pessimistic enough: doesn’t handle fast M1, slow M2
Parasitics, net delays, path delays are f(P)
• C_NET = f(P0, P1, P2, …)
• DELAY_NET = f(P0, P1, P2, …)
[Figure: the same cross section as before, with parameters P0–P5]
Keeping derivatives
• We represent a value as a Taylor series:
$D = D_0 + \sum_{i=1}^{N} d_i \, \Delta P_i$
• Where the d_i describe how the value varies with a change in process parameter ΔP_i
• Where ΔP_i itself has 2 parts: ΔP_i = ΔG_i + Δs_{i,d}
- ΔG_i is the global (chip-wide) variation
- Δs_{i,d} is the statistical variation of this value
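A minimal Python sketch of such a sensitivity-carrying value, with a nominal part plus per-parameter derivatives; the class name, parameter names, and the add/scale/evaluate operations are illustrative assumptions, not from the talk:

```python
# Illustrative only: a value D = D0 + sum_i d_i * dP_i, kept as a nominal
# number plus a map of sensitivities keyed by process-parameter name.
class SensValue:
    def __init__(self, nominal, sens=None):
        self.nominal = nominal            # D0
        self.sens = dict(sens or {})      # d_i per process parameter

    def __add__(self, other):
        s = dict(self.sens)
        for p, d in other.sens.items():   # sensitivities add linearly
            s[p] = s.get(p, 0.0) + d
        return SensValue(self.nominal + other.nominal, s)

    def scale(self, k):
        return SensValue(k * self.nominal,
                         {p: k * d for p, d in self.sens.items()})

    def evaluate(self, dP):
        """Plug in actual deviations dP_i (global + statistical parts)."""
        return self.nominal + sum(d * dP.get(p, 0.0) for p, d in self.sens.items())

# Hypothetical example: a capacitance sensitive to two parameters.
c = SensValue(1.0, {"m2_width": 0.3, "ild_thickness": -0.2})
print(c.evaluate({"m2_width": 0.1, "ild_thickness": -0.05}))   # 1.04
```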
Design constraints map to fuzzy hyperplanes
• The difference between data and clock must be less than the cycle time:
$D(P) - C(P) \le T_{MAX}$
• Which defines a fuzzy hyperplane in process space
$A_{NOM} - C_{NOM} + \sum_i (a_i - c_i)\,\Delta G_i + \sum_i (a_i s_{i,a} - c_i s_{i,c}) \le T_{MAX}$
• The first sum is the global part (it defines the hyperplane); the second is the statistical part (it sums to a distribution) — see the sketch below
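A rough numeric sketch of evaluating one such constraint, under the simplifying assumption (mine, not the talk's) that the global deviations ΔG_i and the statistical sum are independent zero-mean Gaussians, so the left-hand side is Gaussian and the probability of meeting T_MAX follows from the normal CDF; the function name and all values are hypothetical:

```python
import math

def constraint_yield(a_nom, c_nom, t_max, global_terms, stat_sigma):
    """global_terms: list of (a_i - c_i, sigma of dG_i) pairs.
    stat_sigma: combined sigma of the statistical sum.  All values invented."""
    mean = a_nom - c_nom
    var = sum((k * s) ** 2 for k, s in global_terms) + stat_sigma ** 2
    z = (t_max - mean) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # P(data - clock <= T_MAX)

# Hypothetical numbers: nominal data-clock difference of 3.8 against a 4.0 cycle.
print(constraint_yield(a_nom=4.1, c_nom=0.3, t_max=4.0,
                       global_terms=[(0.05, 1.0), (-0.02, 1.0)],
                       stat_sigma=0.03))
```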
Comparison to purely statistical timing
• Two approaches are complementary
- Explicit computation: propagate functions
- Statistical timing: propagate distributions
Similarities in timing analysis
• Extraction and delay reduction are
straightforward, timing is not
• Latest arriving signal is now poorly defined
• If there is a significant probability that more than one signal is last, both must be kept (or some approximate bound applied).
• Pruning threshold will determine accuracy/size
tradeoff.
• Must compute an estimate of parametric yield at
the end.
• Provide a probability of failure per path for optimization.
Differences
• Propagate functions instead of distributions
• Distributions of process parameters are used at
different times
- Statistical timing needs process parameters to do
timing analysis
- Explicit computation does timing analysis first, then
plugs in process distributions to get timing
distributions.
‣ Can evaluate different distributions without re-doing timing
analysis
Pruning
• In statistical timing
- Prune if one signal is 'almost always' earlier
- Need to consider correlation because of shared input cones
- Result is a distribution of delays
• In this explicit computation of timing
- Prune if one is earlier under 'almost all' process conditions
- Result is a function of process parameters
- Bad news – an exact answer could require (exponentially) complex functions
- Good news – no problem with correlation
The bad news – complicated functions
• Shows a possible pruning problem for a 2-input gate
- The two input arrival functions are 0.7+0.5*P1 and 0.8-0.2*P1+1.0*P2
[Figure: 3-D surface plot; the bottom axes are the two process parameters, the vertical axis is MAX(A, B)]
• Can keep it as an explicit function and prune when it gets too expensive
• Can cover with one (conservative) plane (see the sketch below)
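A small sketch of the "one conservative plane" cover, assuming each arrival is affine in the process parameters and each parameter stays within a ±3σ box (normalized units); starting from A's slopes and lifting the constant term is my own illustration of one way to build such a plane, not necessarily the talk's method:

```python
def conservative_plane(a, b, bound=3.0):
    """Return plane c with c(p) >= max(a(p), b(p)) for all p in [-bound, bound]^N.
    a, b, c are coefficient lists: [constant, k1, k2, ...]."""
    c = list(a)                            # start from A's slopes
    # Max of the affine gap B - C over a box occurs at a corner:
    slack = (b[0] - c[0]) + sum(abs(bi - ci) * bound
                                for bi, ci in zip(b[1:], c[1:]))
    if slack > 0:
        c[0] += slack                      # lift until the plane also covers B
    return c

# The two arrival functions from the slide: 0.7+0.5*P1 and 0.8-0.2*P1+1.0*P2
A = [0.7, 0.5, 0.0]
B = [0.8, -0.2, 1.0]
print(conservative_plane(A, B))            # a single (loose but safe) upper bound
```

The result is deliberately pessimistic; a tighter plane could be found by trying B's slopes or a blend of the two, at the cost of more work per prune.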
The good news - reconvergent fanout
• The classic re-convergent fanout problem
• To avoid this, statistical timing needs to keep careful track of common paths – can take exponential time
Reconvergent fanout (continued)
• Explicit calculation gives the correct result
without common path calculations
[Figure: reconvergent fanout example; path delays D0+ΔP1, D1+ΔP1, D2+ΔP1 are carried as explicit functions, and the distribution for P1 is plugged in only at the end]
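An illustrative Monte Carlo check of this point, assuming a common stage of delay D0+ΔP1 that fans out into stages D1+ΔP1 and D2+ΔP1 which then reconverge (topology and numbers are my own invention): keeping the shared ΔP1 symbolic and plugging in its distribution at the end gives the exact answer, while treating the reconverging paths as independent distributions does not.

```python
import random

D0, D1, D2, SIGMA, N = 1.0, 0.6, 0.5, 0.1, 100_000

# Explicit: max(D0+D1+2*dp, D0+D2+2*dp) simplifies to D0 + max(D1, D2) + 2*dp,
# so P1's distribution is applied once, at the end.
explicit = [D0 + max(D1, D2) + 2 * random.gauss(0.0, SIGMA) for _ in range(N)]

# Naive: pretend the two reconverging path delays vary independently.
naive = [max(D0 + D1 + 2 * random.gauss(0.0, SIGMA),
             D0 + D2 + 2 * random.gauss(0.0, SIGMA)) for _ in range(N)]

mean = lambda xs: sum(xs) / len(xs)
print("explicit mean = %.3f, naive mean = %.3f" % (mean(explicit), mean(naive)))
```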
Real situation is a combination of both
• Gate delays are somewhat correlated but have a
big statistical component
• Wire delays (particularly close wires) are very
highly correlated but have a small random
component.
• Delays consist of two parts that combine differently
- The distribution of the statistical part is also a function of process variation
So what’s the point of explicit computation?
• Not-so-worst-case timing predictions
- Users have complained for years about timing pessimism
- Could be about 10% better (see experimental results)
- Could save months by eliminating unneeded tuning
• Will catch errors that are currently missed
- Fast/slow combinations are not currently verified
• Can predict parametric yield
- What’s the timing yield?
- How much will it help to get rid of a non-critical path?
Predicted variations are always smaller
• Let C = C_0 + k_0·Δp_0 + k_1·Δp_1, where Δp_0 has deviation σ_0 and Δp_1 has deviation σ_1
• Then the worst-case corner is: $C_0 + 3k_0\sigma_0 + 3k_1\sigma_1$
• But if Δp_0 and Δp_1 are independent, we have $\sigma = \sqrt{(k_0\sigma_0)^2 + (k_1\sigma_1)^2}$
• So the real 3-sigma worst case is $C_0 + 3\sqrt{(k_0\sigma_0)^2 + (k_1\sigma_1)^2} = C_0 + \sqrt{(3k_0\sigma_0)^2 + (3k_1\sigma_1)^2}$
• Which is always smaller by the triangle inequality (see the numeric check below)
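A quick numeric check of the inequality, with made-up sensitivities and sigmas:

```python
import math

# Hypothetical values: k0*sigma0 = 0.4, k1*sigma1 = 0.3 (units arbitrary)
k_sigma = [(0.4, 1.0), (0.3, 1.0)]
corner = sum(3 * k * s for k, s in k_sigma)                   # 3*k0*s0 + 3*k1*s1
rss = 3 * math.sqrt(sum((k * s) ** 2 for k, s in k_sigma))    # 3*sqrt(sum of squares)
print(corner, rss)   # 2.1 vs 1.5 — the RSS estimate is always the smaller one
```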
Won’t this be big and slow?
• Naively, adds an N element float vector to all
values
• But, an x% change in a process parameter
generally results in <x% change in value
- Can use a byte value with 1% accuracy
• A given R or C usually depends on only a subset of the parameters
- Just the properties of the layer(s) it involves
• Net result – about 6 extra bytes per value
• Some compute overhead, but avoids multiple runs
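A back-of-the-envelope sketch (my own illustration, not from the talk) of storing one sensitivity as a signed byte in 1% steps of the nominal value, as the bullets above suggest:

```python
def pack(sens_fraction):
    """Fractional change in the value per unit parameter change -> one byte."""
    return max(-127, min(127, round(sens_fraction * 100)))    # 1% resolution

def unpack(byte_val):
    return byte_val / 100.0

k = 0.073                 # value moves 7.3% per unit change in the parameter
b = pack(k)
print(b, unpack(b))       # 7, 0.07 — rounding error is at most 0.5% of nominal
```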
Experimental results for explicit part only
• Start with a 0.18 micron, 5LM, 144K net design
• First – is the linear approximation OK?
- Generated 35 cases with –20%,0,+20% variation of
three most relevant parameters for metal-2 layer
- For each lumped C value did coeffgen, then
HyperExtract, then a least-squares fit
- Less than 1% error for C = C_0 + k_0Δp_0 + k_1Δp_1 + k_2Δp_2
• Since delay is dominated by C, this means delay
will also be a (near) linear function of process
variation.
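A sketch of that fitting step with NumPy and made-up extraction data (the real flow used coeffgen and HyperExtract on the 144K-net design; the runs and capacitance values below are purely illustrative):

```python
import numpy as np

# Fractional variation of three parameters for a handful of hypothetical runs
dp = np.array([[-0.2,  0.0,  0.2],
               [ 0.0,  0.2, -0.2],
               [ 0.2, -0.2,  0.0],
               [ 0.0,  0.0,  0.0],
               [ 0.2,  0.2,  0.2]])
c_meas = np.array([1.02, 0.99, 1.01, 1.00, 1.05])   # extracted C per run (made up)

A = np.hstack([np.ones((len(dp), 1)), dp])          # columns: 1, dp0, dp1, dp2
coef, *_ = np.linalg.lstsq(A, c_meas, rcond=None)   # [C0, k0, k1, k2]
err = np.abs(c_meas - A @ coef).max() / c_meas.mean()
print("C0, k0, k1, k2 =", coef, "  worst relative error =", err)
```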
More Experimental Results
• Next, how much does it help?
- Varied each parameter (of 17) individually
- Compared to a worst case corner (3 sigma
everywhere)
- Average 7% improvement in prediction of C
• Will expect a bigger improvement for timing
- Since it depends on more parameters, triangle
inequality is (usually) stronger
Conclusions
• Outlined a possible approach for handling
process variation
- Handles explicit and statistical variation
- Theory straightforward in general
‣ Pruning is the hardest part, but there are many alternatives
- Experiments back up assumptions needed
- Memory and compute time should be acceptable