Download ECE 260B - CSE241A VLSI Digital Circuits

Document related concepts

Hypertext Transfer Protocol wikipedia , lookup

Transcript
ECE260B – CSE241A
Winter 2005
Logic Synthesis
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
ECE 260B – CSE 241A Static Timing Analysis 1
Slides courtesy of Dr. Cho Moon
http://vlsicad.ucsd.edu
Introduction
 Why logic synthesis?



Ubiquitous – used almost everywhere VLSI is done
Body of useful and general techniques – same solutions can be
used for different problems
Foundation for many applications such as
-
Formal verification
ATPG
Timing analysis
Sequential optimization
ECE 260B – CSE 241A Static Timing Analysis 2
http://vlsicad.ucsd.edu
RTL Design Flow
HDL
RTL
Synthesis
Manual
Design
Module
Generators
netlist
Library
0
b
1
s
Logic
Synthesis
netlist
a
a
0
b
1
s
d
d
q
clk
q
clk
Physical
Synthesis
layout
ECE 260B – CSE 241A Static Timing Analysis 3
Slide courtesy of Devadas, et. al
http://vlsicad.ucsd.edu
Logic Synthesis Problem
 Given


Initial gate-level netlist
Design constraints
- Input arrival times, output required times, power consumption, noise
immunity, etc…

Target technology libraries
 Produce

Smaller, faster or cooler gate-level netlist that meets constraints
Very hard optimization problem!
ECE 260B – CSE 241A Static Timing Analysis 4
http://vlsicad.ucsd.edu
Combinational Logic Synthesis
2-level
Logic opt
netlist
tech
independent
Library
Logic
Synthesis
multilevel
Logic opt
tech
dependent
Library
netlist
ECE 260B – CSE 241A Static Timing Analysis 5
Slide courtesy of Devadas, et. al
http://vlsicad.ucsd.edu
Outline
 Introduction
 Two-level Logic Synthesis
 Multi-level Logic Synthesis
 Timing Optimization in Synthesis
ECE 260B – CSE 241A Static Timing Analysis 6
http://vlsicad.ucsd.edu
Two-level Logic Synthesis Problem
 Given an arbitrary logic function in two-level form,
produce a smaller representation.
 For sum-of-products (SOP) implementation on PLAs,
fewer product terms and fewer inputs to each product
term mean smaller area.
I1
F=AB+ABC
I2
F=AB
O1
O2
ECE 260B – CSE 241A Static Timing Analysis 7
http://vlsicad.ucsd.edu
Boolean Functions
f(x) : Bn
B
B = {0, 1}, x = (x1, x2, …, xn)





x1, x2, … are variables
x1, x1, x2, x2, … are literals
each vertex of Bn is mapped to 0 or 1
the onset of f is a set of input values for which f(x) = 1
the offset of f is a set of input values for which f(x) = 0
ECE 260B – CSE 241A Static Timing Analysis 8
http://vlsicad.ucsd.edu
Logic Functions:
ECE 260B – CSE 241A Static Timing Analysis 9
Slide courtesy of Devadas, et. al
http://vlsicad.ucsd.edu
Cube Representation
ECE 260B – CSE 241A Static Timing Analysis 10
Slide courtesy of Devadas, et. al
http://vlsicad.ucsd.edu
Sum-of-products (SOP)

A function can be represented by a sum of cubes
(products):
f = ab + ac + bc
Since each cube is a product of literals, this is a “sum of
products” representation

A SOP can be thought of as a set of cubes F
F = {ab, ac, bc} = C

A set of cubes that represents f is called a cover of f.
F={ab, ac, bc} is a cover of
f = ab + ac + bc.
ECE 260B – CSE 241A Static Timing Analysis 11
http://vlsicad.ucsd.edu
Prime Cover
 A cube is prime if there is no other cube that contains it
(for example, b c is not a prime but b is)
 A cover is prime iff all of its cubes are prime
c
b
a
ECE 260B – CSE 241A Static Timing Analysis 12
http://vlsicad.ucsd.edu
Irredundant Cube
 A cube of a cover C is irredundant if C fails to be a cover
if c is dropped from C
 A cover is irredundant iff all its cubes are irredudant (for
exmaple, F = a b + a c + b c)
c
b
a
ECE 260B – CSE 241A Static Timing Analysis 13
Not covered
http://vlsicad.ucsd.edu
Quine-McCluskey Method
 We want to find a minimum prime and irredundant cover
for a given function.


Prime cover leads to min number of inputs to each product term.
Min irredundant cover leads to min number of product terms.
 Quine-McCluskey (QM) method (1960’s) finds a
minimum prime and irredundant cover.




Step 1: List all minterms of on-set: O(2^n) n = #inputs
Step 2: Find all primes: O(3^n) n = #inputs
Step 3: Construct minterms vs primes table
Step 4: Find a min set of primes that covers all the minterms:
O(2^m) m = #primes
ECE 260B – CSE 241A Static Timing Analysis 14
http://vlsicad.ucsd.edu
QM Example (Step 1)
 F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
 List all on-set minterms
Minterms
a’ b’ c’
a b’ c’
a b’ c
abc
a’ b c
ECE 260B – CSE 241A Static Timing Analysis 15
http://vlsicad.ucsd.edu
QM Example (Step 2)
 F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
 Find all primes.
primes b’ c’ a b’
ac bc
ECE 260B – CSE 241A Static Timing Analysis 16
http://vlsicad.ucsd.edu
QM Example (Step 3)
 F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
 Construct minterms vs primes table (prime implicant
table) by determining which cube is contained in which
prime. X at row i, colum j means that cube in row i is
contained by prime in column j.
b’ c’
a’ b’ c’
X
a b’ c’
X
a b’ c
abc
a’ b c
ECE 260B – CSE 241A Static Timing Analysis 17
a b’
ac
bc
X
X
X
X
X
X
http://vlsicad.ucsd.edu
QM Example (Step 4)
 F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c

Find a minimum set of primes that covers all the minterms
“Minimum column covering problem”
b’ c’
a’ b’ c’
X
a b’ c’
X
a b’ c
a b’
ac
bc
X
X
abc
X
X
a’ b c
X
X
Essential primes
ECE 260B – CSE 241A Static Timing Analysis 18
http://vlsicad.ucsd.edu
ESPRESSO – Heuristic Minimizer
 Quine-McCluskey gives a minimum solution but is only
good for functions with small number of inputs (< 10)
 ESPRESSO is a heuristic two-level minimizer that finds a
“minimal” solution
ESPRESSO(F) {
do {
reduce(F);
expand(F);
irredundant(F);
} while (fewer terms in F);
verfiy(F);
}
ECE 260B – CSE 241A Static Timing Analysis 19
http://vlsicad.ucsd.edu
ESPRESSO ILLUSTRATED
Reduce
Expand
Irredundant
ECE 260B – CSE 241A Static Timing Analysis 20
http://vlsicad.ucsd.edu
Outline
 Introduction
 Two-level Logic Synthesis
 Multi-level Logic Synthesis
 Timing optimization in Synthesis
ECE 260B – CSE 241A Static Timing Analysis 21
http://vlsicad.ucsd.edu
Multi-level Logic Synthesis
 Two-level logic synthesis is effective and mature
 Two-level logic synthesis is directly applicable to PLAs
and PLDs
But…
 There are many functions that are too expensive to
implement in two-level forms (too many product terms!)
 Two-level implementation constrains layout (AND-plane,
OR-plane)
 Rule of thumb:


Two-level logic is good for control logic
Multi-level logic is good for datapath or random logic
ECE 260B – CSE 241A Static Timing Analysis 22
http://vlsicad.ucsd.edu
Two-Level (PLA) vs. Multi-Level
PLA
Multi-level
control logic
all logic
constrained layout
general
highly automatic
automatic
technology independent
partially technology independent
multi-valued logic
coming
slower?
input, output, state encoding
ECE 260B – CSE 241A Static Timing Analysis 23
can be high speed
some results
http://vlsicad.ucsd.edu
Multi-level Logic Synthesis Problem
 Given


Initial Boolean network
Design constraints
- Arrival times, required times, power consumption, noise immunity,
etc…

Target technology libraries
 Produce

a minimum area netlist consisting of the gates from the target
libraries such that design constraints are satisfied
ECE 260B – CSE 241A Static Timing Analysis 24
http://vlsicad.ucsd.edu
Modern Approach to Logic Optimization
 Divide logic optimization into two subproblems:

Technology-independent optimization
- determine overall logic structure
- estimate costs (mostly) independent of
technology
- simplified cost modeling

Technology-dependent optimization
(technology mapping)
- binding onto the gates in the library
- detailed technology-specific cost model
 Orchestration of various optimization/transformation
techniques for each subproblem
ECE 260B – CSE 241A Static Timing Analysis 25
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimization Cost Criteria
The accepted optimization criteria for multi-level logic
are to minimize some function of:
1.
2.
3.
4.
5.
6.
Area occupied by the logic gates and interconnect
(approximated by literals = transistors in technology
independent optimization)
Critical path delay of the longest path through the logic
Degree of testability of the circuit, measured in terms of the
percentage of faults covered by a specified set of test vectors
for an approximate fault model (e.g. single or multiple stuckat faults)
Power consumed by the logic gates
Noise Immunity
Wireability
while simultaneously satisfying upper or lower bound
constraints placed on these physical quantities
ECE 260B – CSE 241A Static Timing Analysis 26
http://vlsicad.ucsd.edu
Representation: Boolean Network
Boolean network:
•
directed acyclic graph (DAG)
•
node logic function representation
fj(x,y)
node variable yj: yj= fj(x,y)
edge (i,j) if fj depends explicitly on
yi
•
•
Inputs x = (x1, x2,…,xn )
Outputs z = (z1, z2,…,zp )
ECE 260B – CSE 241A Static Timing Analysis 27
Slide courtesy of Brayton
http://vlsicad.ucsd.edu
Network Representation
Boolean network:
ECE 260B – CSE 241A Static Timing Analysis 28
http://vlsicad.ucsd.edu
Node Representation: Sum of Products (SOP)
Example:
abc’+a’bd+b’d’+b’e’f (sum of cubes)
Advantages:
• easy to manipulate and minimize
• many algorithms available (e.g. AND, OR, TAUTOLOGY)
•
two-level theory applies
Disadvantages:
• Not representative of logic complexity. For example
f=ad+ae+bd+be+cd+ce
f’=a’b’c’+d’e’
•
These differ in their implementation by an inverter.
hence not easy to estimate logic; difficult to estimate progress during
logic manipulation
ECE 260B – CSE 241A Static Timing Analysis 29
http://vlsicad.ucsd.edu
Factored Forms
Example: (ad+b’c)(c+d’(e+ac’))+(d+e)fg
Advantages
• good representative of logic complexity
f=ad+ae+bd+be+cd+ce f’=a’b’c’+d’e’ 
f=(a+b+c)(d+e)
• in many designs (e.g. complex gate CMOS) the implementation
of a function corresponds directly to its factored form
• good estimator of logic implementation complexity
• doesn’t blow up easily
Disadvantages
• not as many algorithms available for manipulation
• hence usually just convert into SOP before manipulation
ECE 260B – CSE 241A Static Timing Analysis 30
http://vlsicad.ucsd.edu
Factored Forms
Note:
literal count

transistor count  area
(however, area also depends on wiring)
ECE 260B – CSE 241A Static Timing Analysis 31
http://vlsicad.ucsd.edu
Factored Forms
Definition : a factored form can be defined recursively by the
following rules. A factored form is either a product or sum where:
• a product is either a single literal or product of factored forms
• a sum is either a single literal or sum of factored forms
A factored form is a parenthesized algebraic expression.
In effect a factored form is a product of sums of products … or a
sum of products of sums …
Any logic function can be represented by a factored form, and any
factored form is a representation of some logic function.
ECE 260B – CSE 241A Static Timing Analysis 32
http://vlsicad.ucsd.edu
Factored Forms
When measured in terms of number of inputs, there are functions whose size is
exponential in sum of products representation, but polynomial in factored form.
Example: Achilles’ heel function
i n / 2
 (x
2 i 1
 x2 i )
i 1
There are n literals in the factored form and (n/2)2n/2 literals in the SOP form.
Factored forms are useful in estimating area and
delay in a multi-level synthesis and optimization
system.
In many design styles (e.g. complex gate CMOS
design) the implementation of a function
corresponds directly to its factored form.
ECE 260B – CSE 241A Static Timing Analysis 33
http://vlsicad.ucsd.edu
Factored Forms
Factored forms cam be graphically represented as labeled trees, called
factoring trees, in which each internal node including the root is labeled
with either + or , and each leaf has a label of either a variable or its
complement.
Example: factoring tree of ((a’+b)cd+e)(a+b’)+e’
ECE 260B – CSE 241A Static Timing Analysis 34
http://vlsicad.ucsd.edu
Reduced Ordered BDDs
•
•
•
•
•
•
like factored form, represents both function and
complement
like network of muxes, but restricted since
controlled by primary input variables
- not really a good estimator for
implementation complexity
given an ordering, reduced BDD is canonical,
hence a good replacement for truth tables
for a good ordering, BDDs remain reasonably
small for complicated functions (e.g. not
multipliers)
manipulations are well defined and efficient
true support (dependency) is displayed
ECE 260B – CSE 241A Static Timing Analysis 35
http://vlsicad.ucsd.edu
Technology-Independent Optimization
 Technology-independent optimization is a bag of tricks:


Two-level minimization (also called simplify)
Constant propagation (also called sweep)
f = a b + c; b = 1 => f = a + c

Decomposition (single function)
f = abc+abd+a’c’d’+b’c’d’ => f = xy + x’y’; x = ab ; y = c+d

Extraction (multiple functions)
f = (az+bz’)cd+e
g = (az+bz’)e’
h = cde

f = xy+e g = xe’
ECE 260B – CSE 241A Static Timing Analysis 36
h = ye x = az+bz’
y = cd
http://vlsicad.ucsd.edu
More Technology-Independent Optimization
 More technology-independent optimization tricks:

Substitution
g = a+b f = a+bc

f = g(a+c)

Collapsing (also called elimination)
f = ga+g’b

f = ac+ad+bc’d’

g = c+d
g = c+d
Factoring (series-parallel decomposition)
f = ac+ad+bc+bd+e => f = (a+b)(c+d)+e
ECE 260B – CSE 241A Static Timing Analysis 37
http://vlsicad.ucsd.edu
Summary of Typical Recipe for TI Optimization
 Propagate constants
 Simplify: two-level minimization at Boolean network node
 Decomposition
 Local “Boolean” optimizations

Boolean techniques exploit Boolean identities (e.g., a a’ = 0)
Consider f = a b’ + a c’ + b a’ + b c’ + c a’ + c b’

Algebraic factorization procedures
f = a (b’ + c’) + a’ (b + c) + b c’ + c b’

Boolean factorization procedures
f = (a + b + c) (a’ + b’ + c’)
ECE 260B – CSE 241A Static Timing Analysis 38
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Technology-Dependent Optimization
Technology-dependent optimization consists of
 Technology mapping: maps Boolean network to a set of
gates from technology libraries
 Local transformations




Discrete resizing
Cloning
Fanout optimization (buffering)
Logic restructuring
ECE 260B – CSE 241A Static Timing Analysis 39
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Technology Mapping
Input


Technology independent, optimized logic network
Description of the gates in the library with their cost
Output

Netlist of gates (from library) which minimizes total cost
General Approach




Construct a subject DAG for the network
Represent each gate in the target library by pattern DAG’s
Find an optimal-cost covering of subject DAG using the
collection of pattern DAG’s
Canonical form: 2-input NAND gates and inverters
ECE 260B – CSE 241A Static Timing Analysis 40
http://vlsicad.ucsd.edu
DAG Covering
 DAG covering is an NP-hard problem
 Solve the sub-problem optimally



Partition DAG into a forest of trees
Solve each tree optimally using tree covering
Stitch trees back together
ECE 260B – CSE 241A Static Timing Analysis 41
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Tree Covering Algorithm
 Transform netlist and libraries into canonical forms

2-input NANDs and inverters
 Visit each node in BFS from inputs to outputs

Find all candidate matches at each node N
- Match is found by comparing topology only (no need to compare
functions)

Find the optimal match at N by computing the new cost
- New cost = cost of match at node N + sum of costs for matches at
children of N

Store the optimal match at node N with cost
 Optimal solution is guaranteed if cost is area
 Complexity = O(n) where n is the number of nodes in
netlist
ECE 260B – CSE 241A Static Timing Analysis 42
http://vlsicad.ucsd.edu
Tree Covering Example
Find an ``optimal’’ (in area, delay, power) mapping of this circuit
into the technology library (simple example below)
ECE 260B – CSE 241A Static Timing Analysis 43
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Elements of a library - 1
Element/Area Cost
INVERTER
2
NAND2
3
NAND3
4
NAND4
5
ECE 260B – CSE 241A Static Timing Analysis 44
Tree Representation (normal form)
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Trivial Covering
subject DAG
7
5
NAND2 (3) = 21
INV
(2) = 10
Area cost 31
Can we do better with tree covering?
ECE 260B – CSE 241A Static Timing Analysis 45
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering - 1
3
2
2
``subject tree’’
3
ECE 260B – CSE 241A Static Timing Analysis 46
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering - 2
3
8
2
2
5
3
``subject tree’’
ECE 260B – CSE 241A Static Timing Analysis 47
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering - 3
3
8
13
2
2
5
3
``subject tree’’
Cover with ND2 or ND3 ?
1 NAND2
+ subtree
3
5
1 NAND3
=4
Area cost 8
ECE 260B – CSE 241A Static Timing Analysis 48
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering – 3b
3
8
13
2
2
5
4
3
``subject tree’’
Label the root of the sub-tree with optimal match and cost
ECE 260B – CSE 241A Static Timing Analysis 49
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering - 4
Cover with INV or AO21 ?
3
8
13
2
2
2
5
``subject tree’’
4
1 Inverter
+ subtree
2
13
Area cost 15
ECE 260B – CSE 241A Static Timing Analysis 50
1 AO21
+ subtree 1
+ subtree 2
4
3
2
Area cost 9
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering – 4b
3
8
9
13
2
2
2
5
``subject tree’’
4
Label the root of the sub-tree with optimal match and cost
ECE 260B – CSE 241A Static Timing Analysis 51
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering - 5
9
Cover with ND2 or ND3 ?
8
2
``subject tree’’
4
NAND2
subtree 1
subtree 2
1 NAND2
Area cost 16
ECE 260B – CSE 241A Static Timing Analysis 52
9
4
3
subtree 1
subtree 2
subtree 3
1 NAND3
8
2
4
4
NAND3
Area cost 18
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering – 5b
9
8
16
2
``subject tree’’
4
Label the root of the sub-tree with optimal match and cost
ECE 260B – CSE 241A Static Timing Analysis 53
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering - 6
Cover with INV or AOI21 ?
13
16
``subject tree’’
5
INV
subtree 1
1 INV
16
2
AOI21
subtree 1
subtree 2
1 AOI21
Area cost 18
ECE 260B – CSE 241A Static Timing Analysis 54
13
5
4
Area cost 22
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering – 6b
13
18
16
``subject tree’’
5
Label the root of the sub-tree with optimal match and cost
ECE 260B – CSE 241A Static Timing Analysis 55
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Optimal tree covering - 7
Cover with ND2 or ND3 or ND4 ?
``subject tree’’
ECE 260B – CSE 241A Static Timing Analysis 56
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Cover 1 - NAND2
Cover with ND2 ?
9
18
16
``subject tree’’
4
subtree 1
subtree 2
1 NAND2
18
0
3
Area cost 21
ECE 260B – CSE 241A Static Timing Analysis 57
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Cover 2 - NAND3
Cover with ND3?
9
``subject tree’’
4
subtree 1
subtree 2
subtree 3
1 NAND3
9
4
0
4
Area cost 17
ECE 260B – CSE 241A Static Timing Analysis 58
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Cover - 3
Cover with ND4 ?
8
2
4
``subject tree’’
subtree 1
subtree 2
subtree 3
subtree 4
1 NAND4
ECE 260B – CSE 241A Static Timing Analysis 59
8
2
4
0
5
Slide courtesy of Keutzer
Area cost 19
http://vlsicad.ucsd.edu
Optimal Cover was Cover 2
ND2
AOI21
ND3
INV
``subject tree’’
ND3
INV
ND2
2 ND3
AOI21
2
3
8
4
Area cost 17
ECE 260B – CSE 241A Static Timing Analysis 60
Slide courtesy of Keutzer
http://vlsicad.ucsd.edu
Summary of Technology Mapping
 DAG covering formulation

Separated library issues from mapping algorithm (can’t do this
with rule-based systems)
 Tree covering approximation


Very efficient (linear time)
Applicable to wide range of libraries (std cells, gate arrays) and
technologies (FPGAs, CPLDs)
 Weaknesses


Problems with DAG patterns (Multiplexors, full adders, …)
Large input gates lead to a large number of patterns
ECE 260B – CSE 241A Static Timing Analysis 61
http://vlsicad.ucsd.edu
Outline
 Introduction
 Two-level Logic Synthesis
 Multi-level Logic Synthesis
 Timing optimization in Synthesis
ECE 260B – CSE 241A Static Timing Analysis 62
http://vlsicad.ucsd.edu
Timing Optimization in Synthesis
Factors determining delay of circuit:
 Underlying circuit technology



Circuit type (e.g. domino, static CMOS, etc.)
Gate type
Gate size
 Logical structure of circuit



Length of computation paths
False paths
Buffering
 Parasitics


Wire loads
Layout
ECE 260B – CSE 241A Static Timing Analysis 63
http://vlsicad.ucsd.edu
Problem Statement
Given:



Initial circuit function description
Library of primitive functions
Performance constraints (arrival/required times)
Generate:
an implementation of the circuit using the primitive
functions, such that:
1.
2.
performance constraints are met
circuit area is minimized
ECE 260B – CSE 241A Static Timing Analysis 64
http://vlsicad.ucsd.edu
Current Design Process
Behavioral description
Logic and latches
Logic equations
•Gate library
•Perf. Constraints
•Delay models
Gate netlist
ECE 260B – CSE 241A Static Timing Analysis 65
Layout
Behavior
Optiization
(scheduling)
Partitioning
(retiming)
Logic synthesis
•Technology independent
•Technology mapping
Timing driven
place and route
http://vlsicad.ucsd.edu
Synthesis delay models
Why are technology independent delay reductions hard?
b
e
t
t
e
r
Lack of fast and accurate delay models
1.
# levels, fast but crude
s
l 2. # levels + correction term (fanout, wires,… ): a little better,
but still crude (what coefficients to use?)
o
w 3. Technology mapped: reasonable, but very slow
e 4. Place and route: better but extremely slow
r 5. Silicon: best, but infeasibly slow (except for FPGAs)
ECE 260B – CSE 241A Static Timing Analysis 66
http://vlsicad.ucsd.edu
Clustering/partial-collapse
Traditional critical-path based methods require
 Well defined critical path
 Good delay/slack information
Problems:


Good delay information comes from mapper and layout
Delay estimates and models are weak
Possible solutions:

Better delay modeling at technology independent level

Make speedup, insensitive to actual critical paths and
mapped delays
ECE 260B – CSE 241A Static Timing Analysis 67
http://vlsicad.ucsd.edu
Overview of Solutions for delay
1. Circuit re-structuring

Rescheduling operations to reduce time of computation
2. Implementation of function trees
(technology mapping)


Selection of gates from library
-
Minimum delay (load
independent model)
-
Minimize delay and area
Implementation of buffer trees
3. Resizing
Focus here on circuit re-structuring
ECE 260B – CSE 241A Static Timing Analysis 68
http://vlsicad.ucsd.edu
Circuit re-structuring
Approaches:
Local:
 Mimic optimization techniques in adders



Carry lookahead (tree height reduction)
Conditional sum (generalized select transformation)
Carry bypass (generalized bypass transformation)
Global:
 Reduce depth of entire circuit


Partial collapsing
Boolean simplification
ECE 260B – CSE 241A Static Timing Analysis 69
http://vlsicad.ucsd.edu
Re-structuring methods
Performance measured by
1.
2.
3.

Level based optimizations:




levels,
sensitizable paths,
technology dependent delays
Tree height reduction (Singh ‘88)
Partial collapsing and simplification (Touati ‘91)
Generalized select transform (Berman ‘90)
Sensitizable paths

Generalized bypass transform (Mcgeer ‘91)
ECE 260B – CSE 241A Static Timing Analysis 70
http://vlsicad.ucsd.edu
Re-structuring for delay: tree-height reduction
6
Collapsed
Critical region
5 n 5
1
0
a
l
i
Critical
m 1 region
4
k
n’
1
j 3
h
0 0 0
5
0
i
0
2
1
m 1
4
j 3
Duplicated
logic
k
h
2 0
b c d e f
ECE 260B – CSE 241A Static Timing Analysis 71
0
g
0
a
0
b
0 0
2 0
c d e f
0
g
http://vlsicad.ucsd.edu
Restructuring for delay: path reduction
5
Collapsed
Critical region
n’
1
0
i
4
n’
3
m
1
2
4
0
j 3
1
a
0
b
0 0
Duplicated
logic
1
i
c d e f
ECE 260B – CSE 241A Static Timing Analysis 72
0
g
0
a
m 1
2
4
k
0
j 3
0
h
1
1
k
2 0
5
2
h
0
New delay = 5
0
b
0 0
2 0
c d e f
0
g
http://vlsicad.ucsd.edu
Generalized select transform (GST)
Late signal feeds multiplexor
a
out
b
c
d
e
f
g
a=0
0
b
a=1
c
d
e
f
g
out
1
b
c
d
ECE 260B – CSE 241A Static Timing Analysis 73
e
f
g
a
http://vlsicad.ucsd.edu
Generalized bypass transform (GBX)
 Make critical path false

Speed up the circuit
 Bypass logic of critical path(s)
fm=f
fm+1
fm =f
fm+1
Boolean
difference
ECE 260B – CSE 241A Static Timing Analysis 74
… fn=g
… fn=g
dg
__
df
0
g’
1
s-a-0 redundant
http://vlsicad.ucsd.edu
c
0/1
GST vs GBX
g
…
b
0
g’
1
a
h
GBX
Boolean
difference =
h
 ha  h a
a
a=1
b
b
a=0
b
a=1
b
d
e
f
g
c
d
e
f
g
0 out
d
e
f
g
c
d
e
f
g
ECE 260B – CSE 241A Static Timing Analysis 75
0
g’
1
a
c
c
g
…
b
Note:
a=0
c
0/1
dh
__
da
GBX
1
a
GST
http://vlsicad.ucsd.edu
h
Thank you!
ECE 260B – CSE 241A Static Timing Analysis 76
http://vlsicad.ucsd.edu