Technology Mapping
Outline
– What is Technology Mapping?
– Rule-Based Mapping
– Tree Pattern Matching
Goal
– Understand technology mapping
– Understand mapping algorithms
– Understand mapping issues
What is Technology Mapping?
• Map optimized logic to primitive cell library
• Library capabilities
– function
– cell size
– cell performance
• Library restrictions
– fan-in
– fan-out
• Goal
– delay
– area
– power
– etc.
Cell AOI33
- area 3248
- delay 0.8
- power 0.08
Cell Libraries
• Custom cells
– synthesize functions as needed
– e.g. complementary, domino, or CVSL CMOS
• Standard cells
– library of fixed functions
– use as needed
– e.g. 3-input AND-OR
• Gate arrays
– fixed population of gates and gate types
– must fit design into available gates
• FPGAs
– programmable function blocks
– e.g. all functions of 4 inputs
Custom Cell Synthesis
• Mechanically transform function to custom cell
– complementary CMOS - cell is guaranteed to work
» but it might be slow - e.g. 20-input NAND gate
– avoid problems by limiting functions in synthesis process
» limit fan-in, fan-out of functions
» chop up large functions
    ABCDEF => (ABC)(DEF)
    A+B+C+D => (A+B)+(C+D)
• Apply electrical rules
– size transistors to meet timing goals
– speed up longest paths in circuit
– rules are specific to cell technology
[Figure: CMOS gate computing ab from inputs a and b. Annotations: transistors in series, so make fatter; speeds up 1->0 transition]
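The "chop up large functions" step above can be sketched in a few lines. This is a minimal illustration, assuming the function is a single wide associative gate (AND or OR); the name `chop` and the nested-tuple representation are hypothetical, not taken from the slides.

```python
def chop(op, inputs, max_fanin):
    """Split one wide associative gate into a tree of gates with at most max_fanin inputs."""
    if max_fanin < 2:
        raise ValueError("max_fanin must be at least 2")
    level = list(inputs)
    while len(level) > max_fanin:
        grouped = []
        for i in range(0, len(level), max_fanin):
            group = level[i:i + max_fanin]
            # A leftover single signal is passed up unchanged rather than wrapped in a gate.
            grouped.append(group[0] if len(group) == 1 else (op, *group))
        level = grouped
    return (op, *level)


print(chop("AND", "ABCDEF", 3))  # corresponds to (ABC)(DEF)
print(chop("OR", "ABCD", 2))     # corresponds to (A+B)+(C+D)
```

A real synthesis flow would also weigh timing when choosing how to group inputs; this sketch groups them in the order given.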
Rule-Based Cell Matching
• Library-based transformations
– rules encode library capabilities and restrictions
– transformations improve area, delay, power, etc.
– similar to synthesis via local optimization
• SOCRATES circuit optimizer
– lookahead several rules
– test all applicable rules
» avoid local minima
• Problems
– rules are not guaranteed to find optimum
– rules may not be complete
Rule-Based Matching Algorithm
TryRule3(circuit, rule)
{
    scan circuit for rule match
    if match, compute cost
    recurse twice more on all rules    /* three-rule lookahead */
    return minimum cost
}

do {
    mincost = cost;    /* cost of the current circuit */
    minrule = NULL;
    for (i = 0; i < MAXRULES; i++) {
        newcost = TryRule3(circuit, rule[i]);
        if (newcost < mincost) {
            mincost = newcost;
            minrule = rule[i];
        }
    }
    if (minrule != NULL)
        cost = ApplyRule(circuit, minrule);
} until no rules apply    /* minrule == NULL */
Complexity (CN)^3 for C circuit elements and N rules
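The lookahead loop above can be tried out on a toy circuit. The sketch below is only an illustration under stated assumptions: circuits are nested tuples, the three rewrite rules and the `COST` table are hypothetical (they are not SOCRATES rules), and each rule is applied at its first match in preorder.

```python
COST = {"INV": 2, "NAND2": 4, "NOR2": 4, "AND2": 5}  # hypothetical cell costs


def is_gate(n, kind):
    return isinstance(n, tuple) and n[0] == kind


def cost(n):
    return 0 if isinstance(n, str) else COST[n[0]] + sum(cost(k) for k in n[1:])


def rule_double_inv(n):  # INV(INV(x)) -> x
    return n[1][1] if is_gate(n, "INV") and is_gate(n[1], "INV") else None


def rule_and(n):  # INV(NAND2(a, b)) -> AND2(a, b)
    return ("AND2",) + n[1][1:] if is_gate(n, "INV") and is_gate(n[1], "NAND2") else None


def rule_de_morgan(n):  # NAND2(INV(a), INV(b)) -> INV(NOR2(a, b))
    if is_gate(n, "NAND2") and is_gate(n[1], "INV") and is_gate(n[2], "INV"):
        return ("INV", ("NOR2", n[1][1], n[2][1]))
    return None


RULES = [rule_double_inv, rule_and, rule_de_morgan]


def apply_first(node, rule):
    """Apply rule at its first match in preorder; return the new circuit or None."""
    if isinstance(node, str):
        return None
    out = rule(node)
    if out is not None:
        return out
    for i, kid in enumerate(node[1:], start=1):
        new_kid = apply_first(kid, rule)
        if new_kid is not None:
            return node[:i] + (new_kid,) + node[i + 1:]
    return None


def try_rule(circuit, rule, depth=3):
    """Best cost reachable by applying rule once, then up to depth-1 further rules."""
    after = apply_first(circuit, rule)
    if after is None:
        return None
    best = cost(after)
    if depth > 1:
        for r in RULES:
            c = try_rule(after, r, depth - 1)
            if c is not None:
                best = min(best, c)
    return best


def optimize(circuit):
    while True:
        best_cost, best_rule = cost(circuit), None
        for r in RULES:
            c = try_rule(circuit, r)
            if c is not None and c < best_cost:
                best_cost, best_rule = c, r
        if best_rule is None:  # no rule improves the circuit
            return circuit
        circuit = apply_first(circuit, best_rule)


start = ("INV", ("NAND2", ("INV", "a"), ("INV", "b")))
print(cost(start), optimize(start))
```

On this input, rewriting the root into an AND2 leads to a dead end, while the De Morgan rewrite followed by inverter-pair removal reaches a single NOR2. The lookahead is what lets the loop see the second path's final cost before committing.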
Graph Covering
F = abcd
• Logic Equation Representation
– directed acyclic graphs (DAGs)
» 2-input NANDs and inverters in MIS
» representation is not unique
[Figure: F = abcd drawn as a DAG of NAND and NOT nodes]
• Library Cell Representation
– represent cells as DAGs
» 2-input NANDs and inverters in MIS
» all decompositions of a cell function
» ~N! possibilities for N inputs
– cell cost (area, delay, power)
[Figure: alternative DAG decompositions of F over inputs a, b, c, d]
Graph Covering
• Algorithm
– minimum-cost cover of equation DAGs with library DAGs
– NP-complete (Bruno and Sethi 1975)
– same problem as compiler code generation
• Approaches
– search from primary inputs
– search from primary outputs
– try largest cell DAGs first
» usually smallest area
» not always fastest
» similar to “maximal munch” in code generation
– avoid local minima
» lookahead
» several random starting points
Tree Pattern Matching
• Partition circuit DAGs into trees
– split at fan-out nodes
– make only outputs roots of trees
– perform splitting incrementally
» when searching that tree
» stop at already-mapped nodes
• Algorithm
– find optimal mapping for each output tree
– match cell trees recursively (tree isomorphism check)
» match all cells at root (output)
» find optimum mapping for each subtree (cell input)
» cost is cell plus cost of mappings of cell inputs
– top-down traversal to record cells
– exponential time in worst case
[Figure annotation: look at top input to NAND]
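The partitioning step on this slide (split at fan-out nodes, make outputs the roots) might look like the sketch below. It assumes the circuit is given as a dictionary from each gate to its fan-in signals; the function name and the sample netlist are hypothetical.

```python
from collections import Counter


def partition_into_trees(fanin, outputs):
    """fanin maps each gate to its input signals; signals absent from fanin are primary inputs."""
    fanout = Counter(src for srcs in fanin.values() for src in srcs)
    # Tree roots: primary outputs plus every gate that drives more than one gate.
    roots = [n for n in fanin if n in outputs or fanout[n] > 1]
    root_set = set(roots)

    def grow(node, is_root):
        if node not in fanin or (not is_root and node in root_set):
            return node  # primary input, or the root of another tree: treat as a leaf
        return (node, [grow(s, False) for s in fanin[node]])

    return {r: grow(r, True) for r in roots}


# g1 fans out to both g2 and g3, so it becomes the root of its own tree.
netlist = {"g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g1", "d"]}
for root, tree in partition_into_trees(netlist, {"g2", "g3"}).items():
    print(root, tree)
```

This version splits the whole DAG up front; the slide's incremental variant would instead stop at already-mapped nodes while searching each tree.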
Tree Pattern Matching
OptimalTree(tree)
{
    if (tree is a primary input) return(0);
    mincost = INF;
    for all cells {
        if (cell matches at tree.root) {
            cost = cell.cost;
            for all cell inputs i
                cost += OptimalTree(subtree at cell.input[i]);
            if (cost < mincost) {
                mincost = cost;
                keep tree mapping;
            }
        }
    }
    return(mincost);
}
for each output {
    outputcost = OptimalTree(output);
    scan top-down to get cell mapping;
}
[Figure annotations: one mapping has cost = 5+2+2+5+2+2 = 18; another has cost = 4+4 = 8]
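A runnable version of the recursion above, as a sketch under several assumptions: the subject tree and cell patterns are nested tuples over NAND and INV, `None` marks a cell input, the four-cell library and its costs are made up for illustration, and input permutations of a cell are not tried. Repeated subtrees are memoized with `functools.lru_cache`.

```python
from functools import lru_cache

# Hypothetical library: name -> (pattern, cost). None marks a cell input.
LIBRARY = {
    "INV": (("INV", None), 2),
    "NAND2": (("NAND", None, None), 3),
    "AND2": (("INV", ("NAND", None, None)), 4),
    "NAND3": (("NAND", ("INV", ("NAND", None, None)), None), 4),
}


def match(pattern, node):
    """Return the subject subtrees bound to the pattern's inputs, or None if no match."""
    if pattern is None:
        return [node]
    if isinstance(node, str) or node[0] != pattern[0] or len(node) != len(pattern):
        return None
    leaves = []
    for p, n in zip(pattern[1:], node[1:]):
        sub = match(p, n)
        if sub is None:
            return None
        leaves += sub
    return leaves


@lru_cache(maxsize=None)
def optimal_tree(node):
    """Return (minimum cost, cells used) for the subject tree rooted at node."""
    if isinstance(node, str):  # primary input: nothing to map
        return 0, ()
    best = (float("inf"), ())
    for name, (pattern, cell_cost) in LIBRARY.items():
        leaves = match(pattern, node)
        if leaves is None:
            continue
        total, cells = cell_cost, (name,)
        for leaf in leaves:  # cost is cell plus cost of mappings of cell inputs
            sub_cost, sub_cells = optimal_tree(leaf)
            total += sub_cost
            cells += sub_cells
        if total < best[0]:
            best = (total, cells)
    return best


# F = abcd decomposed into 2-input NANDs and inverters.
F = ("INV", ("NAND", ("INV", ("NAND", "a", "b")), ("INV", ("NAND", "c", "d"))))
print(optimal_tree(F))
```

With these assumed costs, mapping every node to INV or NAND2 would cost 15, while the recursion finds a cover of cost 10 using one INV, one NAND3, and one AND2.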
Tree Pattern Matching
• Inject inverter pairs at gate outputs
– increases possible matches
– add fake inverter pair cell to library
» removes remaining inverter pairs from circuit
[Figure: the same circuit mapped without inverters and with inverters]
Tree Matching Issues
• Tree matching is fast
– simple DFS of circuit and cell trees
• Still many tree representations for a function
• Might miss common subexpressions
– if cell matching does not line up with fan-out nodes
– stop by treating fan-outs as cell outputs
[Figure: two mappings of the same circuit over inputs a, b, c, d. With 2-input ANDs, cost: 15 transistors. With 3-input ANDs, cost: 16 transistors.]
Graph Pattern Matching
• Match subgraphs, not trees
– avoid breaking graph into forest of trees
– match more gate types
» XOR, MUX
» multiple-output gates
• Algorithm
– find all circuit subgraph to cell graph matchings
» generate DFS traversal of each cell graph from outputs
» O(C*N) for C circuit nodes, N total cell library nodes
– cover graph with minimum-cost cell graphs
» matrix with row for each circuit node, column for each cell, 1 if cell matches circuit, 0 otherwise
» find least-cost maximum independent set of rows
» branch-and-bound search algorithm
» bound is least-cost rows found so far
» exponential in worst case
Graph Matching Algorithm
boolean CellMatch(x, y)    /* x: circuit node, y: cell node */
{
    if (y is a cell input) return(1);    /* assumed: a cell input matches any signal */
    if (y.gatetype != x.gatetype) return(0);
    i = x.firstchild; j = y.firstchild;
    while (i != NULL && j != NULL) {
        if (!CellMatch(i, j)) return(0);
        i = x.nextchild; j = y.nextchild;
    }
    return(i == NULL && j == NULL);    /* both must run out of children together */
}
for each node i in circuit graph
    for each cell j in library graphs
        if (CellMatch(i, j.root)) match[i][j] = 1
find least-cost maximum set of independent rows in match array
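The matching pass can be sketched in Python as follows. Everything about the data here is an assumption for illustration: the circuit topology (nodes a to f over inputs p, q, r) is invented, cell graphs are nested tuples with named inputs, and a `binding` dictionary is added so that a cell node shared inside a DAG-shaped cell (as in the four-NAND XOR) must map to a single circuit node.

```python
# Hypothetical circuit: node -> (gate type, fan-in signals). p, q, r are primary inputs.
CIRCUIT = {
    "a": ("NOR2", ["b", "d"]),
    "b": ("NAND2", ["c", "e"]),
    "c": ("NAND2", ["p", "f"]),
    "d": ("INV", ["r"]),
    "e": ("NAND2", ["f", "q"]),
    "f": ("NAND2", ["p", "q"]),
}

N = ("NAND2", "x", "y")  # shared inner node of the XOR cell graph
LIBRARY = {
    "XOR2": ("NAND2", ("NAND2", "x", N), ("NAND2", N, "y")),
    "INV": ("INV", "x"),
    "NOR2": ("NOR2", "x", "y"),
    "NAND2": N,
}


def cell_match(circuit, node, cell, binding=None):
    """True if the cell graph matches the circuit subgraph rooted at node."""
    binding = {} if binding is None else binding
    if cell in binding:  # cell node seen before: it must map to the same circuit node
        return binding[cell] == node
    if isinstance(cell, tuple):  # internal cell node: gate types and fan-in must agree
        if node not in circuit:
            return False
        gate, kids = circuit[node]
        if gate != cell[0] or len(kids) != len(cell) - 1:
            return False
        if not all(cell_match(circuit, k, c, binding) for k, c in zip(kids, cell[1:])):
            return False
    binding[cell] = node  # cell inputs (strings) bind to any signal
    return True


match_matrix = {
    node: [int(cell_match(CIRCUIT, node, cell)) for cell in LIBRARY.values()]
    for node in CIRCUIT
}
print("  ", list(LIBRARY))
for node, row in match_matrix.items():
    print(node, row)
```

Like the pseudocode, this compares children in a fixed order, so a library would need to list input permutations of a cell separately.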
Minimum-Cost Graph Cover
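Match matrix (1 if the cell matches at the circuit node, 0 otherwise):

        XOR2  INV  NOR2  NAND2
Cost      14    2     4      4
a          0    0     1      0
b          1    0     0      1
c          0    0     0      1
d          0    1     0      0
e          0    0     0      1
f          0    0     0      1

[Figure: circuit graph with nodes a-f and the XOR2 cell graph, each with its DFS traversal. Matching the XOR cell DFS against the circuit DFS rooted at each node: fails at a, match at b, fails at c, d, e, f.]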
first solution: a, b, c, d, e, f
bound = 4 + 4 + 4 + 2 + 4 + 4 = 22
least-cost solution: a, b, d
cost = 4 + 14 + 2 = 20
NOR2 at a
XOR2 at b
INV at d
Maximum number of rows - cover most nodes in fewest cells
Independent - cells do not overlap
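The covering step can be reproduced with a small branch-and-bound search. This is a sketch, not the slides' exact procedure: the cell costs come from the match matrix, but the set of circuit nodes each match covers is an assumption (in particular that XOR2 at b covers b, c, e, and f), and the function name is hypothetical.

```python
# (cell, root node, circuit nodes covered, cost). Coverage sets are assumed.
MATCHES = [
    ("NOR2", "a", {"a"}, 4),
    ("NAND2", "b", {"b"}, 4),
    ("NAND2", "c", {"c"}, 4),
    ("INV", "d", {"d"}, 2),
    ("NAND2", "e", {"e"}, 4),
    ("NAND2", "f", {"f"}, 4),
    ("XOR2", "b", {"b", "c", "e", "f"}, 14),
]


def min_cost_cover(nodes, matches):
    """Cover every node exactly once with non-overlapping matches at minimum cost."""
    best = {"cost": float("inf"), "cells": None}
    solutions = []  # cost of each improving solution, in the order found

    def search(uncovered, chosen, cost):
        if cost >= best["cost"]:
            return  # bound: cannot beat the best cover found so far
        if not uncovered:
            best["cost"], best["cells"] = cost, list(chosen)
            solutions.append(cost)
            return
        target = min(uncovered)  # branch on one uncovered node
        for cell, root, covers, cell_cost in matches:
            if target in covers and covers <= uncovered:  # independent: no overlap
                search(uncovered - covers, chosen + [(cell, root)], cost + cell_cost)

    search(frozenset(nodes), [], 0)
    return best["cost"], best["cells"], solutions


total, cells, history = min_cost_cover("abcdef", MATCHES)
print(total, cells, history)
```

With the matches listed in this order, the search first finds the all-single-gate cover and then improves on it by using the XOR2 cell, which mirrors the bound-then-improve sequence on the slide.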
Layout-Driven Mapping
• Goal
– minimize chip area
– previous approaches focus on cell area
– ignore inter-cell routing
– example - high fan-in and fan-out
» minimizes cell area
» takes a lot of routing
• Solution
– estimate placement and routing during mapping
» simple, fast estimates
– incrementally update during mapping
» only mapping a few gates at a time
FPGA Technology Mapping
• Programmable logic blocks
– multiplexor-based (Actel)
– lookup table (Xilinx)
• Problem
– lookup table of K inputs implements 2^(2^K) possible functions
– K = 5 typically
– impractical to use library cell matching approach
– requires 4 billion variations for each cell pattern/tree/graph
– similar problem for mux-based FPGAs
• Solutions
– clique partitioning
– bin packing
– OBDD matching
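The function count is quick to check numerically: a K-input lookup table has 2^K entries, each 0 or 1, so it can hold 2^(2^K) distinct truth tables. The short sketch below computes that count and also shows a lookup table evaluated as a bit-indexed integer; both helper names are hypothetical.

```python
def lut_function_count(k):
    """Number of distinct Boolean functions of k inputs."""
    return 2 ** (2 ** k)


def lut_eval(truth_table, inputs):
    """Evaluate a LUT stored as an integer: bit i holds the output for input pattern i."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (truth_table >> index) & 1


print(lut_function_count(5))  # 2**32, roughly the "4 billion" quoted above

XOR2_TABLE = 0b0110  # outputs for (a, b) = (0,0), (1,0), (0,1), (1,1)
print([lut_eval(XOR2_TABLE, (a, b)) for b in (0, 1) for a in (0, 1)])
```

Because every function of K inputs fits in one table, enumerating them as library cells is pointless, which is why the FPGA approaches listed above work on the structure of the network instead.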
FPGAs
• Xilinx
– RAM configurable logic blocks (CLB)
– RAM programmable wiring
– 2 functions of 4 variables
– 1 function of 5 variables
– implemented via table lookup RAM
• Actel
– fuse configurable logic elements
– fuse programmable wiring
– all 2 and 3-variable functions
– some 4-variable functions
SRAM-based FPGA
• Functions are implemented as MUX
• Interconnects are implemented as wire segments
• Interconnect connections are pass transistors
• Advantage: CMOS
• Disadvantage: slow
• Other issues:
– Must be reprogrammed each time power is turned on