Technology Mapping

Outline
  – What is Technology Mapping?
  – Rule-Based Mapping
  – Tree Pattern Matching

Goal
  – Understand technology mapping
  – Understand mapping algorithms
  – Understand mapping issues

What is Technology Mapping?
• Map optimized logic to primitive cell library
• Library capabilities
  – function
  – cell size
  – cell performance
• Library restrictions
  – fan-in
  – fan-out
• Goal
  – delay
  – area
  – power
  – etc.
[Figure: example cell AOI33: area 3248, delay 0.8, power 0.08]

Cell Libraries
• Custom cells
  – synthesize functions as needed
  – e.g. complementary, domino, or CVSL CMOS
• Standard cells
  – library of fixed functions
  – use as needed
  – e.g. 3-input AND-OR
• Gate arrays
  – fixed population of gates and gate types
  – must fit design into available gates
• FPGAs
  – programmable function blocks
  – e.g. all functions of 4 inputs

Custom Cell Synthesis
• Mechanically transform function to custom cell
  – complementary CMOS: cell is guaranteed to work
    » but it might be slow, e.g. 20-input NAND gate
  – avoid problems by limiting functions in synthesis process
    » limit fan-in, fan-out of functions
    » chop up large functions
        ABCDEF => (ABC)(DEF)
        A+B+C+D => (A+B)+(C+D)
• Apply electrical rules
  – size transistors to meet timing goals
  – speed up longest paths in circuit
  – rules are specific to cell technology
[Figure: gate with inputs a and b. Annotations: "in series, so make fatter"; "speeds up 1->0 transition"]

Rule-Based Cell Matching
• Library-based transformations
  – rules encode library capabilities and restrictions
  – transformations improve area, delay, power, etc.
  – similar to synthesis via local optimization
• SOCRATES circuit optimizer
  – look ahead several rules
  – test all applicable rules
    » avoid local minima
• Problems
  – rules are not guaranteed to find optimum
  – rules may not be complete

Rule-Based Matching Algorithm

    TryRule3(circuit, rule) {
        scan circuit for rule match
        if match, compute cost
        recurse twice more on all rules      /* lookahead depth 3 */
        return minimum cost
    }

    do {
        mincost = cost;                      /* cost of the current circuit */
        minrule = NULL;                      /* reset on every pass */
        for (i = 0; i < MAXRULES; i++) {
            trial = TryRule3(circuit, rule[i]);
            if (trial < mincost) {
                mincost = trial;
                minrule = rule[i];
            }
        }
        if (minrule != NULL)
            cost = ApplyRule(circuit, minrule);
    } while (minrule != NULL);               /* until no rules apply */

Complexity: (CN)^3 for C circuit elements and N rules

Graph Covering
• Logic Equation Representation
  – directed acyclic graphs (DAGs)
    » 2-input NANDs and inverters in MIS
    » representation is not unique
• Library Cell Representation
  – represent cells as DAGs
    » 2-input NANDs and inverters in MIS
    » all decompositions of a cell function
    » ~N! possibilities for N inputs
  – cell cost (area, delay, power)
[Figure: alternative NAND/NOT DAG decompositions of F = abcd]

Graph Covering
• Algorithm
  – minimum-cost cover of equation DAGs with library DAGs
  – NP-complete (Bruno and Sethi 1975)
  – same problem as compiler code generation
• Approaches
  – search from primary inputs
  – search from primary outputs
  – try largest cell DAGs first
    » usually smallest area
    » not always fastest
    » similar to “maximal munching” in code generation
  – avoid local minima
    » lookahead
    » several random starting points
[Figure: covers of the DAG for F = abcd]

Tree Pattern Matching
• Partition circuit DAGs into trees
  – split at fan-out nodes
  – make only outputs roots of trees
  – perform splitting incrementally
    » when searching that tree
    » stop at already-mapped nodes
• Algorithm
  – find optimal mapping for each output tree
  – use recursive graph isomorphism tree matching
    » match all cells at root (output)
    » find optimum mapping for each subtree (cell input)
    » cost is cell plus cost of mappings of cell inputs
  – top-down traversal to record cells
  – exponential time in worst case
[Figure annotation: "look at top input to NAND"]

Tree Pattern Matching

    OptimalTree(tree) {
        if (tree is a primary input)
            return(0);                       /* nothing left to map */
        mincost = INF;
        for all cells
            if (cell matches at tree.root) {
                cost = cell.cost;
                for all cell inputs i        /* subtree feeding input i */
                    cost += OptimalTree(cell.input[i]);
                if (cost < mincost) {
                    mincost = cost;
                    keep tree mapping;
                }
            }
        return(mincost);
    }

    for each output {
        outputcost = OptimalTree(output);
        scan top-down to get cell mapping;
    }

[Figure annotations: cost = 5+2+2+5+2+2 = 18; cost = 4+4 = 8]

Tree Pattern Matching
• Inject inverter pairs at gate outputs
  – increases possible matches
  – add fake inverter pair cell to library
    » removes remaining inverter pairs from circuit
[Figure: mapping without inverters vs. with inverters]

Tree Matching Issues
• Tree matching is fast
  – simple DFS of circuit and cell trees
• Still many tree representations for a function
• Might miss common subexpressions
  – if cell matching does not line up with fan-out nodes
  – stop by treating fan-outs as cell outputs
[Figure: 2-input ANDs cost: 15 transistors; 3-input ANDs cost: 16 transistors]

Graph Pattern Matching
• Match subgraphs, not trees
  – avoid breaking graph into forest of trees
  – match more gate types
    » XOR, MUX
    » multiple-output gates
• Algorithm
  – find all circuit subgraph to cell graph matchings
    » generate DFS traversal of each cell graph from outputs
    » O(C*N) for C circuit nodes, N total cell library nodes
  – cover graph with minimum-cost cell graphs
    » matrix with row for each circuit node, column for each cell, 1 if cell matches circuit, 0 otherwise
    » find least-cost maximum independent set of rows
    » branch-and-bound search algorithm
    » bound is least-cost rows found so far
    » exponential in worst case

Graph Matching Algorithm

    /* x is a circuit node, y is a cell (pattern) node */
    boolean CellMatch(x, y) {
        if (y is a cell input) return(1);    /* cell inputs match any circuit node */
        if (y.gatetype != x.gatetype) return(0);
        i = y.firstchild; j = x.firstchild;
        while (i != NULL && j != NULL) {
            if (!CellMatch(j, i)) return(0); /* circuit child first, cell child second */
            i = i.nextsibling; j = j.nextsibling;
        }
        if (i == NULL && j == NULL) return(1);   /* same number of children */
        else return(0);
    }

    for each node i in circuit graph
        for each cell j in library graphs
            if (CellMatch(i, j)) match[i][j] = 1;
    find least-cost maximum set of independent rows in match array

Minimum-Cost Graph Cover

            XOR2   INV   NOR2   NAND2
    Cost     14     2     4       4
    a         0     0     1       0
    b         1     0     0       1
    c         0     0     0       1
    d         0     1     0       0
    e         0     0     0       1
    f         0     0     0       1

[Figure: XOR cell DFS (cell nodes numbered 1 to 6) and circuit DFS (nodes a to f). The XOR2 pattern matches at b and fails at every other node.]

• first solution: a, b, c, d, e, f
  – bound = 4 + 4 + 4 + 2 + 4 + 4 = 22
• least-cost solution: a, b, d
  – cost = 4 + 14 + 2 = 20
  – NOR2 at a, XOR2 at b, INV at d
• Maximum number of rows: cover most nodes in fewest cells
• Independent: cells do not overlap

Layout-Driven Mapping
• Goal
  – minimize chip area
  – previous approaches focus on cell area
  – ignore inter-cell routing
  – example: high fan-in and fan-out
    » minimizes cell area
    » takes a lot of routing
• Solution
  – estimate placement and routing during mapping
    » simple, fast estimates
  – incrementally update during mapping
    » only mapping a few gates at a time

FPGA Technology Mapping
• Programmable logic blocks
  – multiplexor-based (Actel)
  – lookup table (Xilinx)
• Problem
  – lookup table of K inputs implements 2^(2^K) possible functions
  – K = 5 typically
  – impractical to use library cell matching approach
  – requires 4 billion variations for each cell pattern/tree/graph
  – similar problem for mux-based FPGAs
• Solutions
  – clique partitioning
  – bin packing
  – OBDD matching

FPGAs
• Xilinx
  – RAM configurable logic blocks (CLB)
  – RAM programmable wiring
  – 2 functions of 4 variables
  – 1 function of 5 variables
  – implemented via table lookup RAM
• Actel
  – fuse configurable logic elements
  – fuse programmable wiring
  – all 2 and 3-variable functions
  – some 4-variable functions
[Figure: CLB]

SRAM-based FPGA
• Functions are implemented as MUXes
• Interconnects are implemented as wire segments
• Interconnect connections are pass transistors
• Advantage: CMOS
• Disadvantage: slow
• Other issues:
  – Must be reprogrammed each time power is turned on
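
The "4 billion variations" figure on the FPGA Technology Mapping slide follows from counting truth tables: a K-input lookup table has 2^K rows, and each row can hold 0 or 1. A minimal sketch of that arithmetic (the function name `lut_function_count` is made up for this illustration):

```python
def lut_function_count(k: int) -> int:
    """Number of distinct Boolean functions of k inputs: 2^(2^k)."""
    rows = 2 ** k          # one truth-table row per input combination
    return 2 ** rows       # each row independently stores 0 or 1


for k in range(2, 6):
    print(k, lut_function_count(k))
```

For K = 5 this gives 2^32 = 4,294,967,296, which is why enumerating a library pattern per function is impractical.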
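
The OptimalTree recursion from the Tree Pattern Matching slides can be sketched in runnable form. This is only an illustration under stated assumptions: the four-cell library and its costs are invented here (they are not the costs from the slide figures), the circuit is a tree of 2-input NANDs and inverters, and `Node`, `matches`, and `optimal` are hypothetical names. Memoizing on subtrees is an addition in this sketch so that repeated subtrees are costed once.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Tuple


@dataclass(frozen=True)
class Node:
    op: str                          # "NAND", "INV", "IN" (primary input), "X" (pattern leaf)
    kids: Tuple["Node", ...] = ()
    name: str = ""


def IN(name): return Node("IN", (), name)
def INV(a): return Node("INV", (a,))
def NAND(a, b): return Node("NAND", (a, b))


X = Node("X")  # pattern leaf: stands for any circuit subtree feeding a cell input

# ASSUMPTION: a made-up library of (cell name, cost, NAND/INV pattern tree).
LIBRARY = [
    ("INV", 2, INV(X)),
    ("NAND2", 4, NAND(X, X)),
    ("AND2", 5, INV(NAND(X, X))),
    ("NAND3", 6, NAND(INV(NAND(X, X)), X)),
]


def matches(pat, node):
    """Yield, for each way pat fits at node, the circuit subtrees at pat's leaves."""
    if pat.op == "X":
        yield (node,)
        return
    if pat.op != node.op or len(pat.kids) != len(node.kids):
        return
    if len(pat.kids) == 1:
        yield from matches(pat.kids[0], node.kids[0])
        return
    # Two-input gate: try both input orders, since NAND is commutative.
    for a, b in (node.kids, node.kids[::-1]):
        for left in matches(pat.kids[0], a):
            for right in matches(pat.kids[1], b):
                yield left + right


@lru_cache(maxsize=None)
def optimal(node):
    """Return (minimum cost, tuple of cell names) for the tree rooted at node."""
    if node.op == "IN":
        return 0, ()                 # primary inputs need no cell
    best = (float("inf"), ())
    for name, cost, pat in LIBRARY:
        for leaves in matches(pat, node):
            total, cells = cost, (name,)
            for leaf in leaves:      # cost is cell plus cost of mapping its inputs
                sub_cost, sub_cells = optimal(leaf)
                total += sub_cost
                cells += sub_cells
            if total < best[0]:
                best = (total, cells)
    return best


# F = abcd decomposed as AND(AND(a, b), AND(c, d)) using NANDs and inverters.
a, b, c, d = (IN(n) for n in "abcd")
F = INV(NAND(INV(NAND(a, b)), INV(NAND(c, d))))
print(optimal(F))
```

With this assumed library the cheapest cover of F uses an inverter, a NAND3, and an AND2 for a total cost of 13, beating the all-NAND2/INV cover. Changing the costs in `LIBRARY` changes which cover wins, which is the point of the cost-driven recursion.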
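
The branch-and-bound search from the Minimum-Cost Graph Cover slide can be replayed on the slide's own match table. The cell costs and the matched (cell, node) pairs below come from that table; the sets of circuit nodes each match covers are an assumption of this sketch (in particular that XOR2 rooted at b covers b, c, e, and f), as are the names `MATCHES` and `min_cover`.

```python
COST = {"XOR2": 14, "INV": 2, "NOR2": 4, "NAND2": 4}

# (cell, root node) -> circuit nodes covered by that match.
# ASSUMPTION: coverage sets are inferred, not stated on the slide.
MATCHES = {
    ("NOR2", "a"): {"a"},
    ("XOR2", "b"): {"b", "c", "e", "f"},
    ("NAND2", "b"): {"b"},
    ("NAND2", "c"): {"c"},
    ("INV", "d"): {"d"},
    ("NAND2", "e"): {"e"},
    ("NAND2", "f"): {"f"},
}
NODES = "abcdef"


def min_cover():
    """Return (cost, chosen matches) of a least-cost non-overlapping cover."""
    best = [float("inf"), None]

    def search(uncovered, chosen, cost):
        if cost >= best[0]:
            return                       # bound: cannot beat the best cover so far
        if not uncovered:
            best[0], best[1] = cost, list(chosen)
            return
        node = min(uncovered)            # branch on the next uncovered node
        for (cell, root), covered in sorted(MATCHES.items()):
            # Independent rows: a match may only use still-uncovered nodes.
            if node in covered and covered <= uncovered:
                chosen.append((cell, root))
                search(uncovered - covered, chosen, cost + COST[cell])
                chosen.pop()

    search(frozenset(NODES), [], 0)
    return best[0], best[1]


print(min_cover())
```

Under these assumptions the search first finds the all-single-gate cover at cost 22, uses it as the bound, and then finds NOR2 at a, XOR2 at b, INV at d at cost 20.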