Download Presentation slides. - Dept. of Electrical and Computer Engineering

Generalized Buffering of PTL Logic Stages using Boolean Division and Don’t Cares Rajesh Garg Sunil P. Khatri Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 1 Outline  Introduction  Objective  Previous Work  CODCs and ACODCs  Generalized Buffering With CODCs  Results  Conclusions 2 Introduction    Pass Transistor Logic (PTL) typically used for specific circuit implementations, like barrel shifters No widely accepted PTL design methodology There exists a direct mapping between an ROBDD node and a PTL mux f f v v fv 1 fv’ ROBDD Node  f 0 fv MUX fv’ v’ v fv fv’ PTL based MUX Hence ROBDDs can be used to perform PTL based synthesis for general circuits 3 Introduction (contd)  Problems with direct mapping  Body Effect   Cannot connect more than 4-5 devices in series Monolithic ROBDDs Worst-case exponential size in number of the inputs (large)  Memory explosion can occur during ROBDD construction   Partitioned ROBDDs    Avoids memory explosion of monolithic ROBDDs Output of each PTL structure needs to be buffered Regenerate electrical drive capability after 4 or 5 levels using a pair of inverters (avoid body effect problems) 4 Objective  New PTL Synthesis Approach  Use partitioned ROBDDs Avoid memory explosion  Guarantee no more than 4-5 series devices   Use generalized buffering Buffers can be complex logic gates in general (not simple inverters/ buffers)  Use ACODCs or CODCs to improve the extraction of generalized buffers  Simplifies the logic function of the PTL block   Boolean Division based formulation Elegant, powerful formulation to extract generalized buffers  Augmented with CODC / ACODCs   Reduces total circuit delay and area 5 Previous Work  Buch et. al, “Logic synthesis for large pass transistor circuits”, in Proceedings, IEEE/ACM ICCAD, Nov 1997, pp 663-670  Lai et. al, “BDD decomposition for mixed CMOS/PTL logic circuit synthesis”, in Proceedings, IEEE ISCAS, May 2005, pp. 5649-5652  Yamashita et. al, “Pass-transistor/CMOS collaborated logic: The best of both worlds”, in Digest of Technical Papers, Symposium on VLSI Circuits, June 1997, pp. 31-32.  Garg et. al, “Generalized buffering of PTL logic stages using Boolean division”, in Proceedings, IEEE ISCAS, May 2006. 6 CODCs  Observability Don’t Cares (ODCs) ODC jk  {x  B n s.t.    z k ( x) | y j 0  z k ( x) | y j 1} ODCs used to minimize the logic function of a node Need to re-compute the ODCs of the other nodes after optimization Compatible Observability Don’t Cares     Can simultaneously change the function of all nodes Subset of ODCs full_simplify is used to compute CODCs in SIS ROBDD based computation to compute CODCs 7 CODCs (contd.)    Memory intensive Computation is possible only for small and medium sized circuits Approximate CODCs by Saluja et. al     30X faster than full CODCs Requires 30X less memory than full CODCs Literal count reduction is about 80% of that obtained by full CODCs Can compute ACODCs for arbitrarily large circuits 8 PTL with Generalized Buffering Primary Output Primary Inputs Primary Output Primary Inputs 9 Boolean Division  Definition 1: g is a Boolean divisor of f if h and r exist such that f = gh + r where, gh ≠ Ø Definition 2: g is a Boolean factor of f if, g is a Boolean divisor of f, and in addition, r = Ø , i.e. f = gh  Theorem: If fg ≠ Ø, then g is a Boolean divisor of f.  10 ROBDD Division     Consider a (partitioned) ROBDD f of a node n in the network Let d represent the CODCs of node n Consider a library gate g ≡ G Division of f by g can be represented by following equations: f  fg  f g f  G ( f  d  x)  f g where x  g U  G ( f  d  g )  f g // Upper bound of f L  fG  f g //Lower bound of f Therefore, quotient f  h  f  d  g remainder r  f g Finally, f  Gh  r 11 Generalized Buffering with CODCs  Synthesize partitioned PTL blocks with    Maximum depth of 5 No more than 5 transistors in series Optimize and decompose network     Using only 2-input gates and inverters PTL structure will grow in predictable manner Initially any ROBBD can have maximum 8 variables If division fails, we can make one of the fanins a ROBDD variable -- back-track 12 Algorithm: Generalized Buffering with CODCs A = dfs_and_levelize_nodes(η*) i =1 while i ≤ size(A) do n = array_fetch(A,i) f = ntbdd_node_to_bdd(n) if bdd_depth(f) ≥ 5 then for g ≡ G Gate Library do d = compute_dc(n) f = test_division(f,g,d,G) end for if bdd_depth(f) > 5 then back-track else bdd_create_variable(n) continue end if else continue end if end while //creates ROBDD of node n 13 test_division with CODCs test_division(f,d,g,G) { if fg ≠ 0 then U  ( fG  dG  f g  d g )( g G ) L  ( fG  f g )( g G ) Z = bdd_between(L,U) Z* = bdd_smooth(Z,gvars) R = bdd_compose(Z*,G,g) if f  R  f + d && bdd_depth(Z*) < bdd_depth(f) then Q  d ( g G ) d  V Q return(success, Z*) end if else return fail end if 14 } back-track n a n-1 n-2 d Needs back-track c n b Re-process b Make ‘c’ a variable n-2 15 Experimental Setup    Implemented in SIS Process Technology- 100nm BPTM Gate Library    AND2, AND3, AND4 OR2, OR3, OR4 A set of benchmark circuits were synthesized  Compared with traditional method    Inverters for buffering Similar to method reported by Buch. et. al Also compared with generalized buffering without don’t cares 16 Gate Delay and Area Gate MUX INV Buffer AND2 AND3 AND4 OR2 OR3 OR4 Delay (ps) 18 10.26 20.5 30.20 37.76 47.39 38.70 46.08 68.28 Area (2) 0.08 0.08 0.16 0.28 0.44 0.64 0.36 0.68 1.12 17 Delay Ckt Traditional Buffering (ps) Without CODCs With ACODCs With CODCs alu2 1927.26 0.53 0.55 0.54 alu4 3458.88 0.45 0.43 - apex6 819.72 0.73 0.75 0.75 C432 2302.38 0.86 0.79 0.76 C880 1248.8 0.70 0.53 0.54 C1908 1336.14 0.74 0.67 0.68 C3540 2104.56 0.71 0.53 - i8 940.68 0.69 0.62 0.60 C6288 5971.5 0.94 0.86 - … … … … … 0.756 0.710 - AVG Generalized Buffering (ps) 18 Area Ckt Generalized Buffering (2) Traditional Buffering (2) Without CODCs With ACODCs With CODCs alu2 164.32 0.87 0.86 0.83 alu4 963.68 1.12 1.12 - apex6 305.52 0.95 0.93 0.93 C432 87.12 1.02 0.93 1.10 C880 136.16 0.80 0.79 0.79 C1908 152.24 0.97 0.93 0.93 C3540 532.88 1.01 0.98 - i8 520.48 0.75 0.72 0.70 C6288 1220.48 1.10 1.06 - … … … … … 0.970 0.950 - AVG 19 Multiplexers Ckt Traditional Buffering Without CODCs With ACODCs With CODCs alu2 718 0.60 0.577 0.532 alu4 4188 0.704 0.689 - apex6 1337 0.737 0.719 0.718 C432 404 0.995 0.795 0.834 C880 594 0.731 0.702 0.712 C1908 660 0.858 0.806 0.792 C3540 2362 0.732 0672 - i8 2418 0.483 0.449 0.439 C6288 5393 0.917 0.834 - … … … … … 0.770 0.729 - AVG Generalized Buffering 20 Run-time Ckt Generalized Buffering (s) Without CODCs With ACODCs With CODCs alu2 0.38 11.48 435.32 alu4 8.99 54.52 - apex6 1.49 4.6 117.23 C432 0.46 13.82 236.82 C880 0.24 2.41 70.0 C1908 0.790 8.08 362.12 C3540 5.01 41.56 - i8 1.63 28.73 2651.48 C6288 10.79 151.49 - … … … … AVG 4.837 33.55 21 Conclusions  Generalized buffering with ACODCs results in delay reduction by    Area reduction obtained by generalized buffering with ACODCs is    29% over traditional buffering 5% over generalized buffering without don’t cares 5% compared to traditional buffering 2% compared to generalized buffering without don’t cares Multiplexers also reduced by   27% compared to traditional buffering 4% compared to generalized buffering without don’t cares 22 Conclusions (contd)   A large number of divisions were obtained for each circuit Little advantage of using CODCs over ACODCs     Delay reduction is less than 1% Area increases by 1% Run-time is 76X slower Can synthesize arbitrary sized circuits using partitioned ROBDDs and ACODC based division 23 Thank You!! Questions? 24

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Presentation slides. - Dept. of Electrical and Computer Engineering