Area and Speed Oriented Implementations of Asynchronous Logic Operating Under Strong Constraints

Igor Lemberski, Baltic International Academy, Riga, Latvia, e-mail: [email protected]
Petr Fišer, Czech Technical University in Prague, Faculty of Information Technology, e-mail: [email protected]

EUROMICRO DSD 2010, Lille

Outline
- Asynchronous circuits model used
- Motivation & proposed method
- Experimental results
- Conclusions

Asynchronous Circuits Model Used
- Unbounded delay model: gate and wire delays are not limited
- The circuit is able to recognize the moment when its input states have changed
- Dual-rail encoding: positive and negative values of each signal are provided
  f(0) = 1, f(1) = 0: logic 1
  f(0) = 0, f(1) = 1: logic 0
  f(0) = 0, f(1) = 0: space state (spacer)
  f(0) = 1, f(1) = 1: not allowed

Four-Phase Discipline
1. Inputs go to a working state (10 or 01)
2. Outputs go to a working state (10 or 01)
3. Inputs return to the space state (00)
4. Outputs return to the space state (00), and the cycle repeats

Seitz's Constraints
- Strong constraints: each output changes its state only after all inputs have changed their state
- In contrast, weak constraints: some outputs are permitted to change their state after only some of the inputs have changed their state

Seitz's Strong Constraints
Pros:
- Regularity
- No extra completion detection logic needed
- Circuit delay is determined by actual gate delays
- No additional synchronization chains
Cons:
- Rather high area and delay
Implementations:
- DIMS (Delay-Insensitive Minterm Synthesis)
- NCL (Null Convention Logic)
- Direct logic

DIMS (Delay-Insensitive Minterm Synthesis)
- Two-level implementation: 2^n n-input C-elements (one per minterm), followed by an OR per output rail collecting the corresponding minterms
- Function implemented as a sum of minterms
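The DIMS structure, the dual-rail code and the four-phase discipline can be illustrated together with a small behavioural model. The following is a minimal Python sketch, not the authors' implementation: it assumes zero-delay evaluation, names the two rails `pos`/`neg` (logic 1 = pos rail high), and all identifiers (`encode`, `decode`, `CElement`, `DimsGate`) are hypothetical. It shows the strong constraint in action: the output leaves the spacer only after all inputs are valid, and returns to the spacer only after all inputs have returned to it.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

# Dual-rail value as (pos, neg). Assumed mapping for this sketch:
# (1, 0) = logic 1, (0, 1) = logic 0, (0, 0) = spacer, (1, 1) = not allowed.
DualRail = Tuple[int, int]
SPACER: DualRail = (0, 0)


def encode(bit: int) -> DualRail:
    """Encode a Boolean value as a dual-rail working state."""
    return (1, 0) if bit else (0, 1)


def decode(v: DualRail) -> Optional[int]:
    """Return 1/0 for a working state, None for the spacer."""
    if v == (1, 0):
        return 1
    if v == (0, 1):
        return 0
    if v == SPACER:
        return None
    raise ValueError("(1, 1) is not an allowed dual-rail state")


class CElement:
    """Muller C-element: output rises when all inputs are 1, falls when all
    inputs are 0, and otherwise holds its previous value."""

    def __init__(self) -> None:
        self.out = 0

    def update(self, inputs: List[int]) -> int:
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out


class DimsGate:
    """Two-level DIMS gate: one n-input C-element per minterm (2**n in total),
    then an OR per output rail over the minterms where f = 1 (pos rail)
    or f = 0 (neg rail)."""

    def __init__(self, n: int, f: Callable[..., int]) -> None:
        self.minterms = list(product((0, 1), repeat=n))
        self.cells = [CElement() for _ in self.minterms]
        self.f = f

    def update(self, inputs: List[DualRail]) -> DualRail:
        pos = neg = 0
        for m, cell in zip(self.minterms, self.cells):
            # Minterm bit 1 selects the input's pos rail, bit 0 its neg rail.
            rails = [x[0] if b else x[1] for x, b in zip(inputs, m)]
            if cell.update(rails):
                if self.f(*m):
                    pos = 1
                else:
                    neg = 1
        return (pos, neg)


# One four-phase cycle of a 2-input DIMS XOR gate.
xor = DimsGate(2, lambda a, b: a ^ b)
print(xor.update([encode(1), SPACER]))     # only one input valid: output stays spacer
print(xor.update([encode(1), encode(0)]))  # all inputs valid: output becomes logic 1
print(xor.update([SPACER, encode(0)]))     # partial return to spacer: output holds
print(xor.update([SPACER, SPACER]))        # all inputs spacer: output returns to spacer
```

The state held in the C-elements is what enforces the strong constraint; the `2**n` C-elements per gate also make the exponential growth of DIMS area visible.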
NCL (Null Convention Logic)
- Library of 27 special gates
- Based on threshold functions
- Any function of up to 4 inputs can be implemented
- … but in dual-rail, 4 inputs = 2 variables only

Direct Logic
- Two-level C-OR DIMS logic implemented as a single gate
- Both positive and complemented outputs are provided
- Different delays for each input

Comparison
Inputs   DIMS Trans.   DIMS Delay   Direct logic Trans.   Direct logic Delay
2        24            8.2          22                    8.2
3        64            15.1         34                    12.3
4        160           21.3         54                    19.4
5        384           N/A          90                    N/A
6        896           N/A          158                   N/A

NCL 2-input gate   Trans.   Delay
AND, OR            21       5.8
XOR                24       8.6

Multi-Level Dual-Rail Network
- Positive and complemented values of each signal are provided
- Each node is implemented as DIMS, NCL, or Direct logic

Motivation & Proposed Method
State-of-the-art:
- Nodes are implemented as simple gates (NAND, XOR)
- Example: 4 × 2-input gates = 4 × 22 = 88 transistors in Direct logic
Proposed:
- Nodes are implemented as complex gates, i.e., gates with a given number of inputs and an arbitrary function
- Example: 1 × 2-input gate + 1 × 3-input gate = 22 + 34 = 56 transistors
- Can be implemented both in DIMS and in Direct logic
- Like FPGA LUTs, so tools for synchronous synthesis (FPGA mapping) can be used

Where's the Problem?
Increasing the number of node inputs will:
- Decrease the number of nodes
- Decrease the number of levels
- Increase the node size
- Increase the node delay
Question: where is the trade-off?

Experimental Setup
- 228 circuits processed (MCNC, ISCAS)
- Optimized by the ABC choice script
- Mapped in three ways (k = 2…6 for k-NAND and k-CG):
  1. Mapped into k-input NANDs (ABC map command): state-of-the-art (k-NAND)
  2. Mapped into k-LUTs (ABC fpga command): complex gates (k-CG)
  3. Mapped into MCNC standard cells (ABC map): something in-between (SC)
- Implemented as DIMS, Direct logic, and NCL

Results – DIMS – Area
[Figure: total transistor count for 2-NAND … 6-NAND, 2-CG … 6-CG, and SC]
[Figure: share of circuits in which each implementation is the best in area]

Results – DIMS – Delay
[Figure: total delay for 2-NAND, 3-NAND, 4-NAND, 2-CG, 3-CG, 4-CG, and SC]
[Figure: share of circuits in which each implementation is the best in delay]

Discussion – DIMS
- Implementation using arbitrary 2-input gates (2-CG) is the best one, both in area and delay
- No big surprise: the complexity (and delay) of DIMS grows exponentially with the number of gate inputs
- Results are consistent: the more node inputs, the higher the area and delay

Results – Direct Logic – Area
[Figure: total transistor count for 2-NAND … 6-NAND, 2-CG … 6-CG, NCL, and SC]
[Figure: share of circuits in which each implementation is the best in area]

Results – Direct Logic – Delay
[Figure: total delay for 2-NAND, 3-NAND, 4-NAND, 2-CG, 3-CG, 4-CG, NCL, and SC]
[Figure: share of circuits in which each implementation is the best in delay]

Discussion – Direct Logic
- Implementation using 3-input complex gates (3-CG) is the best one, both in area and delay
- This is a good result confirming our theory
- Results are consistent: no coincidence
- The state-of-the-art 2-NAND implementation is extremely inefficient: 3-CG gives a 21% area improvement and a 19% delay improvement over it
- The 3-CG implementation is even better than NCL: 10% area improvement, 19% delay improvement

Conclusions
- Efficient implementation of asynchronous logic operating under strong constraints proposed
- Tools (and methods) for synchronous synthesis are used for asynchronous synthesis
- 3-input complex nodes implemented using Direct logic
- Extensive experiments confirmed the theory
- Approx. 20% area and delay improvement vs. the state-of-the-art 2-NAND implementation, and 10% area / 19% delay improvement vs. NCL
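The transistor counts in the Comparison table and the arithmetic on the Motivation slides can be reproduced with a few lines of Python. This is a minimal sketch: the dictionaries copy the table values, while the closed forms `dims_fit` and `direct_fit` are empirical fits that happen to match every row of the table; they are an assumption of this sketch, not formulas given by the authors. The helper `network_cost` is hypothetical and simply sums per-node gate costs.

```python
from typing import Dict, List

# Transistors per gate, copied from the Comparison table (inputs -> transistors).
DIMS: Dict[int, int] = {2: 24, 3: 64, 4: 160, 5: 384, 6: 896}
DIRECT: Dict[int, int] = {2: 22, 3: 34, 4: 54, 5: 90, 6: 158}


def dims_fit(n: int) -> int:
    """Empirical fit to the DIMS column: 2**n minterm terms of (2n + 2) each."""
    return 2 ** n * (2 * n + 2)


def direct_fit(n: int) -> int:
    """Empirical fit to the Direct logic column."""
    return 2 ** (n + 1) + 4 * n + 6


def network_cost(node_inputs: List[int], table: Dict[int, int]) -> int:
    """Total transistors of a network, given the input count of each node."""
    return sum(table[k] for k in node_inputs)


# Motivation example: four 2-input nodes vs. one 2-input + one 3-input node.
print(network_cost([2, 2, 2, 2], DIRECT))  # 88
print(network_cost([2, 3], DIRECT))        # 56

# Trade-off from "Where's the Problem?": cost per gate input grows with n,
# much faster for DIMS than for Direct logic.
for n in sorted(DIMS):
    print(n, round(DIMS[n] / n, 1), round(DIRECT[n] / n, 1))
```

The per-input cost loop makes the slide's trade-off concrete: fewer, larger nodes pay off only while the per-node cost grows slowly, which is consistent with 2-CG winning for DIMS and 3-CG winning for Direct logic.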
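The NCL slide states that the NCL gates are based on threshold functions. As an illustration, the following minimal Python sketch models an unweighted NCL-style THmn gate with hysteresis: the output is asserted once at least m of the n inputs are asserted and is de-asserted only after all inputs have returned to 0. The class name `ThresholdGate` is hypothetical, and the weighted gate variants in the 27-gate NCL library are not modelled here.

```python
from typing import List


class ThresholdGate:
    """NCL-style THmn threshold gate with hysteresis (unweighted form).

    Output rises when at least m of the n inputs are 1, and falls only when
    all n inputs are 0; in between it holds its previous value.
    """

    def __init__(self, m: int, n: int) -> None:
        if not 1 <= m <= n:
            raise ValueError("threshold must satisfy 1 <= m <= n")
        self.m, self.n = m, n
        self.out = 0

    def update(self, inputs: List[int]) -> int:
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        if sum(inputs) >= self.m:
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out


# TH22 behaves like a 2-input C-element; TH12 behaves like a 2-input OR.
th22 = ThresholdGate(2, 2)
print(th22.update([1, 0]), th22.update([1, 1]), th22.update([0, 1]), th22.update([0, 0]))
```

The hysteresis is what lets threshold gates respect the four-phase discipline: a gate that has fired keeps its output until the whole input set has returned to the spacer.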