Download S. Hassoun and D. Marculescu, “Towards GALS Design

Soha Hassoun Research Update September, 2003 Primary Research Interests Hassoun's primary research interest is in the area of Computer-Aided Design (CAD) for Integrated Circuits (ICs). CAD is the discipline of creating efficient algorithms and programming them to automate many tasks needed to build multi-million transistor ICs. The IC designer then utilizes the CAD algorithms to design, analyze, and verify different aspects of an IC design. The Details Hassoun’s interests focus mainly on: Synthesis (subsections R1-R7 below), Timing Analysis (R8-R9), and Physical Design (R10). Of special interest is the synergy between these areas, such as optimal pipelining when constructing physical routes, and timing-driven synthesis. Other interests include emulation (R11), optimizations for low power (R12), and design at the circuit and architecture levels (R13). Future research plans are listed in (R14). R1. Synthesis: Hardware Threading and External Profiling. We address the optimization of application-specific hardware for multimedia and communication applications where an incoming packet or message is processed independently of other packets. Two new techniques are introduced: external profiling and hardware threading. The first technique allows the optimal partitioning of a hardware design based on an external probabilistic load. The second technique allows on-the-fly re-configuration. Unutilized hardware is borrowed to maximize task-level parallelism, which results in boosting performance or lowering power consumption. The novelty of this research is two fold. First, it is one of the early attempts to synthesize applicationspecific configurable components. Second, it is the first work in high-level synthesis that utilizes external profiling to achieve optimal synthesis results. Results show up to 12% performance improvements and up to 44% power dissipation reduction. A provisional patent has been filed. Related Publications: 1. B. Swahn and S. Hassoun, "Dynamic Adaptability via Hardware Threading", accepted for publication, ACM Transactions on Embedded Computing. 2. B. Swahn and S. Hassoun, "Hardware Scheduling for Dynamic Adaptability Using External Profiling and Hardware Threading", International Conference on Computer-Aided Design (ICCAD), November, 2003. 3. S. Hassoun and Swahn, B., "Circuit Having Hardware Threading", provisional filed in 2003. Soha Hassoun 1/8 R2. Synthesis: Architectural Retiming. Given a synchronous circuit whose clock period is limited by a latency-constrained path, architectural retiming increases the number of registers on that path, thereby decreasing the clock period, while preserving the circuit's functionality and latency. The solution presented is the addition of both a pipeline register, and a negative register to counteract the latency. The negative register can be implemented as a precomputation or a prediction. The importance of Architectural retiming lies in its formalization of many ad hoc optimization techniques, such as lookahead, bypassing, and branch prediction, to solve latency problems. References #1 and #2 offer an overview, while references #3, #4, #5 present more details. The concept of architectural retiming was shown to be applicable in high-level synthesis in reference #5 by showing architectural retiming to be comparable to a fine-grain rescheduling technique. Related Publications: 1. 2. 3. 4. 5. 6. S. Hassoun and C. Ebeling, "Architectural Retiming: Pipelining Latency-Constrained Circuits", Design Automation Conference (DAC), June, 1996 Hassoun and C. Ebeling, "Architectural Retiming: An Overview", International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems (TAU), Seattle, WA, Nov. 1995. Hassoun, and C. Ebeling, "Using Precomputation in Architecture and Logic Synthesis", International Conference on Computer-Aided Design (ICCAD), pp. 316—423, November, 1998. S. Hassoun and C. Ebeling, "An Overview of Prediction-Based Architectural Retiming", International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems (TAU), Dec, 1997. S. Hassoun and C. Ebeling, "Sequential Circuit Optimization Using Precomputation", International Workshop on Logic Synthesis, May, 1997. S. Hassoun, "Fine Grain Incremental Rescheduling via Architectural Retiming", International Symposium on System Synthesis (ISSS), pp. 158-163, December, 1998. R3. Synthesis: Regularity Extraction. Utilizing regularities and symmetries during circuit optimization and layout can speed up CAD tools and produce more compact circuits. We introduced a new approach to extracting circuit regularities. The approach is comprised of three steps. First, the circuit graph is decomposed in a hierarchical inclusion parse tree using a clan-based decomposition algorithm. This algorithm discovers clans. groupings of nodes in the circuit graph that have a natural affinity towards each other. Second, the parse tree nodes are classified into equivalence classes. Such classes represent templates suitable for circuit covering. The final step consists of using a binate cover solver to find an appropriate cover. The cover will consist of instantiated templates and gates that cannot be covered by any templates. The complexity of the algorithm is O(n 4), which was an improvement over earlier algorithms with complexity O(n5) while obtaining comparable results. Related Publications Soha Hassoun 2/8 S. Hassoun and C. McCreary, "Clan-Based Structural Circuit Decomposition", International Workshop on Logic Synthesis (IWLS), Granlibakken, June, 1999. S. Hassoun, and C. McCreary, "Regularity Extraction Via Clan-Based Structural Circuit Decomposition", International Conference on Computer-Aided Design (ICCAD), pp. 414—419, November, 1999. R4. Synthesis: Sequential Optimizations. The iterative application of retiming and combinational resynthesis exposes many optimization opportunities not available by independently considering each technique. Using 30 alternating steps of resynthesis and retiming, we show a performance gain equal to twice that achieved by a single resynthesis/retiming step for a subset of the standard MCNC benchmarks. This work is important because it demonstrates the optimization potential in combining sequential and combinational optimizations. The book chapter in Logic Synthesis and Verification provides a survey of related sequential optimization techniques including retiming. Related Publications S. Hassoun and C. Ebeling, "Experiments in the Iterative Application of Resynthesis and Retiming", International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems (TAU), Dec., 1997. S. Hassoun and T. Villa, "Optimization of Synchronous Circuits", a chapter in "Logic Synthesis and Verification", editors: Hassoun and Sasao, Kluwer Academic Publishers, pp. 225—253, 2002. R5. Synthesis: Functional Customization. Users of soft IP are faced with the problem of ``one IP fits all''. The resulting IPs contain unused functionality, resulting in wasted area and undesirable power dissipation. We advocate a new research direction in synthesis we label as functional customization. With this synthesis methodology, functionality is customized for a given IP thus producing an end product that more closely matches the IP user's functional requirements. A detailed case study of the UART 16550 provides insight into functional customization. J. Xie and S. Hassoun, "A Case Study in IP Functional Customization: The UART 16550", International Workshop on Logic Synthesis (IWLS), Granlibakken, June, 2001. R6. Synthesis: Finite State Assignment. Reference #29 explores improving finite state assignments for two-level Programmable Logic Devices. Two approaches are investigated: iterative improvement to reduce the number of product Soha Hassoun 3/8 terms per each state and output bit, and exploiting output functions as pre-encoded state bits. This was one of the early works to tailor state assignments for configurable components. S. Hassoun and G. Borriello, "Improving Finite State Assignment for Two-Level Programmable Logic Devices", International Workshop on Logic Synthesis, May, 1993. R7. Synthesis: Logic Synthesis and Verification. The book, Logic Synthesis and Verification (Kluwer Academic Publishers), co-edited by Hassoun and Sasao with R. Brayton as consulting editor, contains contributions by twenty-seven experts in the area of Logic Synthesis and Verification. Hassoun initiated this book effort, and she planned the book chapters with Brayton. She was responsible for the editing of eight of the chapters. The book sold over 500 copies, and is used by many universities as a recommended reference for CAD courses on Logic and synthesis. Example courses include:  EECS219B, Logic Synthesis and Verification, taught by Andreas Kuehlmann at the University of California at Berkeley (main reference)  ECE 697b, Synthesis and Verification of Digital Systems, taught by Maciej Ciesielski at the University of Massachusetts at Amherst  ECE 571, Advanced Synthesis and Verification Algorithms, taught by Sarma Vrudhula at the University of Arizona  ENEL619, Advanced Logic Design, taught by S. N. Yanushkevich at the University of Calgary  CS220, Synthesis of Digital Systems, taught by Harry Hsieh at the University of California, Riverside R8. Timing Analysis: Cross Talk Analysis. Shrinking feature sizes of integrated circuits to nanometer scales have forced designers to consider the impact of capacitive coupling on delays. We introduce a novel abstract gate delay model, the dynamically bounded delay model. Unlike the bounded delay model, which statically assigns a fixed delay range for each circuit component prior to applying timing analysis, delay ranges vary in our model. Ranges are dynamically resolved during timing analysis. This model is a generalization of the 0X, 1X, and 2X capacitive models. Timing analysis with cross talk is formulated as a MILP problem. We showed how to perform timing analysis in the presence of cross talk on circuits containing level-sensitive circuits, which are often used in high-performance design. Related Publications 1. S. Hassoun, "Critical Path Analysis Using a Dynamically Bounded Delay Model", Design Automation Conference (DAC), 260—265, June, 2000. 2. S. Hassoun, C., Cromer, and E. Calvillo-Gámez, "Static Timing Analysis for Level-Clocked Circuits in the Presence of Cross Talk", IEEE Transactions on Computer-Aided Design, September, 2003. Soha Hassoun 4/8 3. S. Hassoun, C. Cromer, and E. Calvillo-Gámez, "Verifying Clock Schedules in the Presence of Cross Talk", Design Automation and Test in Europe (DATE) Conference, pp. 346—350, March, 2002. R9. Timing Analysis: GALS Architectures. The re-use of IP and the need to alleviate clocking problems calls for using a Globally Asynchronous Locally Synchronous Designs (GALS) synchronization methodology. We explore how to optimize the routing between two GALS modules by optimally placing buffers (repeaters) and a distributed FIFO between the modules. We describe several challenges that must be overcome to enable future GALS design methodologies. Related Publications S. Hassoun, c. Alpert, and M. Thiagarajan, "Optimal Buffered Routing Path Constructions for Single and Multiple Clock Domain Systems", International Conference on Computer-Aided Design (ICCAD), pp. 247—252, November, 2002. S. Hassoun and C. Alpert, "Optimal Path Routing in Single and Multiple Clock Domain Systems", IEEE Transaction on Computer-Aided Design, November, 2003. S. Hassoun and D. Marculescu, “Towards GALS Design Methodologies”, workshop on Formal Methods For Globally Asynchronous Locally Synchronous (GALS) Architecture, September, 2003. R10. Physical Design: Optimal Pipelined Route Construction. With increasingly smaller feature sizes and faster clock frequencies, multiple clock cycles are now needed to route global signals on a chip. This area has been mostly neglected until recently because it requires exploring the interaction between timing and physical design. The research thrust here is looking at the interaction between timing and routing, bringing a fresh perspective to the physical design area. We explore how to adapt maze routing to optimally construct a buffered and pipelined route. We extend this work for circuits containing level-sensitive latches. A patent is pending on the software algorithms that were used in this work. Related Publications 1. S. Hassoun, c. Alpert, and M. Thiagarajan, "Optimal Buffered Routing Path Constructions for Single and Multiple Clock Domain Systems", International Conference on Computer-Aided Design (ICCAD), pp. 247—252, November, 2002. 2. S. Hassoun and C. Alpert, "Optimal Path Routing in Single and Multiple Clock Domain Systems", IEEE Transaction on Computer-Aided Design, November, 2003. 3. S. Hassoun, "Optimal Use Of 2-Phase Transparent Latches In Buffered Maze Routing", IEEE International Symposium on Circuits and Systems (ISACS), May, 2003. 4. Alpert, C. and S. Hassoun, "Optimal Buffered Routing Path Constructions for Single and Multiple Clock Domain Systems", filed in 2002 Soha Hassoun 5/8 R11. Other: Emulation. The functional verification on-chip designs containing millions of transistors is a challenge. Simulation run times are increasing, and emulation is now a necessity. We introduce a unified simulation/emulation architecture that allows easy migration of test code from simulation to emulation. Synchronization and communication latencies between the software test and the emulated hardware are minimized by using transaction-based interactions, thus enabling two orders of magnitude speeds up over traditional simulations. We detail the emulation performance model and compare the estimated vs. measured performance. Transactions are now common in both simulation and emulation. A patent was filed for the low level communication between the emulator and the workstation. Related Publications 1. S. Hassoun, M. Kudlugi, C. Selvidge, and D. Pryor, "A Transaction-Based Unified Architecture for Simulation and Emulation", accepted for publication, IEEE Transactions on VLSI. 2. M. Kudlugi, S. Hassoun, C. Selvidge, and D. Pryor, "A Transaction-Based Unified Simulation/Emulation Architecture for Functional Verification", Design Automation Conference (DAC), pp. 623—8, June, 2001. 3. C. Selvidge, K. Crouch, M. Kudlugi, and S. Hassoun, "Non-Synchronized Multiplex Data Transport Across Synchronous systems", filed March 30, 2000. R12. Other: Low Power. Leakage current is slated to be a major contributor to power dissipation in future IC technologies. We explore using an incremental satisfiability engine to find the maximum and minimum leakage current and the culprit input vector causing such leakage. Results are improvements over the use of random vectors to generate such bounds. F. Aloul, S. Hassoun, K. Sakallah, and D. Blaauw, "Robust SAT-Based Search Algorithm for Leakage Power Reduction", International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2002. R13. Other: Circuit Design and Computer Architecture. Hassoun's earlier work focused on circuit design and computer architecture. Her MS Thesis describes a 3-transistor dynamic memory that had architectural features to enable data prefetching. She worked on the architecture of a message-driven processor, the J-Machine. This Reference was a landmark paper in computer architecture. A follow up article on the work appeared in "Retrospective in 25 Years of the International Symposia on Computer Architecture - Selected Papers". Other published work covers details of circuit design and architectures for landmark architectures, including the Alpha 21064 processor, known for both its architecture and circuitry, for which Hassoun was a lead circuit designer while employed at Digital Equipment Corp. Soha Hassoun 6/8 Related Work 1. K. Bolding, S.-C. Cheung, S.-E. Choi, C. Ebeling, S. Hassoun, T. Ngo, and R. Wille, "The Chaos Router Chip: Design and Implementation of an Adaptive Router", International Conference on Very Large Scale Integration, Grenoble, France, September, 1993. 2. S. Hassoun and D. Sanders, "Method and Apparatus for Parity Generation", U.S. Patent 5557622, Sept. 17, 1996. 3. D. Dobberpuhl, R. Witek, R. Allmon, R. Anglin, D. Bertucci, S. Britton, L. Chao, R. Conrad, D. Dever, B. Gieseke, S. Hassoun, G. Hoeppner, K. Kuchler, M. Ladd, B. Leary, L. Madden, E. McLellan, D. Meyer, J. Montanaro, D. Priore, V. Rajagopalan, S. Samudrala, S. Santhanam. "A 200 MHz 64-b Dual-Issue CMOS Microprocessor", IEEE Journal of Solid-State Circuits, November, 1992, Vol. 27, No. 11. Also appears in Digital Technical Journal. Vol. 4, No. 4, 1992. 4. R. Allmon, B. Benschneider, M. Callander, L. Chao, D. Dever, J. Farrell, N. Fitzgerald, J. Grodstein, S. Hassoun, L. Hudepohl, D. Kravitz, J. Lundberg, R. Marcello, S. Marino, J. Pickholtz, R. Preston, M. Richesson, S. Samudrala, and D. Sanders. "System, Process, and Design Implications of a Reduced Supply Voltage Microprocessor", IEEE International Solid-State Circuits Conference (ISSCC), February, 1990. 5. W. Dally, L. Chao, A. Chien, S. Hassoun, W. Horwat, J. Kaplan, P. Song, B. Totty, and S. Wills, "Architecture of a Message-Driven Processor", International Symposium on Computer Architecture (ISCA), June, 1987. 6. S. Hassoun, "A Memory Design for the Message-Driven Processor", MIT VLSI Memo No. 88-564, August, 1988. Soha Hassoun 7/8 R14. Future Research Plans. Hassoun plans to continue research in the areas of synthesis, timing analysis, physical design, and to investigate CAD tools for quantum computing. In synthesis, Hassoun plans to continue to work on the synthesis of configurable systems. Efforts have just commenced to understand and support how to use FPGAs as a configurable shared resource among multiple system components. The FPGA will provide security measures and caching mechanism to speed up run-time re-configurations. Hassoun also plans to investigate efficient sequential optimization techniques that can produce results similar to those obtained by iterative retiming and resynthesis. In timing analysis, Hassoun is currently preparing a survey of over 100 papers on cross talk analysis and preventive techniques. The survey provides insight into several new directions to further solve this problem. In physical design, significant progress has already been made to understand the need for routing delay models that consider inductance. Upcoming work will examine limitations of earlier delay models and argues the need to consider inductance for all buffered routing construction. Planned research includes using wave pipelining, a technique to optimize sequential circuits, in optimal route construction. Hassoun’s long-term research focus will be CAD tools for quantum computing. Hassoun has already established a quantum computing reading group at Tufts with Leon Gunther (Physics, Tufts), Bruce Boghosian (Math, Tufts), and Mary Beth Rusaki (Math, Tufts), thus tackling this problem from multi-disciplinary angles. Soha Hassoun 8/8

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download S. Hassoun and D. Marculescu, “Towards GALS Design