Download S. Hassoun and D. Marculescu, “Towards GALS Design

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Fault tolerance wikipedia , lookup

Flexible electronics wikipedia , lookup

Music technology (electronic and digital) wikipedia , lookup

Electronic musical instrument wikipedia , lookup

Electronic engineering wikipedia , lookup

Integrated circuit wikipedia , lookup

Emulator wikipedia , lookup

Digital electronics wikipedia , lookup

Hardware description language wikipedia , lookup

Transcript
Soha Hassoun
Research Update
September, 2003
Primary Research Interests
Hassoun's primary research interest is in the area of Computer-Aided Design (CAD) for Integrated
Circuits (ICs). CAD is the discipline of creating efficient algorithms and programming them to
automate many tasks needed to build multi-million transistor ICs. The IC designer then utilizes the
CAD algorithms to design, analyze, and verify different aspects of an IC design.
The Details
Hassoun’s interests focus mainly on: Synthesis (subsections R1-R7 below), Timing Analysis (R8-R9),
and Physical Design (R10). Of special interest is the synergy between these areas, such as optimal
pipelining when constructing physical routes, and timing-driven synthesis. Other interests include
emulation (R11), optimizations for low power (R12), and design at the circuit and architecture levels
(R13). Future research plans are listed in (R14).
R1. Synthesis: Hardware Threading and External Profiling.
We address the optimization of application-specific hardware for multimedia and communication
applications where an incoming packet or message is processed independently of other packets.
Two new techniques are introduced: external profiling and hardware threading. The first technique
allows the optimal partitioning of a hardware design based on an external probabilistic load. The
second technique allows on-the-fly re-configuration. Unutilized hardware is borrowed to maximize
task-level parallelism, which results in boosting performance or lowering power consumption. The
novelty of this research is two fold. First, it is one of the early attempts to synthesize applicationspecific configurable components. Second, it is the first work in high-level synthesis that utilizes
external profiling to achieve optimal synthesis results. Results show up to 12% performance
improvements and up to 44% power dissipation reduction. A provisional patent has been filed.
Related Publications:
1.
B. Swahn and S. Hassoun, "Dynamic Adaptability via Hardware Threading", accepted for
publication, ACM Transactions on Embedded Computing.
2.
B. Swahn and S. Hassoun, "Hardware Scheduling for Dynamic Adaptability Using External
Profiling and Hardware Threading", International Conference on Computer-Aided Design
(ICCAD), November, 2003.
3.
S. Hassoun and Swahn, B., "Circuit Having Hardware Threading", provisional filed in 2003.
Soha Hassoun
1/8
R2. Synthesis: Architectural Retiming.
Given a synchronous circuit whose clock period is limited by a latency-constrained path,
architectural retiming increases the number of registers on that path, thereby decreasing the
clock period, while preserving the circuit's functionality and latency. The solution presented is the
addition of both a pipeline register, and a negative register to counteract the latency. The negative
register can be implemented as a precomputation or a prediction. The importance of Architectural
retiming lies in its formalization of many ad hoc optimization techniques, such as lookahead,
bypassing, and branch prediction, to solve latency problems. References #1 and #2 offer an
overview, while references #3, #4, #5 present more details. The concept of architectural
retiming was shown to be applicable in high-level synthesis in reference #5 by showing
architectural retiming to be comparable to a fine-grain rescheduling technique.
Related Publications:
1.
2.
3.
4.
5.
6.
S. Hassoun and C. Ebeling, "Architectural Retiming: Pipelining Latency-Constrained Circuits",
Design Automation Conference (DAC), June, 1996
Hassoun and C. Ebeling, "Architectural Retiming: An Overview", International Workshop on
Timing Issues in the Specification and Synthesis of Digital Systems (TAU), Seattle, WA,
Nov. 1995.
Hassoun, and C. Ebeling, "Using Precomputation in Architecture and Logic Synthesis",
International Conference on Computer-Aided Design (ICCAD), pp. 316—423, November,
1998.
S. Hassoun and C. Ebeling, "An Overview of Prediction-Based Architectural Retiming",
International Workshop on Timing Issues in the Specification and Synthesis of Digital
Systems (TAU), Dec, 1997.
S. Hassoun and C. Ebeling, "Sequential Circuit Optimization Using Precomputation",
International Workshop on Logic Synthesis, May, 1997.
S. Hassoun, "Fine Grain Incremental Rescheduling via Architectural Retiming", International
Symposium on System Synthesis (ISSS), pp. 158-163, December, 1998.
R3. Synthesis: Regularity Extraction.
Utilizing regularities and symmetries during circuit optimization and layout can speed up CAD tools
and produce more compact circuits. We introduced a new approach to extracting circuit
regularities. The approach is comprised of three steps. First, the circuit graph is decomposed in a
hierarchical inclusion parse tree using a clan-based decomposition algorithm. This algorithm
discovers clans. groupings of nodes in the circuit graph that have a natural affinity towards each
other. Second, the parse tree nodes are classified into equivalence classes. Such classes represent
templates suitable for circuit covering. The final step consists of using a binate cover solver to
find an appropriate cover. The cover will consist of instantiated templates and gates that cannot
be covered by any templates. The complexity of the algorithm is O(n 4), which was an improvement
over earlier algorithms with complexity O(n5) while obtaining comparable results.
Related Publications
Soha Hassoun
2/8
S. Hassoun and C. McCreary, "Clan-Based Structural Circuit Decomposition", International
Workshop on Logic Synthesis (IWLS), Granlibakken, June, 1999.
S. Hassoun, and C. McCreary, "Regularity Extraction Via Clan-Based Structural Circuit
Decomposition", International Conference on Computer-Aided Design (ICCAD), pp. 414—419,
November, 1999.
R4. Synthesis: Sequential Optimizations. The iterative application of retiming and
combinational resynthesis exposes many optimization opportunities not available by independently
considering each technique. Using 30 alternating steps of resynthesis and retiming, we show a
performance gain equal to twice that achieved by a single resynthesis/retiming step for a subset of
the standard MCNC benchmarks. This work is important because it demonstrates the optimization
potential in combining sequential and combinational optimizations. The book chapter in Logic
Synthesis and Verification provides a survey of related sequential optimization techniques
including retiming.
Related Publications
S. Hassoun and C. Ebeling, "Experiments in the Iterative Application of Resynthesis and
Retiming", International Workshop on Timing Issues in the Specification and Synthesis of
Digital Systems (TAU), Dec., 1997.
S. Hassoun and T. Villa, "Optimization of Synchronous Circuits", a chapter in "Logic
Synthesis and Verification", editors: Hassoun and Sasao, Kluwer Academic Publishers, pp.
225—253, 2002.
R5. Synthesis: Functional Customization.
Users of soft IP are faced with the problem of ``one IP fits all''. The resulting IPs contain unused
functionality, resulting in wasted area and undesirable power dissipation. We advocate a new
research direction in synthesis we label as functional customization. With this synthesis
methodology, functionality is customized for a given IP thus producing an end product that more
closely matches the IP user's functional requirements. A detailed case study of the UART 16550
provides insight into functional customization.
J. Xie and S. Hassoun, "A Case Study in IP Functional Customization: The UART 16550",
International Workshop on Logic Synthesis (IWLS), Granlibakken, June, 2001.
R6. Synthesis: Finite State Assignment.
Reference #29 explores improving finite state assignments for two-level Programmable Logic
Devices. Two approaches are investigated: iterative improvement to reduce the number of product
Soha Hassoun
3/8
terms per each state and output bit, and exploiting output functions as pre-encoded state bits.
This was one of the early works to tailor state assignments for configurable components.
S. Hassoun and G. Borriello, "Improving Finite State Assignment for Two-Level
Programmable Logic Devices", International Workshop on Logic Synthesis, May, 1993.
R7. Synthesis: Logic Synthesis and Verification.
The book, Logic Synthesis and Verification (Kluwer Academic Publishers), co-edited by Hassoun
and Sasao with R. Brayton as consulting editor, contains contributions by twenty-seven experts in
the area of Logic Synthesis and Verification. Hassoun initiated this book effort, and she planned
the book chapters with Brayton. She was responsible for the editing of eight of the chapters. The
book sold over 500 copies, and is used by many universities as a recommended reference for CAD
courses on Logic and synthesis. Example courses include:
 EECS219B, Logic Synthesis and Verification, taught by Andreas Kuehlmann at the University
of California at Berkeley (main reference)
 ECE 697b, Synthesis and Verification of Digital Systems, taught by Maciej Ciesielski at the
University of Massachusetts at Amherst
 ECE 571, Advanced Synthesis and Verification Algorithms, taught by Sarma Vrudhula at the
University of Arizona
 ENEL619, Advanced Logic Design, taught by S. N. Yanushkevich at the University of Calgary
 CS220, Synthesis of Digital Systems, taught by Harry Hsieh at the University of California,
Riverside
R8. Timing Analysis: Cross Talk Analysis.
Shrinking feature sizes of integrated circuits to nanometer scales have forced designers to
consider the impact of capacitive coupling on delays. We introduce a novel abstract gate delay
model, the dynamically bounded delay model. Unlike the bounded delay model, which statically
assigns a fixed delay range for each circuit component prior to applying timing analysis, delay ranges
vary in our model. Ranges are dynamically resolved during timing analysis. This model is a
generalization of the 0X, 1X, and 2X capacitive models. Timing analysis with cross talk is
formulated as a MILP problem. We showed how to perform timing analysis in the presence of cross
talk on circuits containing level-sensitive circuits, which are often used in high-performance design.
Related Publications
1.
S. Hassoun, "Critical Path Analysis Using a Dynamically Bounded Delay Model", Design
Automation Conference (DAC), 260—265, June, 2000.
2.
S. Hassoun, C., Cromer, and E. Calvillo-Gámez, "Static Timing Analysis for Level-Clocked
Circuits in the Presence of Cross Talk", IEEE Transactions on Computer-Aided Design,
September, 2003.
Soha Hassoun
4/8
3.
S. Hassoun, C. Cromer, and E. Calvillo-Gámez, "Verifying Clock Schedules in the Presence of
Cross Talk", Design Automation and Test in Europe (DATE) Conference, pp. 346—350,
March, 2002.
R9. Timing Analysis: GALS Architectures.
The re-use of IP and the need to alleviate clocking problems calls for using a Globally Asynchronous
Locally Synchronous Designs (GALS) synchronization methodology. We explore how to optimize the
routing between two GALS modules by optimally placing buffers (repeaters) and a distributed FIFO
between the modules. We describe several challenges that must be overcome to enable future
GALS design methodologies.
Related Publications
S. Hassoun, c. Alpert, and M. Thiagarajan, "Optimal Buffered Routing Path Constructions for
Single and Multiple Clock Domain Systems", International Conference on Computer-Aided
Design (ICCAD), pp. 247—252, November, 2002.
S. Hassoun and C. Alpert, "Optimal Path Routing in Single and Multiple Clock Domain
Systems", IEEE Transaction on Computer-Aided Design, November, 2003.
S. Hassoun and D. Marculescu, “Towards GALS Design Methodologies”, workshop on Formal
Methods For Globally Asynchronous Locally Synchronous (GALS) Architecture, September,
2003.
R10. Physical Design: Optimal Pipelined Route Construction.
With increasingly smaller feature sizes and faster clock frequencies, multiple clock cycles are now
needed to route global signals on a chip. This area has been mostly neglected until recently because
it requires exploring the interaction between timing and physical design. The research thrust here
is looking at the interaction between timing and routing, bringing a fresh perspective to the physical
design area. We explore how to adapt maze routing to optimally construct a buffered and pipelined
route. We extend this work for circuits containing level-sensitive latches. A patent is pending on
the software algorithms that were used in this work.
Related Publications
1.
S. Hassoun, c. Alpert, and M. Thiagarajan, "Optimal Buffered Routing Path Constructions for
Single and Multiple Clock Domain Systems", International Conference on Computer-Aided
Design (ICCAD), pp. 247—252, November, 2002.
2.
S. Hassoun and C. Alpert, "Optimal Path Routing in Single and Multiple Clock Domain
Systems", IEEE Transaction on Computer-Aided Design, November, 2003.
3.
S. Hassoun, "Optimal Use Of 2-Phase Transparent Latches In Buffered Maze Routing",
IEEE International Symposium on Circuits and Systems (ISACS), May, 2003.
4.
Alpert, C. and S. Hassoun, "Optimal Buffered Routing Path Constructions for Single and
Multiple Clock Domain Systems", filed in 2002
Soha Hassoun
5/8
R11. Other: Emulation.
The functional verification on-chip designs containing millions of transistors is a challenge.
Simulation run times are increasing, and emulation is now a necessity. We introduce a unified
simulation/emulation architecture that allows easy migration of test code from simulation to
emulation. Synchronization and communication latencies between the software test and the
emulated hardware are minimized by using transaction-based interactions, thus enabling two orders
of magnitude speeds up over traditional simulations. We detail the emulation performance model
and compare the estimated vs. measured performance. Transactions are now common in both
simulation and emulation. A patent was filed for the low level communication between the emulator
and the workstation.
Related Publications
1.
S. Hassoun, M. Kudlugi, C. Selvidge, and D. Pryor, "A Transaction-Based Unified
Architecture for Simulation and Emulation", accepted for publication, IEEE Transactions on
VLSI.
2.
M. Kudlugi, S. Hassoun, C. Selvidge, and D. Pryor, "A Transaction-Based Unified
Simulation/Emulation Architecture for Functional Verification", Design Automation
Conference (DAC), pp. 623—8, June, 2001.
3.
C. Selvidge, K. Crouch, M. Kudlugi, and S. Hassoun, "Non-Synchronized Multiplex Data
Transport Across Synchronous systems", filed March 30, 2000.
R12. Other: Low Power.
Leakage current is slated to be a major contributor to power dissipation in future IC technologies.
We explore using an incremental satisfiability engine to find the maximum and minimum leakage
current and the culprit input vector causing such leakage. Results are improvements over the use
of random vectors to generate such bounds.
F. Aloul, S. Hassoun, K. Sakallah, and D. Blaauw, "Robust SAT-Based Search Algorithm for
Leakage Power Reduction", International Workshop on Power and Timing Modeling,
Optimization and Simulation (PATMOS), 2002.
R13. Other: Circuit Design and Computer Architecture.
Hassoun's earlier work focused on circuit design and computer architecture. Her MS Thesis
describes a 3-transistor dynamic memory that had architectural features to enable data prefetching. She worked on the architecture of a message-driven processor, the J-Machine. This
Reference was a landmark paper in computer architecture. A follow up article on the work appeared
in "Retrospective in 25 Years of the International Symposia on Computer Architecture - Selected
Papers". Other published work covers details of circuit design and architectures for landmark
architectures, including the Alpha 21064 processor, known for both its architecture and circuitry,
for which Hassoun was a lead circuit designer while employed at Digital Equipment Corp.
Soha Hassoun
6/8
Related Work
1.
K. Bolding, S.-C. Cheung, S.-E. Choi, C. Ebeling, S. Hassoun, T. Ngo, and R. Wille, "The Chaos
Router Chip: Design and Implementation of an Adaptive Router", International Conference
on Very Large Scale Integration, Grenoble, France, September, 1993.
2.
S. Hassoun and D. Sanders, "Method and Apparatus for Parity Generation", U.S. Patent
5557622, Sept. 17, 1996.
3.
D. Dobberpuhl, R. Witek, R. Allmon, R. Anglin, D. Bertucci, S. Britton, L. Chao, R. Conrad, D.
Dever, B. Gieseke, S. Hassoun, G. Hoeppner, K. Kuchler, M. Ladd, B. Leary, L. Madden, E.
McLellan, D. Meyer, J. Montanaro, D. Priore, V. Rajagopalan, S. Samudrala, S. Santhanam. "A
200 MHz 64-b Dual-Issue CMOS Microprocessor", IEEE Journal of Solid-State Circuits,
November, 1992, Vol. 27, No. 11. Also appears in Digital Technical Journal. Vol. 4, No. 4,
1992.
4.
R. Allmon, B. Benschneider, M. Callander, L. Chao, D. Dever, J. Farrell, N. Fitzgerald, J.
Grodstein, S. Hassoun, L. Hudepohl, D. Kravitz, J. Lundberg, R. Marcello, S. Marino, J.
Pickholtz, R. Preston, M. Richesson, S. Samudrala, and D. Sanders. "System, Process, and
Design Implications of a Reduced Supply Voltage Microprocessor", IEEE International
Solid-State Circuits Conference (ISSCC), February, 1990.
5.
W. Dally, L. Chao, A. Chien, S. Hassoun, W. Horwat, J. Kaplan, P. Song, B. Totty, and S. Wills,
"Architecture of a Message-Driven Processor", International Symposium on Computer
Architecture (ISCA), June, 1987.
6.
S. Hassoun, "A Memory Design for the Message-Driven Processor", MIT VLSI Memo No.
88-564, August, 1988.
Soha Hassoun
7/8
R14. Future Research Plans.
Hassoun plans to continue research in the areas of synthesis, timing analysis, physical design, and to
investigate CAD tools for quantum computing.
In synthesis, Hassoun plans to continue to work on the synthesis of configurable systems. Efforts
have just commenced to understand and support how to use FPGAs as a configurable shared
resource among multiple system components. The FPGA will provide security measures and caching
mechanism to speed up run-time re-configurations. Hassoun also plans to investigate efficient
sequential optimization techniques that can produce results similar to those obtained by iterative
retiming and resynthesis.
In timing analysis, Hassoun is currently preparing a survey of over 100 papers on cross talk analysis
and preventive techniques. The survey provides insight into several new directions to further solve
this problem.
In physical design, significant progress has already been made to understand the need for routing
delay models that consider inductance. Upcoming work will examine limitations of earlier delay
models and argues the need to consider inductance for all buffered routing construction. Planned
research includes using wave pipelining, a technique to optimize sequential circuits, in optimal route
construction.
Hassoun’s long-term research focus will be CAD tools for quantum computing. Hassoun has already
established a quantum computing reading group at Tufts with Leon Gunther (Physics, Tufts), Bruce
Boghosian (Math, Tufts), and Mary Beth Rusaki (Math, Tufts), thus tackling this problem from
multi-disciplinary angles.
Soha Hassoun
8/8