A CAD Framework for Leakage Power
Aware Synthesis of Asynchronous Circuits
Behnam Ghavami and Hossein Pedram
Presented by Wei-Lun Hung
Outline
Introduction
AsyncTool: Synthesis of QDI Asynchronous Circuits
Statistical Performance Analysis
Transistor Parameter Assignment
Experimental Results
Conclusion
Introduction
The VLSI design challenges
High power consumption
Synchronization problems
Robustness issues
One possible solution: asynchronous circuits
Low power consumption
No clock skew
Low Electromagnetic Interference (EMI)
Asynchronous Circuits
Not controlled by global clock
Eliminate clock skew
Potentially faster
Low power consumption
Low EMI
Rely on handshaking protocols
Limitations
Lack of automatic synthesis tool
Hard to evaluate performance of asynchronous circuits
Transistor’s Parameters
The Vth, Vdd and gate size are the parameters which affect
the performance of circuits
Heuristically search to find a good tradeoff according to the
optimization goal
The optimization of synchronous circuits
Multiple-Vth and multiple-Vdd assignment
Ex: the gates on critical paths operate at the higher Vdd or lower Vth
The optimization of asynchronous circuits
Cannot compute a critical path as in synchronous circuits
Performance depends on dynamic factors, e.g. the number of tokens
The Framework of Asynchronous Circuit
AsyncTool: Synthesis of QDI
Asynchronous Circuits
Asynchronous Circuit Model
Delay-insensitive (DI)
Most robust of all asynchronous circuit delay models
Makes no assumptions on the delay of wires or gates
Any transition on an input to a gate must be seen on the output
Not practical due to the heavy restrictions
Quasi delay-insensitive (QDI)
Like DI, but
Assumes that the delays of the branches of a fork are equal (isochronic forks)
Use Verilog-CSP Code in this framework
AsyncTool: Synthesis of QDI
Asynchronous Circuits
Uses Pre-Charge Full-Buffer (PCFB) templates as its
predefined templates
Encapsulate all isochronic forks inside the templates
Eliminate the isochronic fork constraint
3 Parts
Arithmetic function extractor (AFE)
Ex: Addition, subtraction, comparison ...
Implements them with pre-synthesized standard templates
Decomposition
Template Synthesizer (TSYN)
one-bit operators, ex: AND, OR, XOR, …
Expander is used to convert multiple-bit expressions
Decomposition (1/2)
Decompose the original description into an equivalent
collection of smaller interacting processes
Convert to dynamic single assignment form
Projection
Dynamic Single Assignment form
Decomposition (2/2)
Projection
Break the program up into a concurrent system of smaller
modules
Statistical Performance Analysis
Petri-Nets
Used to model concurrency and synchronization
Represented as a bipartite graph
Defined as four-tuple N = (P, T, F, m0)
P: Set of places
T: Set of transitions
F ⊆ (P × T) ∪ (T × P): Flow relation
m0: Initial marking
A marking is a mapping M: P → ℕ
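The four-tuple above can be sketched as a small data structure. This is a minimal illustration, not part of the presented framework: the class name `PetriNet`, its methods, and the two-stage ring are hypothetical, and it assumes the standard firing rule for ordinary Petri nets (every arc has weight one).

```python
from dataclasses import dataclass


@dataclass
class PetriNet:
    places: set        # P
    transitions: set   # T
    flow: set          # F, a set of (source, target) arcs
    marking: dict      # current marking M: place -> token count (starts at m0)

    def preset(self, t):
        """Input places of transition t."""
        return {src for (src, dst) in self.flow if dst == t and src in self.places}

    def postset(self, t):
        """Output places of transition t."""
        return {dst for (src, dst) in self.flow if src == t and dst in self.places}

    def enabled(self, t):
        """t is enabled when every input place holds at least one token."""
        return all(self.marking.get(p, 0) >= 1 for p in self.preset(t))

    def fire(self, t):
        """Consume one token per input place, produce one per output place."""
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p in self.preset(t):
            self.marking[p] -= 1
        for p in self.postset(t):
            self.marking[p] = self.marking.get(p, 0) + 1


# Hypothetical two-stage ring: p0 -> t0 -> p1 -> t1 -> p0, one token in p0.
net = PetriNet(
    places={"p0", "p1"},
    transitions={"t0", "t1"},
    flow={("p0", "t0"), ("t0", "p1"), ("p1", "t1"), ("t1", "p0")},
    marking={"p0": 1, "p1": 0},
)
net.fire("t0")  # moves the token from p0 to p1
```

After `t0` fires, only `t1` is enabled, which mirrors how a token circulating in a ring models a handshake passing between two stages.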
Petri-Nets Examples
Timed Petri-Net
A Petri-Net in which transitions or places are annotated with
delays
For a cycle Ck, the cycle metric is
CM(Ck) = D(Ck)/M(Ck)
D(Ck) = ∑di, ∀i ∈ Ck (total delay along the cycle)
M(Ck): number of tokens on Ck in the initial marking
The performance of a Timed Petri-Net is dictated by its
cycle time, which is the largest cycle metric
CTime = MAX[CM(Ck)], ∀Ck∈ TPN
Can be solved by maximum mean-cycle algorithms
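The cycle-time definition can be checked on a small example. The sketch below is a brute-force illustration, not a maximum mean-cycle algorithm: it enumerates every simple cycle, so it only suits tiny graphs. It assumes a simplified model in which nodes carry the delays di and each edge carries its initial token count; the function names and the three-node graph are hypothetical.

```python
def simple_cycles(edges):
    """Enumerate simple cycles as node lists, each reported once,
    starting from its smallest node."""
    succ = {}
    for (u, v) in edges:
        succ.setdefault(u, []).append(v)
    cycles = []

    def dfs(start, node, path):
        for nxt in succ.get(node, []):
            if nxt == start:
                cycles.append(list(path))
            elif nxt > start and nxt not in path:
                dfs(start, nxt, path + [nxt])

    for s in sorted(succ):
        dfs(s, s, [s])
    return cycles


def cycle_time(delay, tokens):
    """delay: node -> d_i; tokens: (u, v) -> initial tokens on that edge.
    Returns MAX over cycles of D(Ck) / M(Ck)."""
    best = 0.0
    for cyc in simple_cycles(tokens.keys()):
        d = sum(delay[n] for n in cyc)
        m = sum(tokens[(cyc[i], cyc[(i + 1) % len(cyc)])] for i in range(len(cyc)))
        if m == 0:
            raise ValueError("token-free cycle: the net would deadlock")
        best = max(best, d / m)
    return best


# Hypothetical graph with two cycles: a-b (CM = 5/1) and a-b-c (CM = 9/2).
delay = {"a": 2, "b": 3, "c": 4}
tokens = {("a", "b"): 1, ("b", "a"): 0, ("b", "c"): 0, ("c", "a"): 1}
ctime = cycle_time(delay, tokens)
```

Here the shorter cycle dominates because it holds fewer tokens, which shows why the critical cycle cannot be found from delays alone.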
Average Case VS Worst Case
Probabilistic Timed Petri-Net
The Average-Case Performance Metric
For a P-TPN that has only one choice with n outcomes
Convert it to n TPN models
For a P-TPN that has more than one choice
Recursively apply the following formula
Probability Model
Use the static range of the primary inputs of the circuit to
determine the static range of internal signals
Independent VS dependent
Computing the Static Range (1/3)
The tagged static ranges of a variable v are denoted TSR(v),
where each r ∈ TSR(v) is expressed as <r.ct, r.vt, r.sr>
r.ct: the conditional tag
r.vt: the variable expression tag
r.sr: the static range
Computing the Static Range (2/3)
Given the static ranges of the right-hand-side variables, we can
compute the static range of the left-hand-side variable by
Where ° is a standard operator on data values and • is the corresponding operation on static ranges
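One way to picture the lifting of a data operator ° to a range operation • is plain interval arithmetic. The sketch below assumes that a static range is a closed interval `(lo, hi)`; it ignores the conditional and variable-expression tags, and the function names are hypothetical rather than taken from the framework.

```python
def sr_add(a, b):
    """Range of x + y for x in a, y in b."""
    return (a[0] + b[0], a[1] + b[1])


def sr_sub(a, b):
    """Range of x - y: smallest x minus largest y, largest x minus smallest y."""
    return (a[0] - b[1], a[1] - b[0])


def sr_mul(a, b):
    """Range of x * y: the extremes always occur at interval endpoints."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))


# Two 4-bit unsigned inputs: their sum needs a range of 0..30.
total = sr_add((0, 15), (0, 15))
```

Propagating ranges this way from the primary inputs gives a range for every internal signal without simulating individual values.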
Computing the Static Range (3/3)
For a loop
Computing Choice Probabilities(1/3)
For a condition variable CV(X>Y)
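As an illustration of how a choice probability could follow from static ranges, the sketch below counts the value pairs that satisfy X > Y. It rests on assumptions that are not stated on the slide: X and Y are independent integers, each uniformly distributed over its static range. The function name `prob_greater` is hypothetical.

```python
from fractions import Fraction


def prob_greater(x_range, y_range):
    """P(X > Y) for independent, uniformly distributed integers
    X in [x_lo, x_hi] and Y in [y_lo, y_hi] (assumed distribution)."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    favourable = sum(
        1
        for x in range(x_lo, x_hi + 1)
        for y in range(y_lo, y_hi + 1)
        if x > y
    )
    total = (x_hi - x_lo + 1) * (y_hi - y_lo + 1)
    return Fraction(favourable, total)


p_taken = prob_greater((0, 3), (0, 3))  # 6 of the 16 pairs satisfy x > y
```

When the variables are dependent, the pairs are no longer equally likely and this simple count does not apply.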
Computing Choice Probabilities(2/3)
Computing Choice Probabilities(3/3)
Template Parameter Assignment
The Vth, Vdd and gate size are the parameters which affect
the performance of circuits
Dual-Vdd, dual-Vth and eight sizes for each type of template
Adopt Quantum genetic algorithm
The Genetic Algorithm
A search technique used in computing to find exact or
approximate solutions to optimization problems
Use techniques inspired by evolutionary biology such as
inheritance, mutation, selection, and crossover
Population: abstract representations of candidate solutions
Repopulation: generate a second generation population of
solutions from those selected through genetic operators
Fitness function: determines each individual's chance of survival
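The ingredients listed above (population, selection, crossover, mutation, fitness) fit into a short loop. This is a generic sketch of a classical genetic algorithm on bit strings, not the quantum variant used in the framework. The tournament selection, single-point crossover, elitism, and the toy fitness (count of ones) are choices made for illustration.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=12, generations=50,
                      p_mut=0.05, seed=0):
    """Maximize `fitness` over bit lists of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament(pop):
        # Selection: the fitter of two random individuals survives.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual is always kept
        while len(children) < pop_size:
            mum, dad = tournament(population), tournament(population)
            cut = rng.randrange(1, n_bits)        # single-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        population = children                     # repopulation
        best = max(population, key=fitness)
    return best


# Toy fitness: number of ones in the bit string.
solution = genetic_algorithm(sum, n_bits=16)
```

In a power-optimization setting each bit string would encode a choice of Vth, Vdd, and size per template, and the fitness would score power, area, and performance.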
The Quantum Genetic Algorithm
The circuit configuration information is encoded into qubits
A qubit may be in the ‘1’ or ‘0’ state, or in any superposition of the
two, represented as |Ψ⟩ = α|1⟩ + β|0⟩, where |α|² + |β|² = 1;
|α|² and |β|² give the probabilities that the qubit will be found in
the ‘1’ and ‘0’ states, respectively
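Observing a qubit turns its amplitudes into a concrete bit. The sketch below follows the representation α|1⟩ + β|0⟩ given above, so |α|² is taken as the probability of reading ‘1’. It assumes real-valued amplitudes, and the function name `observe` is hypothetical.

```python
import math
import random


def observe(alpha, beta, rng):
    """Collapse a qubit alpha|1> + beta|0> to a classical bit.

    Returns 1 with probability alpha**2 and 0 with probability beta**2
    (real amplitudes assumed)."""
    if not math.isclose(alpha ** 2 + beta ** 2, 1.0, abs_tol=1e-9):
        raise ValueError("amplitudes must satisfy alpha^2 + beta^2 = 1")
    return 1 if rng.random() < alpha ** 2 else 0


rng = random.Random(0)
# An equal superposition yields 0 or 1 with probability 1/2 each.
amp = 1 / math.sqrt(2)
bits = [observe(amp, amp, rng) for _ in range(8)]
```

Observing every qubit of an individual in this way yields one candidate bit string, which can then be scored by the fitness function.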
The Quantum Genetic Algorithm
The population of m-qubit individuals at generation g is
denoted as Q(g) = {q_1^g, q_2^g, …, q_n^g}, where q_j^g is defined as
The Update Procedure
The Quantum Genetic Algorithm
Fitness Function
Power
The leakage of a template depends on the number of transistors
that are turned off under a given input pattern
Calculate the gate leakage under each input pattern
Area
A qubit individual has little chance of survival if its area exceeds the
area constraint
Performance
Control Parameters
Population size
With a small population, the genetic diversity may not increase
for many generations
A large population increases the computing time per generation but
may take fewer generations to find the best solutions
Small populations of size 10 to 15 perform very well
Termination condition
The power reduction is less than 0.0005% during the last 200
generations
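The termination condition can be written as a small check on the history of best results. This sketch assumes the reduction is measured relative to the best power recorded 200 generations earlier, which the slide does not spell out. The function name and the shape of `best_power_history` are hypothetical.

```python
def should_terminate(best_power_history, window=200, threshold_percent=0.0005):
    """best_power_history[g] is the lowest power found up to generation g.

    Stop when the relative reduction over the last `window` generations
    is below `threshold_percent` (assumed interpretation of the criterion)."""
    if len(best_power_history) <= window:
        return False  # not enough generations to judge yet
    old = best_power_history[-window - 1]
    new = best_power_history[-1]
    reduction_percent = 100.0 * (old - new) / old
    return reduction_percent < threshold_percent


stalled = should_terminate([100.0] * 201)              # no improvement in 200 generations
improving = should_terminate([100.0] + [99.0] * 200)   # 1% reduction over the window
```

A relative threshold keeps the criterion independent of the absolute power level of the circuit being optimized.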
Performance Estimation Results
Power Optimization Results
Power Optimization Results
Different Technique Comparisons
Comparison to worst-case optimized
circuits
Conclusion
An efficient design framework for reducing total
power consumption while maintaining the high performance
of circuits
Use Probabilistic Timed Petri-Net model to capture the
dynamic behavior of the system
The proposed method for assigning threshold voltage, supply voltage and
template size is based on a quantum genetic
algorithm
5X to 7X savings in power consumption with a 2.5%
performance penalty
Comments
Not Scalable?
Have to specify the static range of the inputs of the circuits
The connection between synthesis and parameter assignment is
not strong
Experimental results are questionable
Many typos