A CAD Framework for Leakage Power
Aware Synthesis of Asynchronous Circuits
Behnam Ghavami and Hossein Pedram
Presented by Wei-Lun Hung
Outline
 Introduction
 AsyncTool: Synthesis of QDI Asynchronous Circuits
 Statistical Performance Analysis
 Transistor’s Parameters Assignment
 Experimental Results
 Conclusion
Introduction
 The VLSI design challenges
 High power consumption
 Synchronization problems
 Robustness issues
 One possible solution: Asynchronous circuit
 Low power consumption
 No clock skew
 Low Electromagnetic Interference (EMI)
Asynchronous Circuits
 Not controlled by global clock
 Eliminate clock skew
 Potentially faster
 Low power consumption
 Low EMI
 Rely on handshaking to exchange data
 Limitations
 Lack of automatic synthesis tool
 Hard to evaluate performance of asynchronous circuits
Transistor’s Parameters
 The Vth, Vdd and gate size are the parameters which affect
the performance of circuits
 Heuristically search to find a good tradeoff according to the
optimization goal
 The optimization of synchronous circuits
 Multiple-Vth and multiple-Vdd assignment
 Ex: the gates on critical paths operate at the higher Vdd or lower Vth
 The optimization of asynchronous circuits
 Cannot compute a critical path as in synchronous circuits
 Performance depends on dynamic factors, e.g., the number of tokens
The Framework of Asynchronous Circuit
AsyncTool: Synthesis of QDI
Asynchronous Circuits
Asynchronous Circuit Model
 Delay-insensitive (DI)
 Most robust of all asynchronous circuit delay models
 Makes no assumptions on the delay of wires or gates
 Any transition on an input to a gate must be seen on the output
 Not practical due to the heavy restrictions
 Quasi delay-insensitive (QDI)
 Like DI, but
 Assumes that the delays of the branches of a fork are equal (isochronic forks)
 Use Verilog-CSP Code in this framework
AsyncTool: Synthesis of QDI
Asynchronous Circuits
 Uses Pre-Charge logic Full-Buffer templates as its
predefined templates
 Encapsulates all isochronic forks inside the templates
 Eliminates the isochronic fork constraint
 3 Parts
 Arithmetic function extractor (AFE)
 Ex: Addition, subtraction, comparison ...
 Implements them with pre-synthesized standard templates
 Decomposition
 Template Synthesizer (TSYN)
 one-bit operators, ex: AND, OR, XOR, …
 Expander is used to convert multiple-bit expressions
Decomposition (1/2)
 Decompose the original description into an equivalent
collection of smaller interacting processes
 Convert to dynamic single assignment form
 Projection
 Dynamic Single Assignment form
Decomposition (2/2)
 Projection
 Break the program up into a concurrent system of smaller
modules
Statistical Performance Analysis
Petri-Nets
 Used to model concurrency and synchronization
 Represented as a bipartite graph
 Defined as four-tuple N = (P, T, F, m0)
 P: Set of places
 T: Set of transitions
 F ⊆ (P × T) ∪ (T × P): Flow relation
 m0: Initial marking
 A marking is a mapping M: P → ℕ (the number of tokens in each place)
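The four-tuple above can be sketched as a small data structure. This is only an illustration: the class and method names (`PetriNet`, `enabled`, `fire`) are hypothetical, and the firing rule used here (a transition is enabled when each of its input places holds at least one token) is the standard one for ordinary Petri nets, not something stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class PetriNet:
    places: set        # P
    transitions: set   # T
    flow: set          # F: arcs (place, transition) or (transition, place)
    marking: dict      # current marking M: place -> token count (starts as m0)

    def preset(self, t):
        """Input places of transition t."""
        return {src for (src, dst) in self.flow if dst == t and src in self.places}

    def postset(self, t):
        """Output places of transition t."""
        return {dst for (src, dst) in self.flow if src == t and dst in self.places}

    def enabled(self, t):
        """Assumed standard rule: every input place holds at least one token."""
        return all(self.marking.get(p, 0) >= 1 for p in self.preset(t))

    def fire(self, t):
        """Consume one token per input place, produce one per output place."""
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p in self.preset(t):
            self.marking[p] -= 1
        for p in self.postset(t):
            self.marking[p] = self.marking.get(p, 0) + 1


# A two-stage ring: p0 -> t0 -> p1 -> t1 -> p0, with one token in p0.
ring = PetriNet(
    places={"p0", "p1"},
    transitions={"t0", "t1"},
    flow={("p0", "t0"), ("t0", "p1"), ("p1", "t1"), ("t1", "p0")},
    marking={"p0": 1, "p1": 0},
)
```

Firing `t0` on this ring moves the single token from `p0` to `p1`, after which only `t1` is enabled.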
Petri-Nets Examples
Timed Petri-Net
 A Petri-Net in which transitions or places are annotated with
delays
 For a cycle Ck, the cycle metric is
 CM(Ck) = D(Ck)/M(Ck)
 D(Ck) = ∑di, ∀i ∈ Ck
 The performance of a Timed Petri-Net is dictated by the
cycle time, i.e., the largest cycle metric
 CTime = MAX[CM(Ck)], ∀Ck∈ TPN
 Can be resolved by Maximum Mean-Cycle Algorithms
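The cycle-time definition above can be checked on a toy example. The sketch below assumes that M(Ck) is the number of tokens on the cycle under the initial marking, models each transition as a node with delay d_i and each place as an edge carrying a token count, and enumerates simple cycles by brute force. It is not one of the maximum mean-cycle algorithms the slide refers to, which would be needed for realistic circuit sizes; the function names are hypothetical.

```python
def simple_cycles(succ):
    """Yield each simple cycle once, as a list of nodes (small graphs only)."""
    for start in sorted(succ):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in succ.get(node, ()):
                if nxt == start:
                    yield path
                elif nxt > start and nxt not in path:
                    # Only visit nodes larger than the start so that every
                    # cycle is reported from its smallest node exactly once.
                    stack.append((nxt, path + [nxt]))


def cycle_time(delay, tokens):
    """Return MAX over cycles of D(Ck) / M(Ck).

    delay:  transition -> delay d_i
    tokens: (u, v) -> tokens on the place between transitions u and v
    """
    succ = {}
    for (u, v) in tokens:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
    best = 0.0
    for cyc in simple_cycles(succ):
        d = sum(delay[n] for n in cyc)
        m = sum(tokens[(cyc[i], cyc[(i + 1) % len(cyc)])] for i in range(len(cyc)))
        if m == 0:
            raise ValueError("token-free cycle: the net cannot make progress")
        best = max(best, d / m)
    return best


# Two cycles: a-b (D = 5, M = 1) and a-b-c (D = 6, M = 2).
delays = {"a": 2, "b": 3, "c": 1}
marks = {("a", "b"): 1, ("b", "a"): 0, ("b", "c"): 0, ("c", "a"): 1}
print(cycle_time(delays, marks))
```

Here the shorter cycle dominates: its metric is 5/1, against 6/2 for the longer one.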
Average Case VS Worst Case
Probabilistic Timed Petri-Net
The Average-Case Performance Metric
 For a P-TPN that has only one choice with n outcomes
 Convert it to n TPN models
 For a P-TPN that has more than one choice
 Recursively apply the following formula
Probability Model
 Use the static range of the primary inputs of the circuit to
determine the static range of internal signals
 Independent VS dependent
Computing the Static Range (1/3)
 The tagged static ranges of a variable v are denoted TSR(v),
where each r ∈ TSR(v) is expressed as <r.ct, r.vt, r.sr>
 r.ct: the conditional tag
 r.vt: the variable expression tag
 r.sr: the static range
Computing the Static Range (2/3)
 Given the static ranges of the right-hand-side variables, the
static range of the left-hand-side variable can be computed by
where ° is a standard operator on data values and • is the corresponding operation on static ranges
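One way to picture an operation on static ranges is interval arithmetic. The sketch below is an assumption, not the formula from the slide (which did not survive in the transcript): it treats a static range as a closed interval (lo, hi) and lifts addition, subtraction, and multiplication from values to ranges. The function names are hypothetical.

```python
def sr_add(a, b):
    """Range of x + y for x in a, y in b."""
    return (a[0] + b[0], a[1] + b[1])


def sr_sub(a, b):
    """Range of x - y for x in a, y in b."""
    return (a[0] - b[1], a[1] - b[0])


def sr_mul(a, b):
    """Range of x * y: the extremes occur at the interval endpoints."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))


# Two 4-bit unsigned inputs, each with static range [0, 15].
x = (0, 15)
y = (0, 15)
print(sr_add(x, y), sr_sub(x, y), sr_mul(x, y))
```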
Computing the Static Range (3/3)
 For a loop
Computing Choice Probabilities(1/3)
 For a condition variable CV(X>Y)
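As a rough illustration of how a choice probability could follow from static ranges, the sketch below computes P(X > Y) under two assumptions that are not taken from the slides: X and Y are independent, and each is uniformly distributed over the integers in its static range. The function name is hypothetical, and the dependent case mentioned earlier is not handled.

```python
from fractions import Fraction


def prob_greater(x_range, y_range):
    """P(X > Y) for independent, uniform integer X and Y (assumed model)."""
    xlo, xhi = x_range
    ylo, yhi = y_range
    favourable = 0
    for x in range(xlo, xhi + 1):
        # Count the y values in [ylo, yhi] that are strictly below x.
        favourable += max(0, min(x - 1, yhi) - ylo + 1)
    total = (xhi - xlo + 1) * (yhi - ylo + 1)
    return Fraction(favourable, total)


print(prob_greater((0, 3), (0, 3)))
```

For two inputs ranging over 0..3, six of the sixteen equally likely pairs satisfy X > Y, so the "true" branch of the choice would get probability 3/8 under this model.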
Computing Choice Probabilities(2/3)
Computing Choice Probabilities(3/3)
Template’s Parameters Assignment
 The Vth, Vdd and gate size are the parameters which affect
the performance of circuits
 Dual-Vdd, dual-Vth and eight sizes for each type of template
 Adopt Quantum genetic algorithm
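With dual-Vdd, dual-Vth, and eight sizes, each template has 2 × 2 × 8 = 32 possible configurations, which fits in five bits. The sketch below shows one possible encoding; the bit order, the "low"/"high" labels, and the function name are assumptions made for illustration, since the slides do not give the actual layout.

```python
VDD_LEVELS = ("low", "high")   # dual-Vdd
VTH_LEVELS = ("low", "high")   # dual-Vth


def decode_template(bits):
    """Map five bits [vdd, vth, s2, s1, s0] to (Vdd level, Vth level, size index).

    The size index runs from 0 to 7, covering the eight sizes per template type.
    """
    if len(bits) != 5 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected five 0/1 values")
    size_index = bits[2] * 4 + bits[3] * 2 + bits[4]
    return VDD_LEVELS[bits[0]], VTH_LEVELS[bits[1]], size_index


print(decode_template([1, 0, 1, 0, 1]))
```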
The Genetic Algorithm
 A search technique used in computing to find exact or
approximate solutions to optimization problems
 Uses techniques inspired by evolutionary biology such as
inheritance, mutation, selection, and crossover
 Population: abstract representations of candidate solutions
 Repopulation: generate a second-generation population of
solutions from those selected, through genetic operators
 Fitness function: decides the survival chance of individuals
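The ingredients listed above (population, selection, crossover, mutation, fitness) fit together roughly as in the following sketch of a classical genetic algorithm over bit strings. It is a generic illustration, not the framework's optimizer: the tournament selection, single-point crossover, 10% mutation rate, and elitism are arbitrary choices, and the function name is hypothetical.

```python
import random


def genetic_search(fitness, n_bits, pop_size=12, generations=50, seed=0):
    """Maximize `fitness` over bit lists of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def select():
        # Binary tournament: the fitter of two random individuals survives.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_pop = [best[:]]                      # elitism: keep the best so far
        while len(next_pop) < pop_size:
            mum, dad = select(), select()
            cut = rng.randrange(1, n_bits)        # single-point crossover
            child = mum[:cut] + dad[cut:]
            if rng.random() < 0.1:                # occasional one-bit mutation
                child[rng.randrange(n_bits)] ^= 1
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
    return best


# Toy fitness: the number of 1 bits in the individual.
print(sum(genetic_search(sum, 16)))
```

Because the best individual is always carried over, the best fitness never decreases from one generation to the next.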
The Quantum Genetic Algorithm
 The circuit configuration information is encoded into qubits
 A qubit may be in the ‘1’ or ‘0’ state, or in any superposition of the
two, represented as |Ψ⟩ = α|1⟩ + β|0⟩, where |α|² + |β|² = 1;
|α|² and |β|² give the probabilities that the qubit will be found
in the ‘1’ or ‘0’ state, respectively
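In a quantum-inspired genetic algorithm the qubits are simulated classically, and "observing" a qubit means sampling a bit from its amplitudes. The sketch below follows the slide's form, in which α is the amplitude of the ‘1’ state, so a ‘1’ is drawn with probability |α|². The function names are hypothetical, and the update procedure that adjusts the amplitudes between generations is not shown.

```python
import math
import random


def observe(alpha, beta, rng):
    """Sample a classical bit from a*|1> + b*|0>: '1' with probability |a|^2."""
    if not math.isclose(abs(alpha) ** 2 + abs(beta) ** 2, 1.0):
        raise ValueError("amplitudes must satisfy |alpha|^2 + |beta|^2 = 1")
    return 1 if rng.random() < abs(alpha) ** 2 else 0


def observe_individual(qubits, rng):
    """Collapse a list of (alpha, beta) pairs into one candidate bit string."""
    return [observe(a, b, rng) for (a, b) in qubits]


rng = random.Random(0)
h = 1 / math.sqrt(2)   # equal superposition: '0' and '1' equally likely
print(observe_individual([(h, h)] * 5, rng))
```

Starting every qubit in the equal superposition means that all bit strings are initially equally likely to be observed.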
The Quantum Genetic Algorithm
 The population of m-qubit individuals at generation g is
denoted as Q(g) = {q1^g, q2^g, …, qn^g}, where each qj^g is defined as
The Update Procedure
The Quantum Genetic Algorithm
Fitness Function
 Power
 The leakage of a template depends on the number of transistors
that are turned off under a given input pattern
 Calculate the gate leakage under each input pattern
 Area
 A qubit individual has little chance of survival if its area is
larger than the area constraint
 Performance
Control Parameters
 Population size
 For a small population, the genetic diversity may not increase
for many generations
 A large population may increase the computing time but
take fewer generations to find the best solutions
 Small populations of size 10 to 15 perform very well
 Termination condition
 The power reduction is less than 0.0005% during the last 200
generations
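The termination condition can be written as a simple check on the history of best power values. The sketch below assumes that the reduction is measured relative to the best power found 200 generations earlier; the slide does not say how the percentage is normalized, and the function and parameter names are hypothetical.

```python
def should_stop(power_history, window=200, threshold_pct=0.0005):
    """Return True once the power reduction over the last `window` generations
    falls below `threshold_pct` percent.

    power_history[i] is the best (lowest) power found after generation i and
    is assumed to be positive.
    """
    if len(power_history) <= window:
        return False          # not enough generations to judge yet
    old = power_history[-window - 1]
    new = power_history[-1]
    reduction_pct = 100.0 * (old - new) / old
    return reduction_pct < threshold_pct


print(should_stop([100.0] * 201))
```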
Performance Estimation Results
Power Optimization Results
Power Optimization Results
Different Technique Comparisons
Comparison to worst-case optimized
circuits
Conclusion
 An efficient design framework for reducing total power
consumption while maintaining high circuit performance
 Use Probabilistic Timed Petri-Net model to capture the
dynamic behavior of the system
 The proposed method for assigning threshold voltage, supply
voltage, and template size is based on a quantum genetic
algorithm
 5X to 7X savings in power consumption with a 2.5%
performance penalty
Comments
 Not Scalable?
 Have to specify the static range of the inputs of the circuits
 The connection between synthesis and parameter assignment is
not strong
 Experimental results are questionable
 Many typos