Download Physical Design

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Simplex algorithm wikipedia , lookup

Mathematical optimization wikipedia , lookup

Multiple-criteria decision analysis wikipedia , lookup

Genetic algorithm wikipedia , lookup

Lattice delay network wikipedia , lookup

Transcript
Evolutionary Heuristics for
Multiobjective
VLSI Netlist Bi-Partitioning
by
Dr. Sadiq M. Sait
Dr. Aiman El-Maleh
Mr. Raslan Al Abaji
Computer Engineering Department
1
Outline ….
•
•
•
•
•
•
Introduction
Problem Formulation
Cost Functions
Proposed Approaches
Experimental results
Conclusion
2
VLSI Technology Trend
Design Characteristics
0.06M
2MHz
6um
SPICE
Simulation
0.13M
12MHz
1.5um
CAE
Systems,
Silicon
compilation
1.2M
50MHz
0.8um
HDLs,
Synthesis
3.3M
200MHz
0.6um
Top-Down
Design,
Emulation
7.5M
333MHz
0.25um
Cycle-based
simulation,
Formal
Verification
Key CAD Capabilities
The Challenges to sustain such an exponential growth to achieve
gigascale integration have shifted in a large degree, from the process of
3
manufacturing technologies to the design technology.
The VLSI Chip in 2006
Technology
Transistors
Logic gates
Size
Clock
Chip I/O’s
Wiring levels
Voltage
Power
Supply current
0.1 um
200 M
40 M
520 mm2
2 - 3.5 GHz
4,000
7-8
0.9 - 1.2
160 Watts
~160 Amps
Performance
Power consumption
Noise immunity
Area
Cost
Time-to-market
Tradeoffs!!!
4
VLSI Design Cycle
•VLSI design process is carried out at a number of
levels.
1. System Specification
2. Functional Design
3. Logic Design
4. Circuit Design
5. Physical Design
6. Design Verification
7. Fabrication
8. Packaging Testing and Debugging
5
Physical Design
The physical design cycle consists of:
1. Partitioning
2. Floorplanning and Placement
3. Routing
4. Compaction
Physical Design converts a circuit description into
a geometric description. This description is used
to manufacture a chip.
6
Why we need Partitioning ?
• Decomposition of a complex system into
smaller subsystems.
• Each subsystem can be designed independently
speeding up the design process (divide-and
conquer-approach).
• Decompose a complex IC into a number of
functional blocks, each of them designed by one
or a team of engineers.
• Decomposition scheme has to minimize the
interconnections between subsystems.
7
Levels of Partitioning
System
System Level Partitioning
PCBs
Board Level Partitioning
Chip Level Partitioning
Chips
Subcircuits
/ Blocks
8
Motivation
Need for Power optimization
• Portable devices.
• Power consumption is a hindrance in further
integration.
• Increasing clock frequency.
Need for delay optimization
• In current sub micron design wire delay tend to
dominate gate delay. Larger die size imply long
on-chip global routes, which affect performance.
• Optimizing delay due to off-chip capacitance.
9
Objective
• Design a class of iterative algorithms for
VLSI multi objective partitioning.
• Explore partitioning from a wider angle and
consider circuit delay , power dissipation
and interconnect in the same time, under
balance constraint.
10
Problem formulation
Objectives :
• Power cost is optimized AND
• Delay cost is optimized AND
• Cutset cost is optimized
Constraint
• Balanced partitions to a certain tolerance degree. (10%)
11
Cutset
•
•
•
•
Based on hypergraph model H = (V, E)
Cost 1: c(e) = 1 if e spans more than 1 block
Cutset = sum of hyperedge costs
Efficient gain computation and update
cutset = 3
12
Delay Model
Metal 1
Metal 2
C2
C7
C3
SE1
C1
C4
C5
SE2
Cut Line
C6
CoffChip
path : SE1  C1C4C5SE2.
Delay = CDSE1 + CDC1+ CDC4+ CDC5+ CDSE2
CDC1 = BDC1 + LFC1 * ( Coffchip + CINPC2+ CINPC3+ CINPC4)
13
Delay
Delay(Pi) =
 Delay (cell )   Delay (net )
cellPi
netPi
 Delay (cell )
Objective : Max Delay ( Pi) 
PiP
Delay(Pi) =
cellPi
Pi: is any path Between 2 cells or nodes
P : set of all paths of the circuit.
14
Power
The average dynamic power consumed by CMOS
logic gate in a synchronous circuit is given by:
Pi
average
2
dd
V
Load
 0.5 
 Ci  N i
Tcycle
Ni : is the number of output gate transition per cycle( switching
Probability)
Load
i
C
: Is the Load Capacitance
15
Power
Load
i
C
C
basic
i
C
extra
i
basic : Load Capacitances driven by a cell
i
before Partitioning
C
C
extra : additional Load due to off chip
i
capacitance.( cut net)
Total Power dissipation of a Circuit:
2
dd


V
basic
extra
P   
  Ci  Ci  N i
Tcycle i
16
Power
C
C
extra
i
 C
basic
i
extra
: Can be assumed identical for all nets
i
objective:  N i
v
iv
:Set of Visible gates Driving a load outside the
partition.
17
Balance
The Balance as constraint is expressed as follows:
cells / blocks  Tol  Block i  cells / blocks  Tol
Tol  cells / block * PercentTol
However balance as a constraint is not appealing because it may
prohibits lots of good moves.
Objective : |Cells(block1) – Cells( block2)|
18
Fuzzy logic for cost function
• Imprecise values of the objectives
– best represented by linguistic terms that are
basis of fuzzy algebra
• Conflicting objectives
• Operators for aggregating function
19
Use of fuzzy logic for Multiobjective cost function
1. The cost to membership mapping.
2. Linguistic fuzzy rule for combining the
membership values in an aggregating
function.
3. Translation of the linguistic rule in form of
appropriate fuzzy operators.
20
Some fuzzy operators
• And-like operators
– Min operator  = min (1, 2)
– And-like OWA
 =  * min (1, 2) + ½ (1- ) (1+ 2)
Or-like operators
– Max operator  = max (1, 2)
– Or-like OWA
 =  * max (1, 2) + ½ (1- ) (1+ 2)
Where  is a constant in range [0,1]
21
Membership functions
Where Oi and Ci are lower bound and actual cost of objective “i”
 i(x) is the membership of solution x in set “good ‘i’ ”
gi is the relative acceptance limit for each objective.
22
Fuzzy linguistic rule
• A good partitioning can be described by the
following fuzzy rule
IF solution has
small cutset AND
low power AND
short delay AND
good Balance.
THEN it is a good solution
23
Fuzzy cost function
The above rule is translated to AND-like OWA
 ( x)    min  C ,  P ,  D ,  B 
1
1     C   P   D   B 
4
 (x)
Represent the total Fuzzy fitness of the solution,
our aim is to Maximize this fitness.
C ,  P ,  D ,  B
Respectively (Cutset, Power, Delay ,
Balance ) Fitness.
24
GA for multiobjective Partitioning
Algorithm (Genetic_Algorithm)
Construct_Population(Np);
For j = 1 to Np
Evaluate_Fitness (Population[j])
End For;
For i = 1 to Ng
For j = 1 to No
(x,y)  Choose_parents;
offspring[j]  Crossover(x,y)
EndFor;
Population  Select ( Population, offspring, Np )
For k = 1 to Np
Apply Mutation (Population[k])
EndFor;
EndFor;
25
Solution representation
26
GA implementation
• Different population sizes
• Parent selection: Roulette wheel
– The probability of selecting a chromosome for
crossover is
Pch o ice 
μ(x)
Np
 μ(i)
i 1
Np is the population size
27
GA implementation
•Simple single point
crossover:
•Selection before mutation
•Roulette wheel (rlt)
• Elitism random (ernd)
28
Tabu Search
Algorithm Tabu_Search
Start with an initial feasible solution S  
Initialize tabu list and aspiration level
For fixed number of iterations Do
Generate neighbor solutions V*  N(S)
Find best S*  V*
If move S to S* is not in T Then
Accept move and update best solution
Update T and AL
Else If Cost(S*) < AL Then
Accept move and update best solution
Update T and AL
End If
End For
29
TS implementation
• Neighbor solution
– Change the block of a randomly selected cells.
• The Tabu list size depends on the circuit
size.
30
TS implementation
Tabu list
• Store index of one of the swapped cell.
• Various sizes for tabu list.
Aspiration Level
• The best neighbor is better than the global
best.
31
Experimental Results
ISCAS 85-89 Benchmark Circuits
32
GA Vs Tabu Multi-objective
33
Circuit S13207 GA
Circuit S13207 TS
Circuit S13207 GA Vs TS time
36
Conclusion
• The present work successfully addressed the
important issue of reducing power and delay
consumption in VLSI circuits.
• The present work successfully formulate and
provide solutions to the problem of multiobjective
VLSI partitioning
• TS partitioning algorithm outperformed GA in
terms of quality of solution and execution time
37
Thank you.
38