Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Evolutionary Heuristics for Multiobjective VLSI Netlist Bi-Partitioning by Dr. Sadiq M. Sait Dr. Aiman El-Maleh Mr. Raslan Al Abaji Computer Engineering Department 1 Outline …. • • • • • • Introduction Problem Formulation Cost Functions Proposed Approaches Experimental results Conclusion 2 VLSI Technology Trend Design Characteristics 0.06M 2MHz 6um SPICE Simulation 0.13M 12MHz 1.5um CAE Systems, Silicon compilation 1.2M 50MHz 0.8um HDLs, Synthesis 3.3M 200MHz 0.6um Top-Down Design, Emulation 7.5M 333MHz 0.25um Cycle-based simulation, Formal Verification Key CAD Capabilities The Challenges to sustain such an exponential growth to achieve gigascale integration have shifted in a large degree, from the process of 3 manufacturing technologies to the design technology. The VLSI Chip in 2006 Technology Transistors Logic gates Size Clock Chip I/O’s Wiring levels Voltage Power Supply current 0.1 um 200 M 40 M 520 mm2 2 - 3.5 GHz 4,000 7-8 0.9 - 1.2 160 Watts ~160 Amps Performance Power consumption Noise immunity Area Cost Time-to-market Tradeoffs!!! 4 VLSI Design Cycle •VLSI design process is carried out at a number of levels. 1. System Specification 2. Functional Design 3. Logic Design 4. Circuit Design 5. Physical Design 6. Design Verification 7. Fabrication 8. Packaging Testing and Debugging 5 Physical Design The physical design cycle consists of: 1. Partitioning 2. Floorplanning and Placement 3. Routing 4. Compaction Physical Design converts a circuit description into a geometric description. This description is used to manufacture a chip. 6 Why we need Partitioning ? • Decomposition of a complex system into smaller subsystems. • Each subsystem can be designed independently speeding up the design process (divide-and conquer-approach). • Decompose a complex IC into a number of functional blocks, each of them designed by one or a team of engineers. • Decomposition scheme has to minimize the interconnections between subsystems. 7 Levels of Partitioning System System Level Partitioning PCBs Board Level Partitioning Chip Level Partitioning Chips Subcircuits / Blocks 8 Motivation Need for Power optimization • Portable devices. • Power consumption is a hindrance in further integration. • Increasing clock frequency. Need for delay optimization • In current sub micron design wire delay tend to dominate gate delay. Larger die size imply long on-chip global routes, which affect performance. • Optimizing delay due to off-chip capacitance. 9 Objective • Design a class of iterative algorithms for VLSI multi objective partitioning. • Explore partitioning from a wider angle and consider circuit delay , power dissipation and interconnect in the same time, under balance constraint. 10 Problem formulation Objectives : • Power cost is optimized AND • Delay cost is optimized AND • Cutset cost is optimized Constraint • Balanced partitions to a certain tolerance degree. (10%) 11 Cutset • • • • Based on hypergraph model H = (V, E) Cost 1: c(e) = 1 if e spans more than 1 block Cutset = sum of hyperedge costs Efficient gain computation and update cutset = 3 12 Delay Model Metal 1 Metal 2 C2 C7 C3 SE1 C1 C4 C5 SE2 Cut Line C6 CoffChip path : SE1 C1C4C5SE2. Delay = CDSE1 + CDC1+ CDC4+ CDC5+ CDSE2 CDC1 = BDC1 + LFC1 * ( Coffchip + CINPC2+ CINPC3+ CINPC4) 13 Delay Delay(Pi) = Delay (cell ) Delay (net ) cellPi netPi Delay (cell ) Objective : Max Delay ( Pi) PiP Delay(Pi) = cellPi Pi: is any path Between 2 cells or nodes P : set of all paths of the circuit. 14 Power The average dynamic power consumed by CMOS logic gate in a synchronous circuit is given by: Pi average 2 dd V Load 0.5 Ci N i Tcycle Ni : is the number of output gate transition per cycle( switching Probability) Load i C : Is the Load Capacitance 15 Power Load i C C basic i C extra i basic : Load Capacitances driven by a cell i before Partitioning C C extra : additional Load due to off chip i capacitance.( cut net) Total Power dissipation of a Circuit: 2 dd V basic extra P Ci Ci N i Tcycle i 16 Power C C extra i C basic i extra : Can be assumed identical for all nets i objective: N i v iv :Set of Visible gates Driving a load outside the partition. 17 Balance The Balance as constraint is expressed as follows: cells / blocks Tol Block i cells / blocks Tol Tol cells / block * PercentTol However balance as a constraint is not appealing because it may prohibits lots of good moves. Objective : |Cells(block1) – Cells( block2)| 18 Fuzzy logic for cost function • Imprecise values of the objectives – best represented by linguistic terms that are basis of fuzzy algebra • Conflicting objectives • Operators for aggregating function 19 Use of fuzzy logic for Multiobjective cost function 1. The cost to membership mapping. 2. Linguistic fuzzy rule for combining the membership values in an aggregating function. 3. Translation of the linguistic rule in form of appropriate fuzzy operators. 20 Some fuzzy operators • And-like operators – Min operator = min (1, 2) – And-like OWA = * min (1, 2) + ½ (1- ) (1+ 2) Or-like operators – Max operator = max (1, 2) – Or-like OWA = * max (1, 2) + ½ (1- ) (1+ 2) Where is a constant in range [0,1] 21 Membership functions Where Oi and Ci are lower bound and actual cost of objective “i” i(x) is the membership of solution x in set “good ‘i’ ” gi is the relative acceptance limit for each objective. 22 Fuzzy linguistic rule • A good partitioning can be described by the following fuzzy rule IF solution has small cutset AND low power AND short delay AND good Balance. THEN it is a good solution 23 Fuzzy cost function The above rule is translated to AND-like OWA ( x) min C , P , D , B 1 1 C P D B 4 (x) Represent the total Fuzzy fitness of the solution, our aim is to Maximize this fitness. C , P , D , B Respectively (Cutset, Power, Delay , Balance ) Fitness. 24 GA for multiobjective Partitioning Algorithm (Genetic_Algorithm) Construct_Population(Np); For j = 1 to Np Evaluate_Fitness (Population[j]) End For; For i = 1 to Ng For j = 1 to No (x,y) Choose_parents; offspring[j] Crossover(x,y) EndFor; Population Select ( Population, offspring, Np ) For k = 1 to Np Apply Mutation (Population[k]) EndFor; EndFor; 25 Solution representation 26 GA implementation • Different population sizes • Parent selection: Roulette wheel – The probability of selecting a chromosome for crossover is Pch o ice μ(x) Np μ(i) i 1 Np is the population size 27 GA implementation •Simple single point crossover: •Selection before mutation •Roulette wheel (rlt) • Elitism random (ernd) 28 Tabu Search Algorithm Tabu_Search Start with an initial feasible solution S Initialize tabu list and aspiration level For fixed number of iterations Do Generate neighbor solutions V* N(S) Find best S* V* If move S to S* is not in T Then Accept move and update best solution Update T and AL Else If Cost(S*) < AL Then Accept move and update best solution Update T and AL End If End For 29 TS implementation • Neighbor solution – Change the block of a randomly selected cells. • The Tabu list size depends on the circuit size. 30 TS implementation Tabu list • Store index of one of the swapped cell. • Various sizes for tabu list. Aspiration Level • The best neighbor is better than the global best. 31 Experimental Results ISCAS 85-89 Benchmark Circuits 32 GA Vs Tabu Multi-objective 33 Circuit S13207 GA Circuit S13207 TS Circuit S13207 GA Vs TS time 36 Conclusion • The present work successfully addressed the important issue of reducing power and delay consumption in VLSI circuits. • The present work successfully formulate and provide solutions to the problem of multiobjective VLSI partitioning • TS partitioning algorithm outperformed GA in terms of quality of solution and execution time 37 Thank you. 38